Abstract:
Objective To address the challenges of systematically identifying herb pairs in traditional Chinese medicine (TCM), we proposed HerbGL, a framework for predicting potential herb pairs that integrates network propagation and graph regularization.
Methods HerbGL was built on the assumption that herbal actions induce subtle perturbations in biological systems. Random walk with restart (RWR) was first applied to the protein-protein interaction (PPI) network to reconstruct herb-specific perturbation effects and generate weighted subnetworks. Then, to quantify the affinity between herbs, two network-proximity metrics, Closeness and PageRank, were computed from the weighted subnetworks to construct herb-pair affinity matrices. Finally, these matrices, together with known herb pairs (derived from co-occurrence analysis of TCM formulas, with a threshold set at the inflection point of the frequency distribution), were incorporated into a graph regularization model to predict potential herb pairs. Model performance was assessed through baseline comparison, ablation, and robustness experiments under different ratios of positive to negative samples, using the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC), accuracy, and precision as evaluation metrics. Furthermore, the predicted herb pairs were validated through literature evidence and through Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses.
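The RWR step described above can be illustrated with a minimal sketch. It assumes a small undirected toy network in place of the PPI network, a single seed node standing in for an herb's targets, and a restart probability of 0.5; the function name, parameters, and toy data are hypothetical and are not taken from the HerbGL implementation.

```python
import numpy as np

def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for a walk restarting at `seeds`.

    adj     : symmetric adjacency matrix (toy stand-in for a PPI network)
    seeds   : indices of seed nodes (hypothetical herb targets)
    restart : probability of jumping back to the seeds at each step (assumed value)
    """
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # avoid division by zero for isolated nodes
    W = adj / col_sums                     # column-normalized transition matrix

    p0 = np.zeros(adj.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)     # uniform restart distribution over seeds

    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:  # L1 convergence check
            return p_next
        p = p_next
    return p

# Toy path graph 0 - 1 - 2 - 3 with node 0 as the only seed.
toy_adj = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
scores = random_walk_with_restart(toy_adj, seeds=[0])
print(scores)
```

In this sketch the resulting scores decay with distance from the seed, which is the property that lets them serve as node weights for a seed-specific subnetwork. How HerbGL converts such scores into weighted subnetworks is not specified here.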
Results The weighted subnetworks constructed by RWR provided a refined simulation of herb-specific perturbation effects, which formed the basis for subsequent affinity modeling and prediction. Analysis of herb pair co-occurrence frequencies revealed a marked change around 150, which was selected as the threshold to distinguish herb pairs from non-herb pairs. HerbGL exhibited superior predictive performance compared with baseline models (AUROC = 0.970 5, AUPRC = 0.955 5, accuracy = 0.726 6, precision = 0.970 6). Ablation results showed that removing the Closeness and PageRank metrics substantially degraded performance (AUROC = 0.819 1, AUPRC = 0.876 8), confirming their necessity. Robustness evaluation under an imbalanced positive-to-negative sample ratio of 1 : 5 yielded AUROC = 0.969 6 and AUPRC = 0.840 4, indicating stable predictive ability. Case studies supported the plausibility of the predicted herb pairs. For example, Fangfeng (Saposhnikoviae Radix) and Qingpi (Citri Reticulatae Pericarpium Viride) are recorded together in Fangfeng Shengma Tang (防风升麻汤) in Liangpeng Huiji (《良朋汇集》, Collection of Excellent Recipes) Vol. 3. In addition, pathway enrichment analysis of the Renshen (Ginseng Radix et Rhizoma) and Lianqiao (Forsythiae Fructus) pair supported the biological plausibility of their compatibility.
Conclusion HerbGL offers an effective and biologically informed framework for identifying herb pairs in TCM. Beyond improving herb pair prediction, the framework also provides data support for research on herb compounds and mechanisms, thereby supporting data-driven exploration of TCM compatibility.