HerbGL: a network propagation and graph regularization-based framework for herb pairs prediction in traditional Chinese medicine

    Abstract:
    Objective To address the challenge of systematically identifying herb pairs in traditional Chinese medicine (TCM), this study proposed HerbGL, a framework that integrates network propagation and graph regularization to predict potential herb pairs.
    Methods HerbGL was built on the assumption that the pharmacological actions of herbs induce subtle perturbations in biological systems, and it proceeds in three steps. First, random walk with restart (RWR) was applied to the protein-protein interaction (PPI) network to reconstruct the specific perturbation produced by each herb's multiple protein targets and to generate a weighted subnetwork for each herb. Second, to quantify the affinity between herbs, two network proximity metrics, Closeness and PageRank, were computed on each herb's weighted subnetwork to construct herb-pair affinity matrices. Finally, these matrices were integrated into a graph regularization model, together with known herb pairs, to predict potential herb pairs. The known herb pairs were derived from the co-occurrence frequencies of herb pairs in TCM formulas, using a threshold set at the inflection point of the frequency distribution. Model performance was evaluated through comparison with baseline models, ablation experiments, and robustness experiments under different positive-to-negative sample ratios, using the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC), accuracy, and precision as metrics. The predicted herb pairs were further validated against literature evidence and by Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses.
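The three Methods steps can be illustrated with a minimal NumPy sketch. It rests on several assumptions that the abstract does not state: the restart probability (0.7), the toy six-protein PPI network, and the hypothetical herbs `herb_a`, `herb_b`, and `herb_c` with their target sets. The affinity is also a swapped-in stand-in: it uses the cosine similarity of RWR profiles, not the Closeness- and PageRank-based matrices HerbGL actually uses. The graph regularization step is shown in one common form, Laplacian smoothing of the known-pair matrix Y obtained by solving (I + λL)F = Y, and the paper's model may differ.

```python
import numpy as np


def rwr(adj, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Random walk with restart from a set of seed nodes on a dense adjacency matrix."""
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # guard against division by zero for isolated nodes
    w = adj / col_sums  # column-normalized transition matrix
    p0 = np.zeros(adj.shape[0])
    p0[seeds] = 1.0 / len(seeds)  # restart mass spread evenly over the herb's targets
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (w @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:  # stop once the L1 change is negligible
            return p_next
        p = p_next
    return p


def cosine_affinity(profiles):
    """Hypothetical herb-herb affinity: cosine similarity of RWR profiles (rows)."""
    normed = profiles / np.linalg.norm(profiles, axis=1, keepdims=True)
    s = normed @ normed.T
    np.fill_diagonal(s, 0.0)  # drop self-affinity
    return s


def graph_regularized_scores(affinity, known, lam=1.0):
    """Minimize ||F - Y||_F^2 + lam * tr(F^T L F); the optimum solves (I + lam*L) F = Y."""
    laplacian = np.diag(affinity.sum(axis=1)) - affinity
    f = np.linalg.solve(np.eye(len(affinity)) + lam * laplacian, known)
    return (f + f.T) / 2  # symmetrize, since herb pairs are unordered


# Toy undirected PPI network with six proteins (hypothetical).
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (1, 4)]
adj = np.zeros((6, 6))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

herb_targets = {"herb_a": [0, 1], "herb_b": [1, 2], "herb_c": [4, 5]}
profiles = np.array([rwr(adj, t) for t in herb_targets.values()])
affinity = cosine_affinity(profiles)

known = np.zeros((3, 3))
known[0, 1] = known[1, 0] = 1.0  # herb_a/herb_b treated as a known pair
scores = graph_regularized_scores(affinity, known)
```

Each RWR profile can be read as that herb's simulated perturbation of the network. Through the Laplacian term, herbs whose profiles resemble those of a known pair's members receive higher scores for the corresponding unknown pairs.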
    Results The RWR-derived weighted subnetworks captured herb-specific perturbation effects in finer detail and provided the basis for subsequent affinity modeling and prediction. In the co-occurrence analysis, the number of herb pairs changed markedly at a frequency of 150, so this value was selected as the threshold separating herb pairs from non-herb pairs. HerbGL outperformed all baseline models (AUROC = 0.9705, AUPRC = 0.9555, accuracy = 0.7266, precision = 0.9706). In the ablation experiment, removing the Closeness and PageRank metrics substantially degraded performance (AUROC = 0.8191, AUPRC = 0.8768), confirming that both network features are necessary. Under an imbalanced positive-to-negative sample ratio of 1:5, the model remained stable (AUROC = 0.9696, AUPRC = 0.8404). Multiple case studies also supported the plausibility of the predicted herb pairs. For example, Fangfeng (Saposhnikoviae Radix) and Qingpi (Citri Reticulatae Pericarpium Viride) are recorded together for treating toothache in Fangfeng Shengma Tang (防风升麻汤), in Vol. 3 of Liangpeng Huiji (《良朋汇集》, Collection of Excellent Recipes). Pathway enrichment analysis of the Renshen (Ginseng Radix et Rhizoma) and Lianqiao (Forsythiae Fructus) pair further supported the biological plausibility of their compatibility.
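The positive labels come from counting how often two herbs appear together across formulas and keeping the pairs at or above the frequency threshold. The following is a minimal standard-library sketch, assuming each formula is a list of herb names and that a pair exactly at the cutoff counts as a herb pair; the abstract does not say whether the threshold of 150 is inclusive. The corpus and the herb labels `A` to `F` are hypothetical. The example uses a threshold of 3 instead of 150 so that the tiny corpus yields one positive pair.

```python
from collections import Counter
from itertools import combinations


def count_cooccurrence(formulas):
    """Count in how many formulas each unordered herb pair appears together."""
    counts = Counter()
    for herbs in formulas:
        # sorted(set(...)) counts each pair once per formula and makes
        # (a, b) and (b, a) map to the same key
        for pair in combinations(sorted(set(herbs)), 2):
            counts[pair] += 1
    return counts


def known_pairs(counts, threshold=150):
    """Keep pairs whose co-occurrence frequency reaches the threshold (assumed inclusive)."""
    return {pair for pair, n in counts.items() if n >= threshold}


# Hypothetical toy corpus: each inner list is one formula.
formulas = [
    ["A", "B", "C"],
    ["A", "B"],
    ["B", "D", "A"],
    ["E", "F"],
]
counts = count_cooccurrence(formulas)
positives = known_pairs(counts, threshold=3)
```

In this sketch, pairs below the threshold are simply not labeled as herb pairs. Selecting the cutoff itself, such as the value of 150 at which the pair count changes markedly, is a separate step and is not shown here.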
    Conclusion HerbGL offers an effective and biologically grounded framework for identifying herb pairs in TCM. Beyond improving herb pair prediction, it provides data support for research on the actions and mechanisms of herbal constituents, thereby facilitating data-driven exploration of TCM compatibility patterns.