Deep learning for subtype recognition of Yang deficiency tongue images in traditional Chinese medicine

张桐彬; 徐浩然; 王紫益; 潘传军; 汪正; 王雷

doi:10.1016/j.dcmed.2026.05.006

Abstract:
Objective To address the lack of fine-grained clinical recognition of Yang deficiency syndrome subtypes and the limitations of conventional object detection models in extracting irregular, low-contrast tongue phenotypes, this study aimed to develop an objective subtype recognition framework based on an improved You Only Look Once 11 nano (YOLO11n) architecture, using a standardized visual phenotype matrix to translate macroscopic traditional Chinese medicine (TCM) descriptions into quantifiable clinical targets.
Methods This cross-sectional diagnostic study consecutively enrolled adult inpatients admitted to the Department of Thyroid and Breast Surgery, The First Affiliated Hospital of Wannan Medical University (Yijishan Hospital), between September 1, 2024 and June 1, 2025, who were suspected of having a Yang deficiency constitution based on initial TCM consultation. Their clinical tongue images were collected for analysis. Based on an expert visual phenotype annotation matrix, a five-category recognition system was established, comprising the following TCM syndrome subtypes: spleen-dampness exuberance syndrome, mild kidney Yang deficiency syndrome, upper heat and lower cold syndrome, simultaneous Yin-Yang deficiency syndrome, and Yin deficiency and fluid depletion syndrome (negative control). The proposed Yang deficiency YOLO (YD-YOLO) model, built upon the YOLO11n baseline, integrates the Cross Stage Partial with kernel size 2 (C3k2)-GhostBottleneck-Dynamic Convolution (GBDC) module into the backbone to adaptively extract low-contrast features, and embeds the multipath aggregation coordinate attention (MACA) mechanism into the neck to suppress background interference through multi-scale spatial coordination. Gradient-weighted class activation mapping (Grad-CAM) was used to visualize feature attribution and to evaluate the biological plausibility of the regions the model attends to. Model performance was evaluated through ablation and comparative experiments using mean average precision (mAP), precision, recall, F1 score, inference speed (frames per second, FPS), overall accuracy, Cohen’s kappa, and the area under the receiver operating characteristic (ROC) curve (AUC).
Results A total of 1,186 clinical cases were included. The YD-YOLO model achieved an overall accuracy of 91.5%, a Cohen’s kappa of 0.912, and an mAP@50 of 0.731, exceeding the YOLO11n baseline (0.681), with AUC ranging from 0.91 to 0.97 across the TCM syndrome subtypes. Among the subtypes, mild kidney Yang deficiency syndrome had the highest mAP@50 (0.900), and the inference speed reached 89.00 FPS. Grad-CAM analysis showed that the model localized activation to key TCM pathological features, such as marginal tooth marks and focal coating at the tongue root, while suppressing non-diagnostic oral background noise.
Conclusion The YD-YOLO model demonstrates the feasibility of deep learning for the fine-grained classification of TCM Yang deficiency subtypes. By integrating visual phenotype quantification with model interpretability, the proposed framework provides an objective basis for syndrome differentiation and supports the development of standardized digital diagnostic systems and clinical decision support in TCM practice.
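
The Methods list overall accuracy, Cohen’s kappa, and F1 score among the evaluation metrics. As a minimal sketch of how these three quantities can be derived from a multi-class confusion matrix, the Python below uses only the standard library. The function names, the row/column convention (rows are expert labels, columns are model predictions), the macro averaging of F1, and the 5×5 example matrix are all assumptions made for illustration; they are not taken from the study and do not reproduce its results.

```python
from typing import List

Matrix = List[List[int]]  # assumed layout: cm[true_class][predicted_class]


def overall_accuracy(cm: Matrix) -> float:
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def cohens_kappa(cm: Matrix) -> float:
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    p_o = sum(cm[i][i] for i in range(k)) / n
    row_totals = [sum(cm[i]) for i in range(k)]
    col_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    # Expected agreement if labels and predictions were independent.
    p_e = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def macro_f1(cm: Matrix) -> float:
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    k = len(cm)
    scores = []
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(cm[i][c] for i in range(k)) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / k


if __name__ == "__main__":
    # Hypothetical 5-class confusion matrix, for illustration only.
    example = [
        [45, 2, 1, 1, 1],
        [3, 44, 1, 1, 1],
        [1, 2, 46, 1, 0],
        [2, 1, 1, 45, 1],
        [0, 1, 1, 2, 46],
    ]
    print(f"accuracy = {overall_accuracy(example):.3f}")
    print(f"kappa    = {cohens_kappa(example):.3f}")
    print(f"macro F1 = {macro_f1(example):.3f}")
```

Because kappa subtracts the agreement expected by chance, it is lower than the raw accuracy computed from the same confusion matrix whenever accuracy is below 100% and some agreement is expected by chance.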
