Abstract:
Objective To address the lack of fine-grained clinical recognition of specific Yang deficiency syndrome subtypes and the limitations of conventional object detection models in extracting irregular, low-contrast tongue phenotypes, this study aimed to develop an objective subtype recognition framework based on an improved You Only Look Once 11 nano (YOLO11n) architecture, using a standardized visual phenotype matrix to translate macroscopic traditional Chinese medicine (TCM) descriptions into quantifiable clinical targets.
Methods This cross-sectional diagnostic study consecutively enrolled adult inpatients admitted to the Department of Thyroid and Breast Surgery, The First Affiliated Hospital of Wannan Medical University (Yijishan Hospital), between September 1, 2024 and June 1, 2025, who were suspected of having Yang deficiency constitution based on initial TCM consultation. Clinical tongue image data were collected for analysis. Based on an Expert Visual Phenotype Annotation Matrix, a five-category recognition system was established, including the following TCM syndrome subtypes: spleen-dampness exuberance syndrome, mild kidney Yang deficiency syndrome, upper heat and lower cold syndrome, simultaneous Yin-Yang deficiency syndrome, and Yin deficiency and fluid depletion syndrome (negative control). The proposed Yang deficiency YOLO (YD-YOLO) model, built upon the YOLO11n baseline, integrates the Cross Stage Partial with kernel size 2 (C3k2)-GhostBottleneck-Dynamic Convolution (GBDC) module into the backbone to adaptively extract low-contrast features, and embeds the multipath aggregation coordinate attention (MACA) mechanism into the neck to suppress background interference through multi-scale spatial coordination. Gradient-weighted class activation mapping (Grad-CAM) was used to visualize feature attribution and evaluate the biological plausibility of the model’s focus. Model performance was evaluated through ablation and comparative experiments using mean average precision (mAP), precision, recall, F1 score, inference speed (frames per second, FPS), overall accuracy, Cohen’s kappa, and the area under the receiver operating characteristic (ROC) curve (AUC).
Results Based on the final inclusion of 1 186 clinical cases, the YD-YOLO model achieved an overall accuracy of 91.5%, a Cohen’s kappa of 0.912, and an mAP@50 of 0.731, which was higher than that of the YOLO11n baseline (0.681); the AUC ranged from 0.91 to 0.97 across all TCM syndrome subtypes. Among the TCM syndrome subtypes, mild kidney Yang deficiency syndrome had the highest mAP@50 (0.900). The inference speed reached 89.00 FPS. Grad-CAM analysis showed that the model localized activation to key TCM pathological features, such as marginal tooth marks and focal root coatings, while suppressing non-diagnostic oral background noise.
Conclusion The YD-YOLO model demonstrates the feasibility of deep learning for the fine-grained classification of TCM Yang deficiency subtypes. By integrating visual phenotype quantification with model interpretability, the proposed framework provides an objective basis for syndrome differentiation, supporting the development of standardized digital diagnostic systems and the provision of clinical decision support in TCM practice.
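For readers unfamiliar with the agreement metrics named in the Methods, the following minimal sketch shows how overall accuracy, Cohen’s kappa, and per-class precision, recall, and F1 score can be derived from a multi-class confusion matrix. It is illustrative only and is not the study’s evaluation code: the function names, the convention that rows are true labels and columns are predicted labels, and the toy 2 × 2 matrix are all assumptions introduced here, and the numbers are not study data.

```python
from typing import List, Tuple

Matrix = List[List[int]]  # assumed layout: cm[true_class][predicted_class]


def overall_accuracy(cm: Matrix) -> float:
    """Fraction of all cases that lie on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def cohen_kappa(cm: Matrix) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    row_sums = [sum(row) for row in cm]                              # true-label counts
    col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]   # predicted-label counts
    p_o = sum(cm[i][i] for i in range(n)) / total                    # observed agreement
    p_e = sum(r * c for r, c in zip(row_sums, col_sums)) / (total * total)  # chance agreement
    return (p_o - p_e) / (1.0 - p_e)


def precision_recall_f1(cm: Matrix, k: int) -> Tuple[float, float, float]:
    """One-vs-rest precision, recall, and F1 score for class index k."""
    tp = cm[k][k]
    predicted_k = sum(cm[i][k] for i in range(len(cm)))  # column sum
    actual_k = sum(cm[k])                                # row sum
    precision = tp / predicted_k if predicted_k else 0.0
    recall = tp / actual_k if actual_k else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy matrix for illustration only (not data from the study).
toy_cm = [[40, 10],
          [5, 45]]

if __name__ == "__main__":
    print("accuracy:", round(overall_accuracy(toy_cm), 3))
    print("kappa:", round(cohen_kappa(toy_cm), 3))
    print("class 0 (P, R, F1):", tuple(round(v, 3) for v in precision_recall_f1(toy_cm, 0)))
```

The same functions apply unchanged to a 5 × 5 matrix such as the five-category system described in the Methods. Because kappa subtracts the agreement expected by chance, it is normally lower than the raw accuracy computed from the same matrix.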