Abstract:
Objective To map the research landscape of artificial intelligence (AI)-assisted tongue diagnosis through bibliometric analysis and to quantify its diagnostic accuracy and clinical interpretability through a diagnostic test accuracy (DTA) meta-analysis.
Methods For the bibliometric analysis, the Web of Science Core Collection (WoSCC) was queried for English-language articles and reviews on AI-assisted tongue diagnosis published between January 1, 2014 and December 31, 2025. Records were analysed using Bibliometrix, VOSviewer, and CiteSpace along the following dimensions: annual publication output and disciplinary distribution, journal and citation characteristics, country/region and institutional collaboration, author networks, keyword co-occurrence, and keyword burst detection. For the DTA meta-analysis, four databases (Scopus, PubMed, Web of Science, and the China National Knowledge Infrastructure [CNKI]) were searched in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses of Diagnostic Test Accuracy (PRISMA-DTA) guidelines. A bivariate random-effects model with a hierarchical summary receiver operating characteristic (HSROC) curve was used to pool sensitivity and specificity, with subgroup analyses by disease category, AI model architecture, and sample-size strata. Methodological quality was assessed with the Quality Assessment of Diagnostic Accuracy Studies version 2 (QUADAS-2) tool, and publication bias was evaluated with Deeks’ funnel plot asymmetry test.
Results A total of 198 publications met the bibliometric eligibility criteria. Annual output increased 24.5-fold (from 2 in 2014 to 49 in 2025), with 2022–2025 alone accounting for 65.2% of all publications. China contributed approximately 83.5% of all institutional affiliations; Shanghai University of Traditional Chinese Medicine was the most productive institution and Jiatuo Xu the most productive author. Keyword analysis identified four thematic clusters (AI and deep-learning architectures, image processing and segmentation, traditional Chinese medicine (TCM)-specific applications, and disease-specific applications) and a temporal evolution from traditional machine learning to deep learning and then to transformer-based, explainable, and multimodal AI architectures. Sixteen studies (14 755 participants) covering metabolic and hepatic disorders, oncological and oral lesions, cardiovascular risk, diabetes, and other clinical applications were included in the DTA meta-analysis. The pooled sensitivity was 90.3% (95% confidence interval [CI]: 86.7%–93.1%) and the pooled specificity was 93.0% (95% CI: 90.6%–94.7%); the area under the summary receiver operating characteristic (SROC) curve (AUC) was 0.961. Heterogeneity was substantial (I² = 95.8% for sensitivity; I² = 92.1% for specificity). Subgroup performance was broadly consistent across disease categories, AI architectures, and sample-size strata, and Deeks’ test indicated no significant publication bias (P = 0.258).
Conclusion AI-assisted tongue diagnosis has progressed rapidly and shows pooled diagnostic performance comparable to established screening modalities, supporting its potential as a complementary and easily accessible decision-support tool.
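As an illustrative aid to clinical interpretation, the pooled sensitivity and specificity reported in the Results can be converted into approximate likelihood ratios and a post-test probability. The sketch below is not part of the authors' analysis: the function names are hypothetical, the 10% pre-test probability is an assumed value chosen only for illustration, and likelihood ratios computed from pooled point estimates are a simplification of what the bivariate model would yield directly.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-, DOR) from point estimates of sensitivity and specificity."""
    lr_pos = sensitivity / (1.0 - specificity)   # how much a positive result raises the odds
    lr_neg = (1.0 - sensitivity) / specificity   # how much a negative result lowers the odds
    dor = lr_pos / lr_neg                        # diagnostic odds ratio
    return lr_pos, lr_neg, dor


def post_test_probability(pre_test_prob, likelihood_ratio):
    """Apply a likelihood ratio to a pre-test probability via the odds form of Bayes' rule."""
    pre_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Pooled point estimates taken from the Results paragraph.
POOLED_SENSITIVITY = 0.903
POOLED_SPECIFICITY = 0.930

# Assumed pre-test probability (hypothetical, for illustration only).
ASSUMED_PRE_TEST_PROB = 0.10

lr_pos, lr_neg, dor = likelihood_ratios(POOLED_SENSITIVITY, POOLED_SPECIFICITY)
p_after_positive = post_test_probability(ASSUMED_PRE_TEST_PROB, lr_pos)
p_after_negative = post_test_probability(ASSUMED_PRE_TEST_PROB, lr_neg)

if __name__ == "__main__":
    print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}, DOR = {dor:.1f}")
    print(f"Post-test probability after a positive result: {p_after_positive:.3f}")
    print(f"Post-test probability after a negative result: {p_after_negative:.3f}")
```

Under these assumptions, the positive likelihood ratio is roughly 13 and the negative likelihood ratio roughly 0.10. Given the substantial heterogeneity reported, such figures should be read as indicative only.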