TCMLCM: an intelligent question-answering model for traditional Chinese medicine lung cancer based on the KG2TRAG method

  • Abstract:
    Objective To improve the accuracy and professionalism of a question-answering (QA) model for lung cancer in traditional Chinese medicine (TCM) by integrating large language models (LLMs) with structured knowledge graphs (KGs) through the KG-to-text-enhanced retrieval-augmented generation (KG2TRAG) method.
    Methods The TCM lung cancer model (TCMLCM) was constructed by fine-tuning ChatGLM2-6B on the Tianchi TCM, HuangDi, and ShenNong-TCM-Dataset datasets, as well as on a TCM lung cancer KG. To strengthen knowledge retrieval, the KG2TRAG method was applied: it converts KG triples into natural language text via ChatGPT-aided linearization, so that the LLM can perform context-aware reasoning over the retrieved text. For a comprehensive comparison, MedicalGPT, HuatuoGPT, and BenTsao were selected as baseline models. Performance was evaluated using bilingual evaluation understudy (BLEU), recall-oriented understudy for gisting evaluation (ROUGE), accuracy, and the domain-specific TCM-LCEval metrics, and TCM oncology experts validated the answers for accuracy, professionalism, and usability.
    Results TCMLCM achieved the best performance on all metrics, with a BLEU score of 32.15%, a ROUGE-L score of 59.08%, and an accuracy of 79.68%. Notably, on the TCM-specific TCM-LCEval assessment, its performance was 3% to 12% higher than that of the baseline models. Expert evaluation indicated superior performance in accuracy and professionalism.
    Conclusion TCMLCM provides an innovative solution for TCM lung cancer QA and demonstrates the feasibility of integrating structured KGs with LLMs. This work advances intelligent TCM healthcare tools and lays a foundation for future AI-driven applications in traditional medicine.
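
The KG-to-text step described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract states that linearization is ChatGPT-aided, whereas this sketch swaps in fixed sentence templates so that it runs offline, and it ranks passages by simple token overlap in place of whatever retriever the paper uses. The triples, relation names, templates, and the functions `linearize`, `retrieve`, and `build_prompt` are all hypothetical placeholders chosen for illustration and are not taken from the paper's knowledge graph.

```python
import re

# Hypothetical triples for illustration only; NOT from the paper's knowledge graph.
TRIPLES = [
    ("lung cancer", "has_symptom", "persistent cough"),
    ("lung cancer", "has_tcm_syndrome", "qi and yin deficiency"),
    ("qi and yin deficiency", "has_treatment_principle", "tonify qi and nourish yin"),
]

# Assumed templates; the paper uses ChatGPT-aided linearization instead.
TEMPLATES = {
    "has_symptom": "A symptom of {h} is {t}.",
    "has_tcm_syndrome": "A TCM syndrome pattern associated with {h} is {t}.",
    "has_treatment_principle": "The treatment principle for {h} is {t}.",
}


def linearize(triple):
    """Turn one (head, relation, tail) triple into a natural-language sentence."""
    head, relation, tail = triple
    # Fall back to a generic "head relation tail." sentence for unknown relations.
    template = TEMPLATES.get(relation, "{h} " + relation.replace("_", " ") + " {t}.")
    return template.format(h=head, t=tail)


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, passages, k=2):
    """Return the k passages sharing the most tokens with the question."""
    q_tokens = tokenize(question)
    # sorted() is stable, so ties keep their original order.
    ranked = sorted(passages, key=lambda p: len(q_tokens & tokenize(p)), reverse=True)
    return ranked[:k]


def build_prompt(question, passages):
    """Assemble retrieved passages and the question into one prompt string."""
    context = "\n".join("- " + p for p in passages)
    return "Background knowledge:\n" + context + "\n\nQuestion: " + question + "\nAnswer:"


passages = [linearize(t) for t in TRIPLES]
question = "What are the symptoms of lung cancer?"
prompt = build_prompt(question, retrieve(question, passages, k=2))
print(prompt)
```

In a full pipeline the resulting prompt would be passed to the fine-tuned model; that call is omitted here because the abstract does not specify the inference interface.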

     
