Zing Forum

BiPharm-RAG: Cross-Source Dual Hypergraph Retrieval-Augmented Large Language Model for TCM Diagnosis and Treatment Reasoning

This article introduces BiPharm-RAG, a project that applies large language models to TCM diagnosis and treatment reasoning. It uses a cross-source, dual-hypergraph retrieval-augmented generation (RAG) architecture to integrate heterogeneous TCM knowledge from multiple sources.

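TCM, large language model, RAG, retrieval-augmented generation, hypergraph, knowledge graph, diagnosis and treatment reasoning, traditional Chinese medicine
Published 2026-04-30 23:43 · Recent activity 2026-04-30 23:48 · Estimated read 8 min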

Section 01

[Introduction] BiPharm-RAG: Cross-Source Dual Hypergraph Retrieval-Augmented Large Model Empowers TCM Diagnosis and Treatment Reasoning

BiPharm-RAG targets three persistent problems in TCM knowledge work: fragmented knowledge, heavy dependence on individual practitioner experience, and low standardization. It proposes a cross-source dual hypergraph retrieval-augmented architecture that grounds a large language model in integrated multi-source, heterogeneous TCM knowledge. The aim is to reduce knowledge hallucination in the model and to support TCM-assisted diagnosis and treatment, knowledge education, and new drug research and development.

Section 02

Background: Core Challenges Faced by TCM Knowledge Engineering

The TCM knowledge system is complex in three ways:

1. Diverse knowledge sources. Classical texts, modern clinical research, medicinal material databases, and other sources use heterogeneous formats and lack a unified semantic representation.
2. Complex conceptual associations. Relationships such as those between medicinal materials and their effects, or between prescriptions and diseases, often involve more than two entities at once and are hard to express with a simple pairwise graph.
3. Multi-factor reasoning. TCM practice centers on treatment based on syndrome differentiation, which requires weighing many factors together. Traditional keyword retrieval cannot capture these deep semantic associations, and a large model used on its own is prone to knowledge hallucination.

Section 03

BiPharm-RAG Architecture Innovation: Cross-Source Dual Hypergraph and Retrieval-Augmented Generation

Cross-Source Knowledge Integration

BiPharm-RAG integrates multi-source data such as classical TCM texts, modern journals, medicinal material databases, and clinical cases. A unified knowledge framework enables joint retrieval across these sources, which surfaces associations that are hard to find in any single source (e.g., cross-checking a claim from a traditional classic against modern pharmacology).

Dual Hypergraph Knowledge Representation

The system adopts a dual hypergraph structure. The concept-level hypergraph models semantic associations (e.g., medicinal material categories, efficacy classifications), while the instance-level hypergraph records specific relationships (e.g., prescription composition, clinical case features). Together the two levels balance abstract generalization against preservation of detail.
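To make the two levels concrete, here is a minimal sketch of a dual hypergraph in plain Python. The class names, the edge ID scheme (`category:...`, `formula:...`), and the pinyin node names are assumptions chosen for illustration; the source does not describe the project's actual data model. The example data uses the standard composition of Ma Huang Tang and the acrid-warm exterior-releasing herb category.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Hypergraph:
    """Hyperedges that each connect any number of nodes (hypothetical helper)."""
    edges: dict = field(default_factory=dict)  # edge_id -> frozenset of nodes
    node_to_edges: dict = field(default_factory=lambda: defaultdict(set))

    def add_edge(self, edge_id, nodes):
        members = frozenset(nodes)
        self.edges[edge_id] = members
        for node in members:
            self.node_to_edges[node].add(edge_id)

    def edges_of(self, node):
        # .get() avoids inserting empty entries into the defaultdict
        return set(self.node_to_edges.get(node, ()))


@dataclass
class DualHypergraph:
    """Concept level for categories, instance level for concrete records."""
    concept: Hypergraph = field(default_factory=Hypergraph)
    instance: Hypergraph = field(default_factory=Hypergraph)

    def lookup(self, node):
        # Return the hyperedges a node belongs to at each level.
        return {
            "concept": self.concept.edges_of(node),
            "instance": self.instance.edges_of(node),
        }


dual = DualHypergraph()
# Concept level: a category grouping herbs by efficacy class.
dual.concept.add_edge(
    "category:acrid_warm_exterior_releasing", ["ma_huang", "gui_zhi"]
)
# Instance level: one prescription linking all of its ingredients at once.
dual.instance.add_edge(
    "formula:ma_huang_tang", ["ma_huang", "gui_zhi", "xing_ren", "zhi_gan_cao"]
)

print(dual.lookup("ma_huang"))
```

A single hyperedge holds a whole prescription, which is the property a pairwise graph lacks: there, the same formula would have to be split into many two-node edges and reassembled at query time.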

Retrieval-Augmented Generation Mechanism

When a diagnosis and treatment query arrives, the system retrieves relevant knowledge subgraphs through multi-hop reasoning over the dual hypergraph, then passes the resulting structured knowledge fragments to the large model as context for generation. Because each answer is grounded in retrieved fragments, outputs can be traced back to their sources and are less prone to hallucination.
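The retrieve-then-generate flow can be sketched as a breadth-first expansion over hyperedges followed by prompt assembly. This is an illustration under stated assumptions, not the project's algorithm: `multi_hop_retrieve`, `build_prompt`, the edge IDs, and the prompt wording are all hypothetical, and a single flat set of hyperedges stands in for the two-level structure. The call to the language model itself is left out.

```python
def multi_hop_retrieve(edges, seeds, max_hops=2):
    """Collect hyperedges reachable from seed nodes within max_hops.

    edges: dict mapping edge_id -> set of member nodes.
    Returns a dict mapping edge_id -> hop at which it was first reached.
    """
    node_to_edges = {}
    for edge_id, members in edges.items():
        for node in members:
            node_to_edges.setdefault(node, set()).add(edge_id)

    visited = set(seeds)
    frontier = set(seeds)
    retrieved = {}
    for hop in range(1, max_hops + 1):
        next_frontier = set()
        for node in frontier:
            for edge_id in node_to_edges.get(node, ()):
                if edge_id not in retrieved:
                    retrieved[edge_id] = hop
                    next_frontier |= edges[edge_id] - visited
        visited |= next_frontier
        frontier = next_frontier
        if not frontier:
            break
    return retrieved


def build_prompt(query, edges, retrieved):
    """Format retrieved hyperedges as ID-tagged facts so answers can cite them."""
    lines = ["Answer using only the facts below and cite fact IDs.", "Facts:"]
    for edge_id, hop in sorted(retrieved.items(), key=lambda kv: (kv[1], kv[0])):
        members = ", ".join(sorted(edges[edge_id]))
        lines.append(f"[{edge_id}] (hop {hop}) links: {members}")
    lines.append(f"Question: {query}")
    return "\n".join(lines)


# Illustrative data only.
edges = {
    "indication:wind_cold": {"ma_huang_tang", "wind_cold_exterior_excess"},
    "formula:ma_huang_tang": {
        "ma_huang_tang", "ma_huang", "gui_zhi", "xing_ren", "zhi_gan_cao",
    },
    "category:exterior_releasing": {"ma_huang", "gui_zhi"},
    "formula:gui_zhi_tang": {
        "gui_zhi_tang", "gui_zhi", "bai_shao", "sheng_jiang", "da_zao",
        "zhi_gan_cao",
    },
}

retrieved = multi_hop_retrieve(edges, {"wind_cold_exterior_excess"}, max_hops=2)
prompt = build_prompt(
    "Which formula fits a wind-cold exterior excess pattern?", edges, retrieved
)
print(prompt)
```

With two hops the walk goes from the syndrome to its indication edge and then to the composition of the indicated formula. The herb category and the neighboring formula are only reachable at hop three, so `max_hops` acts as the knob that trades context breadth against prompt size.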

Section 04

Key Technical Implementation Points: Knowledge Extraction, Hypergraph Embedding, and Model Adaptation

Knowledge Extraction and Graph Construction

Structured knowledge is extracted from the multi-source data through tasks such as named entity recognition and relation extraction. Domain-specific pre-trained models or fine-tuning strategies may be used to handle TCM terminology, which is highly specialized and often ambiguous.

Hypergraph Embedding and Similarity Calculation

The hypergraph structure is embedded into a low-dimensional vector space so that the vectors reflect how information propagates along high-order (multi-entity) relationships. Semantic similarity between a query and each knowledge fragment can then be computed quickly from the embedded vectors.
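As a rough illustration of propagation over hyperedges followed by similarity ranking, the sketch below averages node vectors into hyperedge vectors and back, then ranks hyperedges by cosine similarity to a query vector. It is a simple untrained stand-in, not the project's embedding method, which the source does not specify. The function names and data are hypothetical, and one-hot starting features are used only to keep the example deterministic; a real system would start from learned or text-derived low-dimensional features.

```python
import math


def mean(vectors):
    """Element-wise mean of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def propagate(edges, features, rounds=1):
    """Smooth node vectors over hyperedges: node -> edge mean -> node mean."""
    node_vecs = {node: list(vec) for node, vec in features.items()}
    for _ in range(rounds):
        edge_vecs = {
            edge_id: mean([node_vecs[n] for n in sorted(members)])
            for edge_id, members in edges.items()
        }
        updated = {}
        for node, vec in node_vecs.items():
            incident = [edge_vecs[e] for e, m in edges.items() if node in m]
            updated[node] = mean(incident) if incident else vec
        node_vecs = updated
    # Final hyperedge embeddings are the mean of the smoothed member vectors.
    edge_vecs = {
        edge_id: mean([node_vecs[n] for n in sorted(members)])
        for edge_id, members in edges.items()
    }
    return node_vecs, edge_vecs


# Illustrative data only.
edges = {
    "formula:ma_huang_tang": {"ma_huang", "gui_zhi", "xing_ren", "zhi_gan_cao"},
    "category:exterior_releasing": {"ma_huang", "gui_zhi"},
    "formula:si_jun_zi_tang": {"ren_shen", "bai_zhu", "fu_ling", "zhi_gan_cao"},
}
nodes = sorted(set().union(*edges.values()))
features = {
    node: [1.0 if i == j else 0.0 for j in range(len(nodes))]
    for i, node in enumerate(nodes)
}

node_vecs, edge_vecs = propagate(edges, features, rounds=1)

# Treat the query as mentioning a single entity and rank fragments against it.
query_vec = node_vecs["ma_huang"]
ranked = sorted(
    ((edge_id, cosine(query_vec, vec)) for edge_id, vec in edge_vecs.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
for edge_id, score in ranked:
    print(f"{edge_id}: {score:.3f}")
```

After one round, herbs that share hyperedges end up with similar vectors, so a query about one herb ranks the fragments containing it above a formula that is connected only through a single shared ingredient.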

Large Language Model Adaptation

The large model is adapted to the domain through domain pre-training, instruction fine-tuning, or prompt engineering, so that it interprets TCM syndrome differentiation logic correctly and produces reasoning consistent with TCM practice.

Section 05

Application Scenarios: Clinical Assistance, Knowledge Education, and New Drug R&D

TCM Clinical Decision Support

The system helps TCM practitioners quickly retrieve classic prescriptions, similar clinical cases, and medicinal material knowledge, with the goal of improving the accuracy and efficiency of diagnosis and treatment. It is an aid to, not a substitute for, the practitioner's judgment.

TCM Knowledge Education

The system gives learners a query tool that returns answers together with their knowledge sources and reasoning paths, which helps them build a systematic understanding of the field.

New Drug R&D and Prescription Optimization

By analyzing the synergistic mechanisms of medicinal materials alongside clinical data, the system can suggest candidate drug combinations for researchers to investigate.

Section 06

Conclusion and Outlook: The Path of Integration Between Traditional Medicine and Modern Technology

BiPharm-RAG suggests that RAG systems for vertical domains need deep domain adaptation across knowledge representation, retrieval strategy, and model tuning. Future work could integrate multi-modal information such as tongue images and pulse readings to support more comprehensive diagnosis and treatment. The dual hypergraph architecture may also serve as a reference for other complex knowledge domains such as law and finance. The project is an exploration of how traditional medical knowledge and modern technology can be combined, and it offers a technical reference for the modernization of TCM.