Autonomous LLM-Guided Disease Prediction System: Real-Time Matching and Surpassing CDC Expert-Ensembled Models

This article introduces an autonomous system based on large language model (LLM)-guided tree search, evaluated prospectively and in real time during the 2025-2026 U.S. respiratory disease season. The system autonomously discovers forecasting models for influenza, COVID-19, and RSV, and its ensemble forecasts match or surpass the CDC's manually curated expert ensemble, the current gold standard.

Tags: disease prediction, large language model, tree search, epidemiology, public health, autonomous system, ensemble model
Published 2026-05-16 01:45 | Recent activity 2026-05-18 11:27 | Estimated read 6 min

Section 01

Introduction: Autonomous LLM-Guided Disease Prediction System Matches or Surpasses CDC Expert-Ensembled Models in Real Time

This article introduces an autonomous disease prediction system built on large language model (LLM)-guided tree search. The system was evaluated prospectively and in real time during the 2025-2026 U.S. respiratory disease season, during which it autonomously discovered forecasting models for influenza, COVID-19, and RSV. Its ensemble forecasts match or surpass the CDC's manually curated gold-standard ensemble, easing the labor bottleneck of traditional infectious disease prediction.

Section 02

Research Background: Labor Bottleneck in Infectious Disease Prediction

Probabilistic forecasting of infectious diseases is crucial for public health decisions such as resource allocation and vaccine planning. Traditional methods, however, rely on labor-intensive expert curation of models. Manual model development requires collaboration across multiple teams, involves long cycles of iterative optimization, and scales poorly: it is difficult to respond quickly to new pathogens or to produce fine-grained regional predictions.

Section 03

System Approach: Autonomous Architecture of LLM-Guided Tree Search

The core architecture of the system consists of three parts: 1. LLM-guided code generation (exploring model architectures, translating epidemiological theory into model structure, and generating executable code); 2. Tree search optimization (Monte Carlo Tree Search explores the code space and expands branches based on performance feedback); 3. An automatic judge mechanism (checking each candidate for theoretical consistency, structural fidelity, and interpretability). The system can autonomously discover diverse model types, including mechanistic models, statistical models, machine learning methods, and hybrid approaches.
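
The loop described above can be illustrated with a toy sketch. This is not the article's implementation: it is a simplified UCB1-guided tree search (no rollouts) in which a candidate "model" is a single number, and the `propose`, `score`, and `judge` functions are hypothetical stand-ins for the LLM code generator, the forecast-accuracy reward, and the automatic judge.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass(eq=False)
class Node:
    """One candidate in the search tree (here a single float, not real code)."""
    candidate: float
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)
    visits: int = 0
    total_reward: float = 0.0

    def ucb1(self, c: float = 1.4) -> float:
        # Mean reward plus an exploration bonus for rarely visited nodes.
        mean = self.total_reward / self.visits
        return mean + c * math.sqrt(math.log(self.parent.visits) / self.visits)


def tree_search(
    root_candidate: float,
    propose: Callable[[float, random.Random], float],
    score: Callable[[float], float],
    judge: Callable[[float], bool],
    iterations: int = 50,
    branching: int = 3,
    seed: int = 0,
) -> Tuple[float, float]:
    rng = random.Random(seed)
    root = Node(root_candidate, visits=1, total_reward=score(root_candidate))
    best = (root.total_reward, root_candidate)
    for _ in range(iterations):
        # Selection: descend by UCB1 until a node still has room for a child.
        node = root
        while len(node.children) >= branching:
            node = max(node.children, key=Node.ucb1)
        # Expansion: the proposer (an LLM in the article) derives a variant.
        candidate = propose(node.candidate, rng)
        child = Node(candidate, parent=node)
        node.children.append(child)
        # Judge gate: rejected candidates earn zero reward and are never "best".
        reward = score(candidate) if judge(candidate) else 0.0
        if reward > best[0]:
            best = (reward, candidate)
        # Backpropagation: update statistics from the new child up to the root.
        current: Optional[Node] = child
        while current is not None:
            current.visits += 1
            current.total_reward += reward
            current = current.parent
    return best


def propose(parent: float, rng: random.Random) -> float:
    # Hypothetical stand-in for LLM code generation: perturb the parent.
    return parent + rng.gauss(0.0, 0.1)


def score(candidate: float) -> float:
    # Hypothetical reward in (0, 1], highest at an arbitrary target of 0.3.
    return 1.0 / (1.0 + abs(candidate - 0.3))


def judge(candidate: float) -> bool:
    # Hypothetical plausibility check, e.g. "a rate must be non-negative".
    return candidate >= 0.0


best_reward, best_candidate = tree_search(0.0, propose, score, judge)
```

In a real system the candidate would be generated model code and `score` would come from backtesting forecasts; the sketch only shows how selection, expansion, judging, and backpropagation fit together.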

Section 04

Prospective Evaluation: 2025-2026 Season Results Match CDC Gold Standard

Evaluation setup: During the 2025-2026 U.S. respiratory season, the system generated real-time forecasts every week of influenza, COVID-19, and RSV case counts 1 to 4 weeks ahead. Core results: The system's ensemble model consistently matched or surpassed the gold-standard CDC expert ensemble. In the cold-start scenario for RSV, where data were scarce, it remained competitive by using transfer learning and adjusting model complexity.
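
The article does not say how the individual models are combined into an ensemble. As one plausible illustration only, the sketch below takes the median across models at each quantile level and each horizon (1 to 4 weeks ahead in the article; two horizons here for brevity). The `Forecast` layout, the function name, and all numbers are assumptions made up for this example.

```python
from statistics import median
from typing import Dict, List

# Hypothetical layout: horizon in weeks ahead -> quantile level -> case count.
Forecast = Dict[int, Dict[float, float]]


def median_ensemble(forecasts: List[Forecast]) -> Forecast:
    """Combine forecasts by taking the per-horizon, per-quantile median."""
    ensemble: Forecast = {}
    for horizon, quantiles in forecasts[0].items():
        ensemble[horizon] = {
            level: median(f[horizon][level] for f in forecasts)
            for level in quantiles
        }
    return ensemble


# Made-up quantile forecasts from three candidate models.
model_a = {1: {0.1: 80, 0.5: 100, 0.9: 130}, 2: {0.1: 70, 0.5: 110, 0.9: 160}}
model_b = {1: {0.1: 90, 0.5: 120, 0.9: 150}, 2: {0.1: 85, 0.5: 130, 0.9: 190}}
model_c = {1: {0.1: 60, 0.5: 95, 0.9: 140}, 2: {0.1: 65, 0.5: 105, 0.9: 170}}

ensemble = median_ensemble([model_a, model_b, model_c])
```

A median is less sensitive to a single outlying model than a mean, which is one reason it might be chosen; whether the system uses it is not stated in the source.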

Section 05

Ablation Experiments: Validating the Effectiveness of Key Design Decisions

Ablation experiments validate two key design decisions: 1. Log-scale distance metrics prevent reward hacking, because a model can no longer earn a high reward simply by fitting extreme values; 2. The automatic judge mechanism ensures that generated models comply with epidemiological principles, filtering out black-box models that perform well statistically but are scientifically unreasonable.
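
A small numeric example may help show why a log-scale metric resists fitting to extreme values. The two error functions and the case counts below are illustrative assumptions, not the metrics or data used in the article: one hypothetical model nails the peak week but is roughly 3x too high in quiet weeks, while the other is 10% off everywhere.

```python
import math
from typing import Sequence


def mean_abs_error(pred: Sequence[float], obs: Sequence[float]) -> float:
    # Linear-scale error: dominated by weeks with the largest counts.
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def mean_log_error(pred: Sequence[float], obs: Sequence[float]) -> float:
    # Log-scale error: measures relative miss; log1p keeps zero counts defined.
    return sum(
        abs(math.log1p(p) - math.log1p(o)) for p, o in zip(pred, obs)
    ) / len(obs)


observed = [10, 20, 40, 5000]      # three quiet weeks, then a peak
peak_fitter = [30, 60, 120, 5000]  # exact at the peak, 3x too high elsewhere
balanced = [11, 22, 44, 5500]      # 10% too high in every week

mae_peak = mean_abs_error(peak_fitter, observed)
mae_balanced = mean_abs_error(balanced, observed)
log_peak = mean_log_error(peak_fitter, observed)
log_balanced = mean_log_error(balanced, observed)
```

On the linear scale the peak-fitting model scores better, because its large relative errors occur in weeks with small counts. On the log scale the ranking flips in favor of the uniformly accurate model, which is the behavior the ablation result attributes to log-scale distances.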

Section 06

Public Health Implications: Synergistic Enhancement of Automation and Specialization

Research implications: 1. Integration of automation and specialization (the LLM explores the model space while expert knowledge is encoded in the judge mechanism); 2. Breaking through scalability bottlenecks (rapid deployment to new regions and pathogens, with support for fine-grained predictions); 3. Transparency and interpretability (the generated code is readable, grounded in explicit theory, and provides uncertainty quantification).

Section 07

Limitations, Future Directions, and Conclusion

Limitations: High computational cost, dependence on data quality, unproven generalization to extreme events, and open ethical questions. Future directions: Developing more efficient search algorithms, integrating multi-modal data, establishing real-time feedback mechanisms, and improving interfaces for human-machine collaboration. Conclusion: This system marks a transition in infectious disease prediction from manual model development to automated intelligent systems, and points to a growing role for AI in public health.