Zing Forum


LLM-Driven Temporal Causal Inference: Solving Medical Data Missingness Challenges with LLM-Empowered Evolutionary Imputation

This article introduces a two-stage framework combining DAG-constrained normalizing flows and LLM-driven evolutionary imputation for estimating treatment effects from incomplete longitudinal electronic health records (EHRs). The method maintains the accuracy of causal effect estimation even with 30%-80% missingness rates and has been validated on real-world diabetes treatment data.

Tags: causal inference · large language models · medical data · missing value imputation · electronic health records · target trial emulation · normalizing flows · DAG constraints
Published 2026-05-07 00:53 · Recent activity 2026-05-07 11:21 · Estimated read 7 min

Section 01

[Main Floor] LLM-Driven Temporal Causal Inference: A New Framework for Solving Medical Data Missingness Challenges

This article proposes a two-stage framework combining DAG-constrained normalizing flows (CausalFlow-T) and LLM-driven evolutionary imputation for estimating treatment effects from incomplete longitudinal electronic health records (EHRs). The method maintains the accuracy of causal effect estimation even with 30%-80% missingness rates and has been validated on real-world diabetes treatment data.


Section 02

Research Background and Challenges

In medical research, randomized controlled trials (RCTs) are the gold standard for causal inference, but they often face ethical or feasibility barriers. Target trial emulation (TTE) answers causal questions with observational data, yet existing methods typically treat causal estimation, missing-value handling, and temporal structure modeling as separate problems, which undermines robustness on electronic health records (EHRs). EHR missingness is severe: time-varying confounders and missing-not-at-random (MNAR) biomarkers can have missingness rates as high as 50%-80%. Traditional imputation methods struggle to capture the complex generation mechanisms, and naive handling readily introduces bias. Keeping causal estimates accurate under high missingness is therefore a key challenge in medical AI.
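Why MNAR is worse than random missingness can be shown with a small simulation (the HbA1c-like values and the sigmoid missingness model below are purely illustrative, not the paper's): when the probability of a value going unrecorded depends on the value itself, the observed mean is biased, unlike under missing-completely-at-random (MCAR).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical biomarker, e.g. HbA1c-like values (illustrative units).
values = rng.normal(loc=7.0, scale=1.5, size=10_000)

# MCAR: every observation is equally likely to be missing.
mcar_mask = rng.random(values.size) < 0.5

# MNAR: higher values are more likely to go unrecorded, so missingness
# depends on the (unobserved) value itself.
p_missing = 1 / (1 + np.exp(-(values - 7.0)))  # sigmoid in the value
mnar_mask = rng.random(values.size) < p_missing

# Under MCAR the observed mean stays close to the true mean;
# under MNAR it is biased downward because high values were dropped.
print(values.mean(), values[~mcar_mask].mean(), values[~mnar_mask].mean())
```

This is why "simple handling" (e.g., complete-case analysis) cannot be trusted under MNAR: the observed sample is no longer representative.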


Section 03

Core Method Stage 1: CausalFlow-T (DAG-Constrained Temporal Normalizing Flow)

The research team proposes CausalFlow-T, a normalizing flow model with directed acyclic graph (DAG) constraints. The model encodes each patient's historical trajectory with a long short-term memory (LSTM) network, supports exact, invertible counterfactual inference (avoiding the approximation error of variational methods), and separates confounders through an explicit causal structure. Ablation experiments on synthetic datasets and semi-synthetic benchmarks show that the DAG constraints and exact inference address different failure modes, so neither component is redundant.
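A minimal sketch of the key property, under loose assumptions: a flow that is exactly invertible given a history summary lets the latent "noise" be recovered and then re-decoded under an altered history, which is what makes counterfactual queries exact rather than approximate. The toy moving-average encoder and the affine conditioner below are hypothetical stand-ins for the paper's LSTM and learned flow layers.

```python
import numpy as np

def encode_history(history):
    """Toy recurrent summary of past visits (stand-in for an LSTM)."""
    h = 0.0
    for x_t in history:
        h = 0.9 * h + 0.1 * x_t
    return h

def flow_forward(x, h):
    """Map an observation to latent space, conditioned on history h."""
    mu, log_sigma = 2.0 * h, 0.1 * h        # hypothetical conditioner
    return (x - mu) * np.exp(-log_sigma)

def flow_inverse(z, h):
    """Exact analytic inverse: no variational approximation needed."""
    mu, log_sigma = 2.0 * h, 0.1 * h
    return z * np.exp(log_sigma) + mu

history = [6.8, 7.1, 7.4]
h = encode_history(history)
z = flow_forward(7.6, h)                    # abduction: recover the noise
x_back = flow_inverse(z, h)                 # round-trip is exact

# Counterfactual-style query: keep z fixed, change the conditioning
# history, and decode again (abduction / action / prediction).
h_cf = encode_history([6.8, 7.1, 6.5])
x_cf = flow_inverse(z, h_cf)
```

The round trip `flow_inverse(flow_forward(x, h), h) == x` holds by construction, which is exactly the property variational methods only approximate.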


Section 04

Core Method Stage 2: LLM-Driven Evolutionary Imputation

Since CausalFlow-T requires complete input data, the team introduces an LLM-driven evolutionary imputation method. Unlike traditional approaches that directly predict missing values, it has the LLM propose executable imputation operators rather than individual numeric values. Advantages include alignment with the logic of medical data generation (e.g., inferring HbA1c trends from age and medical history), iterative refinement via evolutionary search, and flexible adaptation to different LLM backends (the study tested three configurations, including two open-source models).
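One generation of such a search can be sketched as below, with hand-written functions standing in for LLM-proposed operators (in the actual method the LLM emits the operator code and receives scores as feedback; the data and operators here are illustrative only).

```python
import numpy as np

rng = np.random.default_rng(1)

# Ground-truth series with artificially masked entries.
true_series = np.linspace(6.5, 8.0, 20) + rng.normal(0, 0.05, 20)
mask = rng.random(20) < 0.4          # True = masked out as "missing"
mask[0] = False                      # keep the first visit observed
observed = np.where(mask, np.nan, true_series)

# Stand-ins for LLM-proposed imputation *operators* (executable rules,
# not point predictions).
def op_carry_forward(s):
    out = s.copy()
    for i in range(1, len(out)):
        if np.isnan(out[i]):
            out[i] = out[i - 1]
    return out

def op_linear_trend(s):
    idx = np.arange(len(s))
    ok = ~np.isnan(s)
    coef = np.polyfit(idx[ok], s[ok], 1)   # fit trend on observed points
    out = s.copy()
    out[~ok] = np.polyval(coef, idx[~ok])
    return out

def op_global_mean(s):
    out = s.copy()
    out[np.isnan(out)] = np.nanmean(s)
    return out

def score(op):
    """Lower is better: error on the artificially masked ground truth."""
    filled = op(observed)
    return float(np.abs(filled[mask] - true_series[mask]).mean())

# One 'generation' of the evolutionary search: keep the best operator.
# A full run would feed scores back to the LLM to propose refinements.
population = [op_carry_forward, op_linear_trend, op_global_mean]
best = min(population, key=score)
print(best.__name__, score(best))
```

Because operators are code, the winning strategy is inspectable ("fit a linear trend") rather than an opaque set of filled-in numbers, which is the interpretability advantage the article points to.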


Section 05

Experimental Validation and Performance

Synthetic Benchmark Testing: Under 30%-80% MNAR missingness, the LLM-driven imputer ranks first on combined biomarker and causal metrics, leading in point-estimation accuracy and temporal extrapolation while preserving average treatment effect (ATE) recovery. Traditional statistical baselines degrade significantly.

Real-World Validation: In Swiss primary-care EHR data, comparing type 2 diabetes patients taking GLP-1 receptor agonists versus SGLT-2 inhibitors, the GLP-1 receptor agonists were estimated to give a -0.98 kg weight-loss advantage (95% confidence interval: -1.01 to -0.96), consistent with RCT results.
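For readers unfamiliar with the reported quantities, an ATE with a percentile-bootstrap 95% confidence interval can be computed as sketched below; the outcomes are simulated and unrelated to the study's data or its estimator.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical weight-change outcomes (kg) for two treatment arms;
# simulated numbers, not the study's data.
glp1 = rng.normal(-3.0, 2.0, 500)   # GLP-1 receptor agonist arm
sglt2 = rng.normal(-2.0, 2.0, 500)  # SGLT-2 inhibitor arm

# Average treatment effect: difference in mean outcomes between arms.
ate = glp1.mean() - sglt2.mean()

# Percentile bootstrap for a 95% confidence interval.
boot = []
for _ in range(2000):
    g = rng.choice(glp1, size=glp1.size, replace=True)
    s = rng.choice(sglt2, size=sglt2.size, replace=True)
    boot.append(g.mean() - s.mean())
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"ATE = {ate:.2f} kg, 95% CI [{lo:.2f}, {hi:.2f}]")
```

The study's much narrower interval reflects its model-based estimation on a large cohort, not a simple two-sample comparison like this sketch.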


Section 06

Technical Insights and Application Prospects

Methodological Contributions: Demonstrates an innovative use of LLMs for structured medical data imputation, positioning the LLM as a 'strategy generator' rather than a value-filling tool. Operator-level imputation is more interpretable and more adaptable to the domain.

Implications for Medical AI: Provides a new path for real-world evidence (RWE) research, allowing retention of more samples while controlling missingness bias and improving the external validity of observational studies.


Section 07

Limitations and Future Directions

Current limitations: High computational cost (the evolutionary search and LLM inference add overhead). Future directions: explore more efficient operator-search strategies, integrate medically pre-trained LLMs to improve imputation quality, and extend to other disease areas and additional medical data types (e.g., imaging, genomics).