# LLM-Driven Temporal Causal Inference: Solving Medical Data Missingness Challenges with LLM-Empowered Evolutionary Imputation

> This article introduces a two-stage framework combining DAG-constrained normalizing flows and LLM-driven evolutionary imputation for estimating treatment effects from incomplete longitudinal electronic health records (EHRs). The method maintains the accuracy of causal effect estimation even with 30%-80% missingness rates and has been validated on real-world diabetes treatment data.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-05-06T16:53:08.000Z
- Last activity: 2026-05-07T03:21:42.038Z
- Popularity: 140.5
- Keywords: causal inference, large language models, medical data, missing-value imputation, electronic health records, target trial emulation, normalizing flows, DAG constraints
- Page link: https://www.zingnex.cn/en/forum/thread/llm-e9d9ec0e
- Canonical: https://www.zingnex.cn/forum/thread/llm-e9d9ec0e

---

## [Main Floor] LLM-Driven Temporal Causal Inference: A New Framework for Solving Medical Data Missingness Challenges

This article describes a two-stage framework for estimating treatment effects from incomplete longitudinal electronic health records (EHRs): a DAG-constrained normalizing flow (CausalFlow-T) for causal estimation, combined with LLM-driven evolutionary imputation to fill in missing inputs. The method is reported to preserve the accuracy of causal effect estimates at missingness rates of 30% to 80% and has been validated on real-world diabetes treatment data.

## Research Background and Challenges

In medical research, randomized controlled trials (RCTs) are the gold standard for causal inference, but they are often ruled out by ethical or feasibility barriers. Target Trial Emulation (TTE) uses observational data to answer the same causal questions. Existing TTE methods, however, tend to treat causal estimation, missing-value handling, and temporal structure modeling as separate steps, which makes them fragile on electronic health records (EHRs).

Missingness in EHR data is severe: time-varying confounders and biomarkers that are missing not at random (MNAR) can have missing rates as high as 50%-80%. Traditional imputation methods struggle to capture the complex mechanisms that generate the data, and simplistic handling (for example, dropping incomplete records) easily introduces bias. Keeping causal estimates accurate under high missingness is therefore a key challenge in medical AI.
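To make the MNAR problem concrete, here is a minimal sketch of why simple handling biases results. Everything in it is an illustrative assumption rather than something taken from the study: the HbA1c-like distribution, the logistic missingness rule, and the function name `simulate_mnar` are all hypothetical.

```python
import math
import random

def simulate_mnar(n=5000, seed=0):
    """Simulate HbA1c-like values where higher values are more likely to be missing.

    The distribution and the missingness rule are made up for illustration.
    """
    rng = random.Random(seed)
    values = [rng.gauss(7.5, 1.2) for _ in range(n)]
    observed = []
    for v in values:
        # MNAR: the probability of being missing depends on the value itself.
        p_missing = 1.0 / (1.0 + math.exp(-1.5 * (v - 7.5)))
        observed.append(None if rng.random() < p_missing else v)
    return values, observed

values, observed = simulate_mnar()
seen = [v for v in observed if v is not None]

true_mean = sum(values) / len(values)
complete_case_mean = sum(seen) / len(seen)  # what "just drop missing rows" would report
missing_rate = 1 - len(seen) / len(values)

print(f"missing rate:       {missing_rate:.2f}")
print(f"true mean:          {true_mean:.2f}")
print(f"complete-case mean: {complete_case_mean:.2f}")
```

Because high values are preferentially hidden in this toy setup, the complete-case mean underestimates the true mean, and no amount of extra data fixes that. This is the kind of bias an imputation method has to counteract before causal estimation.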

## Core Method Stage 1: CausalFlow-T (DAG-Constrained Temporal Normalizing Flow)

The research team proposes CausalFlow-T, a normalizing flow model constrained by a directed acyclic graph (DAG). The model uses a long short-term memory (LSTM) network to encode each patient's historical trajectory. Because the flow is invertible, counterfactual inference is exact, which avoids the approximation errors of traditional variational inference methods, and the explicit causal structure keeps confounders separate from treatment and outcome. Ablation experiments on synthetic datasets and semi-synthetic benchmarks indicate that the DAG constraint and exact inference address different failure modes, so neither can substitute for the other.
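The idea of exact counterfactual inference through an invertible, DAG-respecting map can be sketched in a few lines. This is a toy under stated assumptions, not the CausalFlow-T architecture: the three-variable DAG, the affine transformations, and all parameter values are hypothetical, nothing is learned, and the LSTM history encoder is omitted.

```python
# Hypothetical DAG: confounder C -> treatment A -> outcome Y, and C -> Y.
PARENTS = {"C": [], "A": ["C"], "Y": ["C", "A"]}
ORDER = ["C", "A", "Y"]  # a topological order of the DAG

# Illustrative (not learned) parameters of an affine flow:
#   x_v = BIAS[v] + sum(WEIGHTS[v][p] * x_p for parents p) + SCALE[v] * z_v
WEIGHTS = {"C": {}, "A": {"C": 0.8}, "Y": {"C": 1.5, "A": -1.0}}
BIAS = {"C": 0.0, "A": 0.0, "Y": 2.0}
SCALE = {"C": 1.0, "A": 0.5, "Y": 0.7}

def shift(name, x):
    """Contribution of a variable's DAG parents; non-parents can never leak in."""
    return BIAS[name] + sum(WEIGHTS[name][p] * x[p] for p in PARENTS[name])

def to_noise(x):
    """Inverse pass (data -> noise). Exact, since each step is an invertible affine map."""
    return {v: (x[v] - shift(v, x)) / SCALE[v] for v in ORDER}

def to_data(z, intervention=None):
    """Forward pass (noise -> data), optionally fixing variables (a do-intervention)."""
    intervention = intervention or {}
    x = {}
    for v in ORDER:  # parents are always computed before their children
        if v in intervention:
            x[v] = intervention[v]
        else:
            x[v] = shift(v, x) + SCALE[v] * z[v]
    return x

def counterfactual(x_obs, intervention):
    z = to_noise(x_obs)              # abduction: recover this patient's noise exactly
    return to_data(z, intervention)  # action + prediction under the intervention

x_obs = {"C": 1.0, "A": 1.0, "Y": 3.2}
cf = counterfactual(x_obs, {"A": 0.0})
print(cf)
```

The same recovered noise is reused under the intervention, so the confounder stays fixed and only the descendants of the intervened variable change. With a variational model the abduction step would be approximate; with an invertible map it is a closed-form inverse.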

## Core Method Stage 2: LLM-Driven Evolutionary Imputation

Since CausalFlow-T requires complete input data, the team introduces an LLM-driven evolutionary imputation method. Unlike traditional methods that predict each missing value directly, this approach has the LLM propose executable imputation operators, that is, procedures that fill in values, rather than the individual numbers themselves. The stated advantages are:

- alignment with the logic by which medical data is generated (e.g., inferring HbA1c trends from age and medical history);
- iterative improvement of the operators via evolutionary search;
- flexible adaptation to different LLM backends (the study tested three configurations, including two open-source models).
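The operator-level search loop can be sketched as follows. This is a heavily simplified sketch, and several pieces are assumptions: the LLM call is replaced by a random stub (`propose_candidates`), operators are reduced to a single hypothetical family controlled by one parameter `alpha`, and fitness is measured by hiding observed values and scoring how well they are recovered. The study's actual operator space, prompts, and fitness function are not described in this summary.

```python
import random

def make_operator(alpha):
    """Build an imputation operator: a function that fills None entries in a series.

    alpha = 1.0 carries the last observed value forward; alpha = 0.0 uses the series mean.
    """
    def operator(series):
        known = [v for v in series if v is not None]
        mean = sum(known) / len(known)
        filled, last = [], mean
        for v in series:
            if v is None:
                filled.append(alpha * last + (1 - alpha) * mean)
            else:
                filled.append(v)
                last = v
        return filled
    return operator

def propose_candidates(parents, rng, n_children=4):
    """Stand-in for the LLM: here it only perturbs a parent's parameter at random.

    In the described method an LLM would propose new executable operators instead.
    """
    children = []
    for _ in range(n_children):
        base = rng.choice(parents)
        children.append(min(1.0, max(0.0, base + rng.gauss(0, 0.2))))
    return children

def fitness(alpha, series, seed=1, holdout_frac=0.3, repeats=20):
    """Mean absolute error on observed values hidden on purpose (lower is better)."""
    rng = random.Random(seed)  # fixed seed: every candidate sees the same hidden entries
    observed_idx = [i for i, v in enumerate(series) if v is not None]
    n_hidden = max(1, int(len(observed_idx) * holdout_frac))
    operator = make_operator(alpha)
    errors = []
    for _ in range(repeats):
        hidden = rng.sample(observed_idx, n_hidden)
        masked = [None if i in hidden else v for i, v in enumerate(series)]
        filled = operator(masked)
        errors.extend(abs(filled[i] - series[i]) for i in hidden)
    return sum(errors) / len(errors)

def evolve(series, generations=5, population=(0.0, 0.5), keep=2, seed=0):
    rng = random.Random(seed)
    pop = list(population)
    for _ in range(generations):
        pop = pop + propose_candidates(pop, rng)        # propose new operators
        pop = sorted(pop, key=lambda a: fitness(a, series))[:keep]  # keep the fittest
    return pop[0], fitness(pop[0], series)

# Hypothetical HbA1c-like trajectory with gaps.
series = [7.0, None, 7.2, 7.3, None, 7.6, 7.8, None, 8.1, 8.3, 8.4, None, 8.8]
best_alpha, best_err = evolve(series)
print(f"best alpha: {best_alpha:.2f}, held-out MAE: {best_err:.3f}")
print(make_operator(best_alpha)(series))
```

Because the survivors of each generation are carried into the next, the best held-out error never gets worse as the search proceeds. Searching over operators rather than values also means the winning result is a readable procedure that can be inspected and reused on new patients.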

## Experimental Validation and Performance

**Synthetic Benchmark Testing**: Under MNAR missingness rates of 30%-80%, the LLM-driven imputer ranks first across the combined biomarker and causal metrics. It leads in point-estimate accuracy and temporal extrapolation while still recovering the average treatment effect (ATE). Traditional statistical baselines degrade significantly under the same conditions.

**Real-World Validation**: On Swiss primary care EHR data, the framework was applied to type 2 diabetes patients treated with either GLP-1 receptor agonists or SGLT-2 inhibitors. GLP-1 receptor agonists were estimated to produce 0.98 kg more weight loss (estimated difference: -0.98 kg; 95% confidence interval: -1.01 to -0.96), consistent with RCT results.
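For readers unfamiliar with how an ATE and its interval are obtained from per-patient counterfactual predictions, here is a small sketch. The data is synthetic and made up for illustration (it is not the Swiss EHR data), and the percentile bootstrap is an assumption: this summary does not state how the study computed its confidence interval.

```python
import random

def ate(y_treated, y_control):
    """Average treatment effect: mean per-patient difference in potential outcomes."""
    diffs = [a - b for a, b in zip(y_treated, y_control)]
    return sum(diffs) / len(diffs)

def bootstrap_ci(y_treated, y_control, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the ATE, resampling patients with replacement."""
    rng = random.Random(seed)
    n = len(y_treated)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(ate([y_treated[i] for i in idx], [y_control[i] for i in idx]))
    stats.sort()
    lo = stats[int((1 - level) / 2 * n_boot)]
    hi = stats[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi

# Synthetic potential outcomes (weight change in kg) for 500 hypothetical patients.
# The built-in effect of about -1 kg is arbitrary and chosen only for the example.
rng = random.Random(42)
y_control = [rng.gauss(-2.0, 1.0) for _ in range(500)]
y_treated = [y - 1.0 + rng.gauss(0.0, 0.5) for y in y_control]

estimate = ate(y_treated, y_control)
lo, hi = bootstrap_ci(y_treated, y_control)
print(f"ATE: {estimate:.2f} kg (95% CI: {lo:.2f} to {hi:.2f})")
```

In practice only one of the two potential outcomes is ever observed for a given patient; the other would come from a counterfactual model such as the one described in Stage 1.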

## Technical Insights and Application Prospects

**Methodological Contributions**: The work demonstrates a novel use of LLMs for imputing structured medical data, positioning the LLM as a "strategy generator" rather than a tool that fills in values directly. Operator-level imputation is more interpretable and adapts more readily to the domain.

**Implications for Medical AI**: Provides a new path for real-world evidence (RWE) research, allowing retention of more samples while controlling missingness bias and improving the external validity of observational studies.

## Limitations and Future Directions

Current limitations: the computational cost is high, since evolutionary search and LLM inference both add overhead. Future directions: explore more efficient operator search strategies, integrate LLMs pre-trained on medical-domain data to improve imputation quality, and extend the approach to other disease areas and other types of medical data (e.g., imaging, genomics).
