Zing Forum

Computational Phenotyping Framework for Fibromyalgia: Multicenter Machine Learning and Markov Trajectory Modeling

A computational framework combining unsupervised clustering, continuous-time multi-state Markov models, and Andersen-Gill recurrent event analysis for longitudinal trajectory modeling and clinical intervention evaluation of fibromyalgia.

Tags: Fibromyalgia, Machine Learning, Markov Model, Survival Analysis, Phenotype Clustering, Real-World Data, Python, Medical AI
Published 2026-06-03 20:15 | Recent activity 2026-06-03 20:18 | Estimated read 7 min

Section 01

[Main Post/Introduction] Computational Phenotyping Framework for Fibromyalgia: Multicenter Machine Learning and Markov Trajectory Modeling

Core Content: This project proposes a computational framework combining unsupervised clustering, continuous-time multi-state Markov models, and Andersen-Gill recurrent event analysis for longitudinal trajectory modeling and clinical intervention evaluation of fibromyalgia.

Original Author/Maintainer: nazariofelix-CTB
Source Platform: GitHub
Original Link: https://github.com/nazariofelix-CTB/fibromyalgia-phenotyping-markov
Publication Date: June 3, 2026

Section 02

Research Background and Clinical Challenges

Fibromyalgia is a complex chronic pain syndrome with highly heterogeneous clinical features and significant individual differences in disease progression. Traditional studies treat patients as a homogeneous group, ignoring diversity and dynamic evolution, which limits the application of precision medicine.

Core scientific problem addressed by this framework: How to identify similar phenotypic subgroups in real-world data, model the dynamic relationship between disease trajectories and medical interventions, and provide support for personalized treatment.

Section 03

Core Methodology of the Framework

The framework integrates three complementary methods:

  1. Unsupervised longitudinal phenotypic clustering: Uses geometric k-means with a cross-cohort strategy. The model is trained on Center A and then projected onto Center B, which prevents data leakage and tests whether the clusters are reproducible and valid;
  2. Continuous-time multi-state Markov model: Disease status is abstracted into three states (mild/maintenance, severe, extremely severe). The model estimates the instantaneous transition rate matrix (Q matrix) and captures state transitions on a continuous, real-time scale;
  3. Andersen-Gill recurrent event survival analysis: Treats medical re-intervention as a recurrent event and models it independently of disease state transitions, so that natural progression can be distinguished from intervention effects within a clearer causal framework.
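
To make the train-on-A, project-on-B idea in item 1 concrete, here is a minimal sketch. It is not the repository's code: it uses plain scikit-learn `KMeans` as a stand-in for the "geometric k-means" variant, and the synthetic feature matrices `center_a` and `center_b` are hypothetical. The point it illustrates is that the scaler and centroids are fitted on Center A only and then applied to Center B without refitting.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-ins for per-patient feature vectors (hypothetical data).
center_a = np.vstack([rng.normal(0.0, 1.0, (60, 4)), rng.normal(4.0, 1.0, (60, 4))])
center_b = np.vstack([rng.normal(0.2, 1.0, (30, 4)), rng.normal(4.2, 1.0, (30, 4))])

# Fit the scaling coefficients and the centroids on Center A only.
scaler = StandardScaler().fit(center_a)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(scaler.transform(center_a))

# Project Center B with the frozen scaler and centroids; nothing is refit,
# so no information from Center B leaks into the phenotype definitions.
scaled_b = scaler.transform(center_b)
labels_b = kmeans.predict(scaled_b)

# One validity metric computed on the projected cohort.
silhouette_b = silhouette_score(scaled_b, labels_b)
print(f"Center B silhouette: {silhouette_b:.3f}")
```

Calling `fit` again on Center B would silently redefine the phenotypes, which is why only `transform` and `predict` touch the second cohort in this sketch.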

Section 04

Technical Implementation and Code Structure

Technical Implementation:

  • Development Environment: Python 3.8+
  • Dependency Stack: pandas and numpy (data curation and computation), scikit-learn (clustering and validation), scipy (intensity matrix computation), lifelines (recurrent event estimation), matplotlib and seaborn (visualization)
  • Notebook Workflow:
  1. 01_Data_Curation_Longitudinal.ipynb: Parses clinical records, processes irregular follow-ups, and maps to a transaction dataset indexed by baseline days;
  2. 02_Pipeline_Longitudinal.ipynb: Performs cohort segmentation, clustering projection, MSM fitting, AG model estimation, and outputs validation metrics and visualization results.
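
The two data shapes mentioned in this workflow can be sketched as follows. This is an illustrative guess at the layout, not the notebooks' actual code: the column names, the sample records, and the helper `counting_process` are all hypothetical. The first part indexes irregular visits by days since each patient's baseline; the second builds the (start, stop, event) rows that an Andersen-Gill model typically expects for recurrent re-interventions.

```python
import pandas as pd

# Hypothetical visit records with irregular follow-up spacing.
visits = pd.DataFrame({
    "patient_id": ["PATIENT_0001", "PATIENT_0001", "PATIENT_0001",
                   "PATIENT_0002", "PATIENT_0002"],
    "visit_date": pd.to_datetime(["2021-03-01", "2021-03-19", "2021-07-02",
                                  "2020-11-10", "2021-01-05"]),
    "state": ["mild", "severe", "mild", "severe", "extremely severe"],
})

# Index every visit by days elapsed since that patient's first visit.
visits = visits.sort_values(["patient_id", "visit_date"]).reset_index(drop=True)
baseline = visits.groupby("patient_id")["visit_date"].transform("min")
visits["day"] = (visits["visit_date"] - baseline).dt.days
timeline = visits[["patient_id", "day", "state"]]


def counting_process(event_days, followup_end):
    """Turn re-intervention days into (start, stop, event) at-risk intervals."""
    rows, start = [], 0
    for day in sorted(event_days):
        rows.append({"start": start, "stop": day, "event": 1})
        start = day
    if start < followup_end:
        # Final interval ends without an event (censored at end of follow-up).
        rows.append({"start": start, "stop": followup_end, "event": 0})
    return pd.DataFrame(rows)


# Example: re-interventions on days 40 and 150, followed up to day 360.
intervals = counting_process([40, 150], followup_end=360)
print(timeline)
print(intervals)
```

The helper assumes every event day is strictly positive and distinct; same-day events or an event on day 0 would produce zero-length intervals and would need separate handling.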

Section 05

Validation Results and Clinical Significance

Validation Results: The C-index on the external validation cohort reached 0.742, indicating good discriminative ability. The pipeline generates seven validation outputs:

  • Observed transition frequencies and observation windows
  • Continuous-time transition intensity (rate) matrix
  • Expected state probability trajectories from 0 to 360 days
  • Covariate hazard ratios and confidence intervals
  • Frozen geometric anchor points and scaling coefficients
  • Clustering validity metrics (Silhouette Coefficient, Davies-Bouldin Index, Calinski-Harabasz Index)
  • Population filtering log for the STROBE flow chart
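
The link between the intensity matrix and the 0 to 360 day probability trajectories in the list above is the matrix exponential: for a time-homogeneous continuous-time Markov model, the transition probability matrix over t days is P(t) = exp(Qt). The sketch below shows that step with `scipy.linalg.expm`. The numbers in `Q` are made up for illustration and are not the project's estimates.

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical per-day transition intensities (NOT the project's estimates).
# Row/column order: mild/maintenance, severe, extremely severe.
Q = np.array([
    [-0.010, 0.008, 0.002],
    [0.006, -0.011, 0.005],
    [0.001, 0.004, -0.005],
])

# A valid intensity matrix has rows that sum to zero.
assert np.allclose(Q.sum(axis=1), 0.0)

# Start everyone in the mild/maintenance state and evaluate every 30 days.
days = np.arange(0, 361, 30)
start = np.array([1.0, 0.0, 0.0])

# P(t) = expm(Q * t); the state distribution at day t is start @ P(t).
trajectory = np.array([start @ expm(Q * t) for t in days])

for t, probs in zip(days, trajectory):
    print(t, np.round(probs, 3))
```

Each row of `trajectory` is a probability distribution over the three states, so the rows sum to one and the row for day 0 equals the starting distribution.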

Clinical Significance: These outputs support the paper's figures and tables and give researchers a complete reproduction path.

Section 06

Data Ethics and Privacy Protection

Data processing follows HIPAA and EU GDPR standards, and patient records are de-identified. Institutional identifiers, calendar dates, and personal primary keys are removed or replaced with encrypted universal tokens (PATIENT_XXXX), which preserves the variability of the data while aiming to eliminate the risk of re-identification.
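
As a rough illustration of this kind of de-identification, the sketch below replaces raw identifiers with PATIENT_XXXX tokens and calendar dates with days since baseline. It is an assumption about how such a step could look, not the project's procedure: the key, the `MRN-` identifiers, and the helper `build_token_map` are hypothetical, and a keyed hash is used here only to decide token order.

```python
import hashlib
import hmac
from datetime import date

# Hypothetical secret; in practice it would live outside the code base.
SECRET_KEY = b"replace-with-a-securely-stored-key"


def build_token_map(raw_ids, key=SECRET_KEY):
    """Assign PATIENT_XXXX tokens in an order derived from a keyed hash."""
    ordered = sorted(
        set(raw_ids),
        key=lambda rid: hmac.new(key, rid.encode(), hashlib.sha256).hexdigest(),
    )
    return {rid: f"PATIENT_{i:04d}" for i, rid in enumerate(ordered, start=1)}


# Hypothetical raw records: (institutional identifier, calendar date).
records = [
    ("MRN-88231", date(2021, 3, 1)),
    ("MRN-88231", date(2021, 3, 19)),
    ("MRN-10477", date(2020, 11, 10)),
]

token_map = build_token_map(raw_id for raw_id, _ in records)

# Earliest visit per patient serves as day 0.
baselines = {}
for raw_id, visit in records:
    baselines[raw_id] = min(visit, baselines.get(raw_id, visit))

# Keep only the token and the offset in days; drop IDs and calendar dates.
deidentified = [
    (token_map[raw_id], (visit - baselines[raw_id]).days)
    for raw_id, visit in records
]
print(deidentified)
```

The mapping in `token_map` is itself sensitive: anyone holding it can reverse the tokens, so it would need to be stored separately from the released data or discarded.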

Section 07

Research Implications and Future Directions

Research Implications: The framework shows how machine learning, stochastic process modeling, and survival analysis can be combined in one pipeline. The same approach could be extended to other heterogeneous chronic diseases, such as rheumatoid arthritis and multiple sclerosis.

Future Directions: For medical AI developers, the project offers a complete technical blueprint from data curation to model validation. Its design choices for handling irregular longitudinal data and preventing cross-cohort data leakage are especially worth studying.