Zing Forum

Disease Progression Prediction System: A Machine Learning-Based Clinical Risk Assessment Tool

An end-to-end machine learning project that uses Random Forest and XGBoost algorithms to analyze patients' clinical health parameters, enabling real-time prediction of disease risk levels (low/medium/high) and featuring an interactive Streamlit web application.

Tags: Machine Learning, Medical AI, Disease Prediction, Random Forest, XGBoost, Streamlit, Risk Assessment, Heart Disease
Published 2026-06-04 11:15 | Recent activity 2026-06-04 11:20 | Estimated read 5 min

Section 01

Introduction

Disease Progression Predictor is an end-to-end machine learning project published by Nidhi010805 on GitHub (https://github.com/Nidhi010805/Disease-Progression-Predictor). It uses Random Forest and XGBoost to analyze 13 clinical health parameters per patient and predict a disease risk level (low, medium, or high) in real time. An interactive web application built with Streamlit serves as the front end. The project aims to make clinical risk assessment less subjective and less time-consuming, and to support early risk identification and intervention for conditions such as heart disease.

Section 02

Project Background and Significance

Early disease prediction is crucial for improving prognosis and allocating medical resources. Traditional assessment relies on doctors' experience and manual analysis, which is time-consuming and subject to individual judgment. Heart disease is a leading cause of death globally, so identifying its risk early is particularly important. This project uses machine learning to automate risk assessment, with the goal of opening a time window for preventive intervention.

Section 03

System Architecture and Technology Stack

The project follows a complete data science workflow. Its technology stack covers data processing (Pandas, NumPy), model training (Scikit-learn, XGBoost), front-end interaction (Streamlit), visualization (Matplotlib, Seaborn), and model persistence (Joblib). Streamlit makes it quick to wrap a trained model in an interactive web application.
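
The persistence step in this stack can be sketched as follows. This is a minimal illustration rather than the project's own code: the data is synthetic, and the file name risk_model.joblib and the model settings are assumptions made for the example.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the clinical table: 200 patients, 13 numeric features.
# This is NOT the project's dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 13))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X, y)

# Persist the fitted model, then reload it the way a web app might at startup.
# The file name is hypothetical.
model_path = os.path.join(tempfile.mkdtemp(), "risk_model.joblib")
joblib.dump(model, model_path)
restored = joblib.load(model_path)

# The reloaded model should reproduce the original predictions.
same_predictions = bool(np.array_equal(model.predict(X), restored.predict(X)))
print(same_predictions)
```

Saving the fitted model once and loading it in the application avoids retraining on every page load, which is presumably why Joblib appears in the stack.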

Section 04

Core Algorithms and Model Performance

The project compares three algorithms: Logistic Regression, Random Forest, and XGBoost. On the test set, Random Forest and XGBoost each reach approximately 98.5% accuracy, well above Logistic Regression's 79%. The gap suggests that tree-based ensembles capture non-linear relationships between features that a linear model misses. Of the two ensembles, XGBoost is credited with advantages in training speed and memory usage, while Random Forest is considered slightly more interpretable.
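
A comparison of this kind might look like the sketch below. It runs on synthetic data from make_classification, so its accuracies will not match the figures reported above. The hyperparameters are illustrative assumptions, and XGBoost is included only if the xgboost package is installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary problem with 13 features; NOT the project's dataset.
X, y = make_classification(
    n_samples=1000, n_features=13, n_informative=8, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Hyperparameters below are illustrative assumptions.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

# XGBoost is an optional dependency in this sketch.
try:
    from xgboost import XGBClassifier

    models["XGBoost"] = XGBClassifier(
        n_estimators=200, max_depth=4, learning_rate=0.1, random_state=42
    )
except ImportError:
    pass

# Fit each model on the training split and score it on the held-out split.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in accuracies.items():
    print(f"{name}: {acc:.3f}")
```

Accuracy on a single split is a coarse measure. Cross-validation and class-sensitive metrics such as recall would give a fuller picture for a clinical task.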

Section 05

Input Features and Risk Classification

The input consists of 13 clinical parameters selected on the basis of medical knowledge, including age, gender, chest pain type, resting blood pressure, cholesterol level, and fasting blood glucose. The output is one of three risk levels, color-coded as low (green), medium (orange), or high (red), together with a probability score that indicates prediction confidence.
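
One way to turn a predicted probability into the three color-coded levels is sketched below. The post does not state how the project derives its levels, so the cut-offs (0.33 and 0.66) and the names classify_risk and RiskResult are assumptions made for illustration.

```python
from dataclasses import dataclass

# Assumed cut-offs on the predicted probability of disease (hypothetical values).
LOW_MAX = 0.33
MEDIUM_MAX = 0.66


@dataclass(frozen=True)
class RiskResult:
    level: str          # "low", "medium", or "high"
    color: str          # display color used in the UI
    probability: float  # model's predicted probability, shown as confidence


def classify_risk(probability: float) -> RiskResult:
    """Map a probability in [0, 1] to a color-coded risk level."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability < LOW_MAX:
        return RiskResult("low", "green", probability)
    if probability < MEDIUM_MAX:
        return RiskResult("medium", "orange", probability)
    return RiskResult("high", "red", probability)


for p in (0.10, 0.50, 0.90):
    result = classify_risk(p)
    print(f"{p:.2f} -> {result.level} ({result.color})")
```

In a real deployment the cut-offs would need to be chosen with clinical input, since they set the trade-off between missed cases and false alarms.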

Section 06

Interpretability and Clinical Application Value

The project improves the model's transparency through feature importance analysis. Chest pain type, maximum heart rate, and number of colored vessels are the most influential features, which is consistent with clinical understanding. The author plans to integrate SHAP values to provide per-patient explanations, helping doctors understand the reasons behind a prediction and supporting decisions made jointly by clinician and model.
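
Feature importance analysis with a Random Forest can be sketched as below. The column names follow the widely used UCI heart disease layout, which is an assumption, since the post names only some of the 13 parameters. The data is synthetic and built so that cp, thalach, and ca drive the label; the resulting ranking illustrates the mechanics and is not evidence for the project's finding.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Assumed column names (UCI heart disease layout); the post does not list all 13.
feature_names = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal",
]
col = {name: i for i, name in enumerate(feature_names)}

# Synthetic data: the label depends only on cp, thalach, and ca, plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, len(feature_names)))
signal = 1.5 * X[:, col["cp"]] + X[:, col["thalach"]] + X[:, col["ca"]]
y = (signal + rng.normal(scale=0.5, size=500) > 0).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances; scikit-learn normalizes them to sum to 1.
importances = model.feature_importances_
ranked = sorted(
    zip(feature_names, importances), key=lambda pair: pair[1], reverse=True
)
top3 = [name for name, _ in ranked[:3]]

for name, score in ranked[:5]:
    print(f"{name}: {score:.3f}")
```

Impurity-based importances describe the model as a whole. Per-patient explanations of the kind SHAP provides would need a separate step.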

Section 07

Deployment Scenarios and Future Outlook

The project is deployed on Streamlit Cloud and can be accessed from a browser. Intended application scenarios include community health centers and physical examination centers. Future plans are to integrate SHAP explanations, expand the dataset, add user authentication, migrate to AWS or Azure to support large-scale concurrent use, and gradually evolve the project into a production-grade medical assistance tool.