
Open-Source Heart Disease Risk Prediction System: An Intelligent Medical Auxiliary Tool with Multi-Model Fusion

A heart disease risk prediction system based on deep learning and traditional machine learning, integrating three models—Artificial Neural Network, Random Forest, and Logistic Regression. It provides multi-level risk assessment, interpretability analysis, and integrates AI health recommendation functions.

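Heart disease prediction, deep learning, medical AI, machine learning, random forest, neural network, health technology, open-source project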

Published 2026-06-04 01:14 | Recent activity 2026-06-04 01:18 | Estimated read 8 min

Section 01

[Introduction] Open-Source Heart Disease Risk Prediction System: An Intelligent Medical Auxiliary Tool with Multi-Model Fusion

Heart disease is one of the leading causes of death globally, and early accurate detection is crucial. This open-source project integrates three models: Artificial Neural Network (ANN), Random Forest, and Logistic Regression, providing multi-level risk assessment, SHAP interpretability analysis, and AI health recommendation functions. The project was released by zakriayousafzai on GitHub (link: https://github.com/zakriayousafzai/heart-disease-prediction-system) on June 3, 2026, offering multi-dimensional support for medical auxiliary decision-making.

Section 02

Background: Demand for Early Heart Disease Detection and Project Objectives

Heart disease is a leading cause of death worldwide, and early accurate detection can save millions of lives. This project aims to build a complete heart disease risk prediction system, integrating deep learning and traditional machine learning algorithms to provide multi-dimensional, interpretable risk assessment support for clinical decision-making and address the limitations of single models.

Section 03

Core Technologies: Multi-Model Fusion and Data Processing Solutions

Multi-Model Prediction Engine

Artificial Neural Network (ANN): Built with PyTorch, it handles multi-class risk grading. Inputs are standardized with StandardScaler, Dropout and batch normalization help limit overfitting, and a weighted loss function addresses class imbalance.
Random Forest: Implemented with scikit-learn for binary classification. It reduces variance through multi-tree voting and outputs feature importance.
Logistic Regression: Serves as a baseline model, providing interpretable probability outputs and a cross-check against the other two models.
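
As a rough illustration of the ANN description above, the following PyTorch sketch combines batch normalization, Dropout, and a class-weighted cross-entropy loss. It is not the project's code: the class name `RiskNet`, the layer sizes, the 13 input features, the 3 risk classes, and the inverse-frequency weighting scheme are all assumptions made for the example, and the data is random.

```python
import torch
from torch import nn


class RiskNet(nn.Module):
    """Small feed-forward classifier; layer sizes are illustrative assumptions."""

    def __init__(self, n_features: int, n_classes: int, p_drop: float = 0.3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 32),
            nn.BatchNorm1d(32),
            nn.ReLU(),
            nn.Dropout(p_drop),
            nn.Linear(32, 16),
            nn.BatchNorm1d(16),
            nn.ReLU(),
            nn.Dropout(p_drop),
            nn.Linear(16, n_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)  # raw logits; softmax is applied only at inference


def inverse_frequency_weights(labels: torch.Tensor, n_classes: int) -> torch.Tensor:
    """Weight each class by total / (n_classes * count), so rare classes weigh more."""
    counts = torch.bincount(labels, minlength=n_classes).float()
    return counts.sum() / (n_classes * counts.clamp(min=1.0))


torch.manual_seed(0)
X = torch.randn(64, 13)                         # stand-in for standardized features
y = torch.tensor([0] * 40 + [1] * 16 + [2] * 8)  # deliberately imbalanced labels

weights = inverse_frequency_weights(y, n_classes=3)
model = RiskNet(n_features=13, n_classes=3)
loss_fn = nn.CrossEntropyLoss(weight=weights)    # the weighted loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

model.train()                                    # enables Dropout and batch statistics
for _ in range(5):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

model.eval()                                     # disables Dropout, uses running stats
with torch.no_grad():
    probs = torch.softmax(model(X[:4]), dim=1)   # per-class risk probabilities
```

Switching between `model.train()` and `model.eval()` matters here because both Dropout and batch normalization behave differently during training and inference.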

Data Processing

The input data covers patient attributes and physiological indicators such as age, gender, and blood pressure. Preprocessing includes KNN imputation for missing values, category encoding, and feature scaling (standardization for neural network inputs).
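
The preprocessing steps named above could be wired together with scikit-learn roughly as follows. This is a minimal sketch under assumptions: the column names (`age`, `resting_bp`, `sex`), the toy data, the choice of one-hot encoding, and `n_neighbors=2` are illustrative and not taken from the project.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import KNNImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_cols = ["age", "resting_bp"]   # hypothetical column names
categorical_cols = ["sex"]             # hypothetical column name

# Numeric columns: fill gaps from the nearest rows, then standardize.
numeric_pipeline = Pipeline([
    ("impute", KNNImputer(n_neighbors=2)),
    ("scale", StandardScaler()),
])

preprocess = ColumnTransformer(
    [
        ("num", numeric_pipeline, numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

# Toy records with two missing numeric values.
df = pd.DataFrame({
    "age": [63, 41, np.nan, 57, 50],
    "resting_bp": [145, 130, 120, np.nan, 138],
    "sex": ["M", "F", "F", "M", "M"],
})

X = preprocess.fit_transform(df)  # 2 scaled numeric columns + 2 one-hot columns
```

Fitting the transformer on training data only and reusing it at prediction time keeps the imputation neighbors and scaling statistics consistent between the two.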

Section 04

System Architecture and Technology Stack Selection

Layered Architecture

Client Layer: Next.js frontend (React 19 + Tailwind CSS), Streamlit dashboard (data visualization)
API Service Layer: Flask REST API (port 5000, providing endpoints for health check, prediction, historical query, etc.)
Model Layer: Three models deployed in parallel with independent weight files, supporting hot reloading
Data Layer: PostgreSQL for storing patient records and prediction history

Technology Stack

Component	Technology	Version
Web Framework	Flask	3.1.2
ORM	Flask-SQLAlchemy	3.1.1
Deep Learning	PyTorch	2.10.0
Machine Learning	scikit-learn	1.8.0
Data Science	pandas, NumPy	3.0.0, 2.4.1
Interpretability	SHAP	0.45.0
AI Integration	Google Generative AI	-
Frontend Framework	Next.js	16.1.6
UI Library	React	19.2.3
Styling	Tailwind CSS	4.x

Section 05

Key Features: Interpretability and AI Health Recommendations

SHAP Model Interpretation

Introduces SHAP value calculation to provide feature-level contribution analysis, helping understand the basis of model decisions (e.g., which physiological indicators affect risk ratings) and enhancing transparency and credibility.
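
The idea of feature-level contributions can be shown without the SHAP library for the simplest case. For a linear model, and assuming features are treated as independent, the SHAP value of feature i for input x reduces to coef_i * (x_i - mean_i) on the log-odds scale. The sketch below computes this closed form for a scikit-learn logistic regression on synthetic data; it does not call `shap`, and the feature names are hypothetical. The project itself may well use tree or deep explainers for its other models.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
feature_names = ["age", "resting_bp", "cholesterol"]  # hypothetical names

# Synthetic, already-standardized data: the first two features drive the label.
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

model = LogisticRegression().fit(X, y)
coef = model.coef_[0]
background_mean = X.mean(axis=0)

# Expected model output (log-odds) over the background data.
base_value = float(coef @ background_mean + model.intercept_[0])


def linear_shap(x: np.ndarray) -> np.ndarray:
    """Per-feature contribution to the log-odds, relative to the base value."""
    return coef * (x - background_mean)


patient = X[0]
contributions = linear_shap(patient)

# Rank features by the size of their contribution for this patient.
ranked = sorted(
    zip(feature_names, contributions), key=lambda item: abs(item[1]), reverse=True
)
```

The contributions are additive: `base_value + contributions.sum()` equals the model's log-odds for that patient, which is the property that lets a risk rating be broken down indicator by indicator.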

AI Health Recommendations

Integrates the Google Gemini API to generate personalized advice based on prediction results (diet adjustments, lifestyle improvements, medical consultation reminders), converting probabilities into actionable guidance.
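
One plausible way to turn a prediction into a recommendation request is to build a text prompt from the risk level and the top contributing factors, then pass it to the Gemini API. The sketch below is an assumption about how this could look, not the project's implementation: the prompt wording, the `GEMINI_API_KEY` variable name, and the model name `gemini-1.5-flash` are illustrative. The API call is isolated in `generate_advice`, which needs the `google-generativeai` package and a valid key.

```python
import os


def build_prompt(risk_level: str, probability: float, top_factors: list[str]) -> str:
    """Compose the instruction sent to the language model."""
    factors = ", ".join(top_factors) if top_factors else "none identified"
    return (
        "You are a health education assistant, not a doctor.\n"
        f"Predicted heart disease risk: {risk_level} (probability {probability:.0%}).\n"
        f"Main contributing factors: {factors}.\n"
        "Give three short suggestions covering diet, lifestyle, and when to "
        "seek medical consultation."
    )


def generate_advice(prompt: str, model_name: str = "gemini-1.5-flash") -> str:
    """Send the prompt to Gemini; the model name is an assumed default."""
    import google.generativeai as genai  # imported lazily so build_prompt has no dependency

    genai.configure(api_key=os.environ["GEMINI_API_KEY"])  # assumed variable name
    model = genai.GenerativeModel(model_name)
    return model.generate_content(prompt).text


prompt = build_prompt("high", 0.82, ["resting_bp", "cholesterol"])
```

Keeping prompt construction separate from the network call makes the wording easy to review and test without spending API quota.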

Section 06

Deployment and Usage Guide

Environment Requirements

Python 3.10+
Node.js 18.x+
PostgreSQL 13+

Quick Start

Create and activate a virtual environment
Install dependencies: pip install -r requirements.txt
Configure environment variables (database connection, Gemini API key)
Verify model file integrity
Start the Flask service
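
The source does not say how model file integrity is verified. One common approach, sketched here as an assumption, is to compare each weight file's SHA-256 digest against a manifest of expected values before starting the service. The file names `ann.pt` and `random_forest.joblib` and the manifest format are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large weight files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_models(model_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return the names of files that are missing or whose hash differs."""
    problems = []
    for name, expected in manifest.items():
        path = model_dir / name
        if not path.is_file() or sha256_of(path) != expected:
            problems.append(name)
    return problems


# Demo with a temporary directory: one file is intact, one is missing.
with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp)
    (model_dir / "ann.pt").write_bytes(b"fake weights")
    manifest = {
        "ann.pt": hashlib.sha256(b"fake weights").hexdigest(),
        "random_forest.joblib": "0" * 64,
    }
    problems = verify_models(model_dir, manifest)
```

A startup script could refuse to launch the Flask service whenever the returned list is non-empty.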

Security Design

Sensitive configurations (database credentials, API keys) are stored in .env files and excluded from version control via .gitignore.
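
A small sketch of how the application side of this design might look: settings are read from environment variables at startup, and the service fails fast if any are absent. The variable names `DATABASE_URL` and `GEMINI_API_KEY` and the `Settings` class are assumptions. Loading the `.env` file into the environment is left to whatever tool the project uses.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_url: str
    gemini_api_key: str


def load_settings(env=None) -> Settings:
    """Read required settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    required = ("DATABASE_URL", "GEMINI_API_KEY")  # assumed variable names
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return Settings(
        database_url=env["DATABASE_URL"],
        gemini_api_key=env["GEMINI_API_KEY"],
    )


# Example with an explicit mapping instead of the real environment.
settings = load_settings({
    "DATABASE_URL": "postgresql://user:password@localhost:5432/heart",
    "GEMINI_API_KEY": "dummy-key",
})
```

Accepting the mapping as a parameter keeps secrets out of the code and makes the loader easy to test without touching the real environment.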

Section 07

Application Scenarios and Open-Source Value

The system applies to multiple scenarios:

Auxiliary diagnosis in primary medical institutions (initial assessment in resource-poor areas)
Health checkup data management (batch generation of risk reports)
Integration with chronic disease management platforms (enhancing existing system functions)
Medical teaching and research (demonstrating multi-model fusion applications)

Open-source features allow customization: connecting electronic medical records, adapting to specific population data, and integrating more models.

Section 08

Summary and Outlook

This system balances prediction accuracy and user experience through multi-model fusion, interpretability analysis, and AI recommendations. It provides a complete reference for medical AI developers (from data processing to deployment), especially useful in addressing class imbalance, interpretability, and production configuration. With future data accumulation and algorithm iteration, it is expected to play a greater role in disease prevention and early intervention.