# Machine Learning for Predicting Medication Adherence in Diabetic Patients: A Practice Using Zimbabwean Healthcare Data

> Based on real-world data from Zimbabwe's Cimas Medical Insurance Company, this study builds classical machine learning models to predict medication adherence in patients with diabetes and hypertension. Through feature group comparison experiments and clinical cost-sensitive evaluation, it provides data-driven intervention strategies for non-communicable disease (NCD) management in sub-Saharan Africa.

- Board: [Openclaw Geo](https://www.zingnex.cn/en/forum/board/openclaw-geo)
- Published: 2026-06-06T10:16:01.000Z
- Last activity: 2026-06-06T10:26:26.968Z
- Popularity: 154.8
- Keywords: machine learning, medical AI, medication adherence, diabetes, hypertension, sub-Saharan Africa, health data science, XGBoost, SHAP interpretability, cost-sensitive learning
- Page link: https://www.zingnex.cn/en/forum/thread/geo-github-jeremy-k-coder-diabetes-hypertension-medication-adherence
- Canonical: https://www.zingnex.cn/forum/thread/geo-github-jeremy-k-coder-diabetes-hypertension-medication-adherence

---

## Introduction

This project uses real-world data from Zimbabwe's Cimas Medical Insurance Company to build classical machine learning models for predicting medication adherence in patients with diabetes and hypertension. Through feature group comparison experiments, clinical cost-sensitive evaluation, and SHAP interpretability analysis, it provides data-driven intervention strategies for non-communicable disease (NCD) management in sub-Saharan Africa. Core objectives include verifying the predictive value of pharmacy refill and insurance data, analyzing the role of socioeconomic and clinical consumption features, identifying key predictive factors, and optimizing model performance.

## NCD Crisis and Medication Adherence Challenges in Sub-Saharan Africa

Sub-Saharan Africa faces a double burden: infectious diseases remain uncontrolled while NCDs rise rapidly. The International Diabetes Federation projects that the number of people with diabetes in Africa will increase by 129% by 2045. Hypertension affects about 30% of adults in the region, yet awareness and treatment rates are the lowest globally. In Zimbabwe, structural barriers (a shortage of specialists, uneven access to drugs, fragmented insurance, and high out-of-pocket costs) exacerbate this burden. Medication non-adherence is costly at three levels: clinically, it leads to complications such as retinopathy and nephropathy; economically, hospitalization costs are 3-5 times higher than medication costs; systemically, it consumes scarce medical resources.

## Dataset Features and Innovative Derived Metrics

The project uses a public dataset from Cimas Medical Aid Society, hosted on Mendeley Data, covering approximately 8,141 patients from January to December 2022. Adherence is defined by the medication possession ratio (MPR): patients with MPR ≥ 75% are labeled adherent, and those below 75% non-adherent. The derived features include:

- Cost burden ratio (annual claims / premium amount)
- Refill interval in days
- Refill regularity (standard deviation of the refill intervals)
- Number of units per refill
- Comorbidity markers
- Insurance tier (basic/standard/premium)
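The following is a minimal sketch of how the adherence label and some of the derived features could be computed for one patient. It assumes the common definition of MPR as total days of medication supplied divided by days in the observation period. The record layout, the function name `derive_features`, the cap of MPR at 1.0, and the use of the population standard deviation are all illustrative assumptions, not details confirmed by the project.

```python
from datetime import date, timedelta
from statistics import mean, pstdev


def derive_features(refills, annual_claims, annual_premium,
                    period_start, period_end, mpr_threshold=0.75):
    """Compute MPR and refill-based features for one patient.

    refills: list of (fill_date, days_supplied) tuples (hypothetical layout).
    """
    refills = sorted(refills)
    period_days = (period_end - period_start).days + 1
    total_supply = sum(days for _, days in refills)
    # Assumption: cap MPR at 1.0 so early refills cannot exceed 100%.
    mpr = min(total_supply / period_days, 1.0)
    # Days between consecutive fills.
    intervals = [(b[0] - a[0]).days for a, b in zip(refills, refills[1:])]
    return {
        "mpr": mpr,
        "adherent": mpr >= mpr_threshold,
        "mean_refill_interval": mean(intervals) if intervals else None,
        # Assumption: population standard deviation, 0.0 with fewer than two intervals.
        "refill_regularity": pstdev(intervals) if len(intervals) > 1 else 0.0,
        # Ratio as described in the text: annual claims / premium amount.
        "cost_burden_ratio": annual_claims / annual_premium if annual_premium else None,
    }


# Made-up patient: ten 30-day fills, one every 35 days, during 2022.
fills = [(date(2022, 1, 1) + timedelta(days=35 * i), 30) for i in range(10)]
features = derive_features(fills, annual_claims=1200.0, annual_premium=800.0,
                           period_start=date(2022, 1, 1),
                           period_end=date(2022, 12, 31))
print(features)
```

In this made-up example the patient holds 300 days of supply over a 365-day year, so the MPR is about 0.82 and the patient is labeled adherent. Perfectly even refills give a regularity of 0.0.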

## Feature Group Experiments and Machine Learning Workflow

The feature group experiments compare three feature sets:

- Group A: socioeconomic features (insurance tier, cost burden, etc.)
- Group B: clinical consumption features (refill interval, regularity, etc.)
- Group C: combined features

The machine learning workflow consists of preprocessing (standardization and encoding), SMOTE to handle class imbalance, a stratified 70/15/15 split, classifiers (logistic regression, XGBoost, etc.), and hyperparameter tuning with RandomizedSearchCV using macro F1 as the objective. Evaluation is cost-sensitive: false negatives (missed non-adherence cases) are penalized more heavily than false positives.
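Two pieces of this workflow can be sketched with the Python standard library alone: a stratified 70/15/15 split and a cost-weighted error score alongside macro F1. The sketch assumes the three parts are train, validation, and test sets, treats non-adherence as the positive class (label 1), and uses a 5:1 false-negative to false-positive cost ratio. The ratio and all function names are hypothetical choices for illustration, not the project's settings. SMOTE and model fitting are omitted, and in practice scikit-learn's `train_test_split` with its `stratify` argument would usually replace the hand-written split.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=42):
    """Return (train, val, test) index lists that preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    train, val, test = [], [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_train = round(fractions[0] * len(idxs))
        n_val = round(fractions[1] * len(idxs))
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


def f1_for_class(y_true, y_pred, cls):
    """F1 score treating `cls` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == cls and p == cls for t, p in pairs)
    fp = sum(t != cls and p == cls for t, p in pairs)
    fn = sum(t == cls and p != cls for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


def clinical_cost(y_true, y_pred, fn_cost=5.0, fp_cost=1.0, positive=1):
    """Weighted error count; the 5:1 ratio is a hypothetical choice."""
    pairs = list(zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return fn_cost * fn + fp_cost * fp


# Made-up labels: 20 non-adherent (1) and 80 adherent (0) patients.
labels = [1] * 20 + [0] * 80
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))

# Two made-up prediction sets with two errors each, of different kinds.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
misses = [0, 0, 1, 0, 0, 0, 0, 0]        # two missed non-adherent patients
false_alarms = [1, 1, 1, 1, 1, 0, 0, 0]  # two unnecessary flags
print(clinical_cost(y_true, misses), clinical_cost(y_true, false_alarms))
```

Both prediction sets make two mistakes, but under the assumed weights the one that misses non-adherent patients scores far worse, which is the behavior a cost-sensitive evaluation is meant to expose.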

## Experimental Results and Feature Contribution Analysis

Among the baseline models, XGBoost and Random Forest performed best. In the feature group comparison, Group B (clinical consumption features) alone came close to the full model, Group A (socioeconomic features) added supplementary value, and Group C (combined) performed best. SHAP analysis provides interpretability at three levels: global (which features matter most overall), local (explanations for individual patients), and feature interactions.
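For reference, the local and global views both rest on SHAP's additive decomposition of a prediction. The notation below is introduced here for illustration and is not taken from the project: $f$ is the model output for patient $x$ (for tree models such as XGBoost this is usually the log-odds margin rather than the probability, which is an assumption about the setup), $M$ is the number of features, $\phi_0$ is the average model output over the background data, and $N$ is the number of patients explained.

$$
f(x) = \phi_0 + \sum_{i=1}^{M} \phi_i(x), \qquad I_i = \frac{1}{N} \sum_{n=1}^{N} \left| \phi_i\!\left(x^{(n)}\right) \right|
$$

The left-hand identity is the local explanation: each $\phi_i(x)$ is the contribution of feature $i$ to one patient's prediction. The right-hand quantity $I_i$, the mean absolute SHAP value, is a common way to rank features globally.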

## Clinical Practice and Policy Implications

Socioeconomic features can help community health workers identify high-risk patients when pharmacy data is unavailable. The cost-sensitive framework balances model performance and clinical safety. Targeted intervention strategies include: providing financial assistance to patients with high cost burdens, implementing reminder systems for patients with irregular refills, and enhancing education for patients with comorbidities.
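As a rough illustration, the three targeted strategies could be expressed as simple triage rules over the derived features. The cutoff values, field names, and function name below are hypothetical placeholders. A real deployment would need cutoffs chosen with clinicians and validated on data.

```python
def suggest_interventions(patient, cost_burden_cutoff=1.0,
                          regularity_cutoff_days=10.0):
    """Map one patient's features to candidate interventions.

    Both cutoffs are hypothetical placeholders, not values from the study.
    """
    actions = []
    if patient["cost_burden_ratio"] > cost_burden_cutoff:
        actions.append("financial assistance")
    if patient["refill_regularity"] > regularity_cutoff_days:
        actions.append("refill reminders")
    if patient["has_comorbidity"]:
        actions.append("enhanced education")
    return actions


# Made-up patient with a high cost burden and irregular refills.
example = {"cost_burden_ratio": 1.8, "refill_regularity": 14.5,
           "has_comorbidity": False}
print(suggest_interventions(example))
```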

## Limitations and Ethical Considerations

The dataset is limited to insured urban populations in Harare, so the results may not generalize to rural populations or the informal sector. The model is an academic prototype and requires prospective validation before clinical use. On the ethical side, the data is de-identified, comes from a CC0-licensed repository, and contains no patient identity information. Deployment requires stakeholder participation and transparent communication of limitations.
