Zing Forum

Kerala Ration Card Classifier: Optimizing Social Welfare Distribution with Machine Learning

An XGBoost-based machine learning system that automatically predicts five ration card categories in Kerala, India by analyzing socioeconomic features, integrated with OCR technology to enable automatic extraction of income proof information.

Machine Learning · XGBoost · OCR · Social Welfare · Classification System · Streamlit · India
Published 2026-05-24 03:45

Section 01

Introduction to the Kerala Ration Card Classifier Project

This project is an XGBoost-based machine learning system that automatically predicts which of Kerala's five ration card categories an applicant belongs to. It uses OCR to extract information from income proof documents, with the aim of streamlining social welfare distribution. The project is maintained by alnatony, and its source code is available on GitHub (https://github.com/alnatony/RationCardTypeClassifier). It was released on May 23, 2026. Its main value lies in faster review, less human bias, and a practical technical answer to a real problem in welfare distribution.

Section 02

Project Background and Social Significance of Ration Card Classification

In India, ration cards are important documents for accessing subsidized food. Kerala classifies them into five categories: AAY (yellow, most impoverished), PHH (pink, priority), NPS (blue, non-priority subsidized), NPI (brown, institutional residents), and NPNS (white, non-priority non-subsidized). Traditional manual review requires officials to cross-check many factors for each application, such as income and employment. This is slow, can produce inconsistent decisions, and struggles to keep up with large numbers of applications. The project was created to address these problems.
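
A multi-class classifier such as XGBoost needs the five categories encoded as integer class indices. The sketch below shows one way to keep that encoding in a single place, next to the card colors listed above. The index order, the class name RationCardCategory and the from_index helper are assumptions made for illustration. They are not taken from the project's code.

```python
from enum import Enum


class RationCardCategory(Enum):
    """Kerala's five ration card categories with their card colors.

    The first tuple element is a hypothetical class index (0..4) of the kind a
    multi-class classifier expects; the project's real encoding may differ.
    """

    AAY = (0, "yellow", "most impoverished")
    PHH = (1, "pink", "priority")
    NPS = (2, "blue", "non-priority subsidized")
    NPI = (3, "brown", "institutional residents")
    NPNS = (4, "white", "non-priority non-subsidized")

    def __init__(self, index: int, color: str, description: str) -> None:
        self.index = index
        self.color = color
        self.description = description

    @classmethod
    def from_index(cls, index: int) -> "RationCardCategory":
        """Map a predicted class index back to its category."""
        for category in cls:
            if category.index == index:
                return category
        raise ValueError(f"unknown class index: {index}")


print(RationCardCategory.from_index(3).name, RationCardCategory.NPI.color)
```

Keeping the encoding in one place helps ensure that training labels and displayed results use the same mapping.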

Section 03

Technical Architecture and Implementation Methods

Core Algorithm: XGBoost was chosen for its strengths on structured (tabular) data, its interpretability, its training efficiency, and its accuracy.

Feature Engineering: Predictions draw on several groups of socioeconomic features:
- income: total household income, income sources, income stability
- employment: type, occupation, years of service
- family structure: number of members, dependency ratio, special groups
- residence: region type, housing status

OCR Integration: Users can upload scans or photos of income proof documents. OCR recognizes the text and extracts the income figures, and a plausibility check is then applied. This simplifies data entry.

Web Interface: Built with Streamlit, the interface provides form input, file upload, real-time prediction, and an explanation of each result.

Project Structure: A modular layout (src/app, data, models, ocr, tests) keeps the code easy to maintain and extend.
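
After OCR, an income figure has to be pulled out of the recognized text and checked for plausibility. That figure then feeds derived features such as per-capita income and a dependency ratio. The following is a minimal sketch of this post-processing, assuming the OCR engine has already returned plain text. Several parts are hypothetical choices rather than the project's actual rules:
- the regular expression
- the "take the largest amount" heuristic
- the income bounds
- the Household fields

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical plausibility bounds for an annual household income, in rupees.
MIN_ANNUAL_INCOME = 0
MAX_ANNUAL_INCOME = 10_000_000

# Hypothetical pattern for amounts such as "Rs. 1,20,000", "₹85000" or "INR 2,50,000.00".
# The \b stops "rs" inside words like "hours" from being read as a currency marker.
_AMOUNT = re.compile(
    r"(?:\b(?:rs\.?|inr)|₹)\s*([0-9][0-9,]*(?:\.[0-9]{1,2})?)",
    re.IGNORECASE,
)


def extract_income(ocr_text: str) -> Optional[float]:
    """Return the income figure found in OCR text, or None if absent or implausible."""
    # Indian digit grouping (1,20,000) is handled by simply dropping the commas.
    amounts = [float(m.group(1).replace(",", "")) for m in _AMOUNT.finditer(ocr_text)]
    if not amounts:
        return None
    # Heuristic (assumption): the largest amount is taken to be the annual figure.
    income = max(amounts)
    # Plausibility check: reject figures outside the expected range.
    if not MIN_ANNUAL_INCOME <= income <= MAX_ANNUAL_INCOME:
        return None
    return income


@dataclass
class Household:
    annual_income: float
    members: int
    dependents: int  # hypothetical field: children, elderly, and others not earning

    def per_capita_income(self) -> float:
        return self.annual_income / self.members if self.members > 0 else 0.0

    def dependency_ratio(self) -> float:
        # Dependents per working-age member; max(..., 1) avoids dividing by zero
        # when every member is a dependent.
        working = self.members - self.dependents
        return self.dependents / max(working, 1)


text = "Annual income: Rs. 1,20,000 (monthly approx. Rs 10,000)"
household = Household(annual_income=extract_income(text) or 0.0, members=4, dependents=2)
print(household.per_capita_income(), household.dependency_ratio())
```

Returning None instead of a guessed number leaves the decision to the form, so the applicant can type the figure manually when the OCR result fails the check.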

Section 04

Model Validation and Deployment Options

Validation Case: The model correctly assigns institutional residents with zero income to the NPI category. This suggests it has learned a real pattern rather than classifying at random.

Deployment Options: The application can run in a Docker container, be deployed to the Render cloud platform in one click, or run in a local Python environment.
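
A validation case like this is essentially a slice check. You select the rows that match a condition and measure what share of them receives the expected label. The sketch below shows the idea with a stand-in predictor. The names slice_check and stub_predict and the fields is_institution and income are hypothetical. In practice, the trained model's prediction function would be passed in instead of the stub.

```python
from collections import Counter
from typing import Callable, Iterable, Mapping

Row = Mapping[str, float]


def slice_check(
    predict: Callable[[Row], str],
    rows: Iterable[Row],
    in_slice: Callable[[Row], bool],
    expected: str,
) -> tuple[float, Counter]:
    """Return the share of in-slice rows predicted as `expected`, plus the label counts."""
    labels = Counter(predict(row) for row in rows if in_slice(row))
    total = sum(labels.values())
    hit_rate = labels[expected] / total if total else 0.0
    return hit_rate, labels


# Hypothetical stand-in for the trained model's prediction function.
def stub_predict(row: Row) -> str:
    return "NPI" if row["is_institution"] and row["income"] == 0 else "NPNS"


rows = [
    {"is_institution": 1, "income": 0},
    {"is_institution": 1, "income": 0},
    {"is_institution": 0, "income": 500_000},
]
rate, counts = slice_check(
    stub_predict,
    rows,
    in_slice=lambda r: r["is_institution"] == 1 and r["income"] == 0,
    expected="NPI",
)
print(rate, counts)
```

Returning the full label counts, not just the hit rate, shows which category the model confuses the slice with when it gets it wrong.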

Section 05

Technical Highlights and Best Practices

Data Privacy: Datasets and model files are excluded via .gitignore to protect sensitive data. Training scripts are provided so users can build models from their own data.

Reproducibility: Complete dependency lists (requirements.txt, packages.txt) keep environments consistent, and the training scripts allow results to be reproduced.

Test Coverage: The tests directory checks edge cases such as the NPI category, improving the reliability of the classifier.
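
Reproducible results also depend on a deterministic train/test split, not only on pinned dependencies. Below is a minimal sketch of a seeded, stratified split, which keeps rare categories such as AAY or NPI in both sets. The function name and default values are illustrative assumptions. In practice, scikit-learn's train_test_split with its stratify and random_state arguments provides the same behavior.

```python
import random
from collections import defaultdict
from typing import Hashable, Sequence, TypeVar

T = TypeVar("T")


def stratified_split(
    items: Sequence[T],
    labels: Sequence[Hashable],
    test_fraction: float = 0.2,
    seed: int = 42,
) -> tuple[list[T], list[T]]:
    """Split items into (train, test), keeping each label's proportion, deterministically."""
    if len(items) != len(labels):
        raise ValueError("items and labels must have the same length")
    rng = random.Random(seed)  # private RNG: the same seed always gives the same split
    by_label: dict = defaultdict(list)
    for item, label in zip(items, labels):
        by_label[label].append(item)
    train: list[T] = []
    test: list[T] = []
    # Dicts keep insertion order, so labels are visited in first-seen order on every run.
    for label in by_label:
        group = list(by_label[label])
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test
```

Using a private random.Random instance, rather than the global random module, keeps other code that draws random numbers from changing the split.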

Section 06

Limitations and Improvement Directions

Current Limitations:
- Geography: the model is trained only on Kerala data.
- Data dependency: its performance depends on the quality and completeness of that data.
- OCR accuracy: recognition quality depends on the quality of the uploaded documents.

Improvement Directions:
- multi-language OCR support
- model ensembles, e.g., voting between a random forest and a neural network
- uncertainty quantification, using confidence intervals with manual review of borderline cases
- fairness audits to avoid discriminating against particular groups
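
Two of these directions can be combined in a few lines: ensembling, and routing uncertain cases to manual review. The sketch below averages per-class probabilities from several models (soft voting). It then flags a case for review when the gap between the top two categories is small. The margin threshold, the category order and the example probabilities are assumptions. A probability margin is also a simpler heuristic than a true confidence interval.

```python
from typing import Optional, Sequence

# Assumed class order; it must match the order of each model's probability outputs.
CATEGORIES = ("AAY", "PHH", "NPS", "NPI", "NPNS")


def soft_vote(
    prob_rows: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> list[float]:
    """Weighted average of several models' class-probability vectors (soft voting)."""
    if weights is None:
        weights = [1.0] * len(prob_rows)
    total = sum(weights)
    return [
        sum(w * probs[k] for w, probs in zip(weights, prob_rows)) / total
        for k in range(len(prob_rows[0]))
    ]


def decide(probs: Sequence[float], min_margin: float = 0.2) -> tuple[str, bool]:
    """Return the top category and whether the case should go to manual review."""
    ranked = sorted(range(len(probs)), key=lambda k: probs[k], reverse=True)
    top, runner_up = ranked[0], ranked[1]
    # Hypothetical rule: a small gap between the two best categories marks a borderline case.
    needs_review = probs[top] - probs[runner_up] < min_margin
    return CATEGORIES[top], needs_review


# Hypothetical outputs from two models for one applicant.
xgb_probs = [0.50, 0.30, 0.10, 0.05, 0.05]
rf_probs = [0.20, 0.60, 0.10, 0.05, 0.05]
category, needs_review = decide(soft_vote([xgb_probs, rf_probs]))
print(category, needs_review)
```

Here the two models disagree (AAY vs PHH), so the averaged result is PHH with only a small lead, and the case is sent for manual review.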

Section 07

Social Value and Project Insights

Social Value:
- It improves administrative efficiency by reducing staff workload.
- It reduces human bias through more consistent and transparent decisions.
- It enables faster responses in a crisis, e.g., speeding up welfare distribution during the pandemic.
- It is an example of technology for inclusion, applying mature technology to practical problems in developing countries.

Conclusion: The project pragmatically picks a mature tech stack to solve a real problem, which makes it worth learning from. We hope to see more AI projects that serve human well-being rather than simply chasing cutting-edge technology.