Zing Forum

AI-Powered Credit Risk Assessment Platform: From Predictive Models to Explainable Intelligence

An end-to-end machine learning credit risk assessment system that combines LightGBM/XGBoost prediction, SHAP explainable AI, and natural language querying to provide a complete intelligent solution for bank credit decision-making.

Tags: credit risk, machine learning, LightGBM, XGBoost, SHAP, explainable AI, fintech, risk assessment, natural language query, banking
Published 2026-06-02 11:15 | Recent activity 2026-06-02 11:20 | Estimated read 7 min

Section 01

Introduction: Core Value of the AI-Powered Credit Risk Assessment Platform

This AI-powered credit risk assessment platform is an end-to-end solution for banking scenarios, integrating LightGBM/XGBoost predictive models, SHAP explainable AI technology, and natural language query functions. The project achieves one-click deployment through decoupled architecture design and Docker containerization, while combining asymmetric cost-benefit modeling to ensure model decisions align with banks' risk tolerance and meet compliance audit requirements.

Section 02

Project Background and Architecture Design

Original Author & Source

  • Author/Maintainer: deva1702
  • Source Platform: GitHub
  • Original Title: credit_risk_model
  • Release Date: June 2, 2026

Project Overview

An end-to-end AI risk assessment platform for bank credit scenarios, integrating machine learning prediction, explainable AI, and natural language interaction. It adopts a decoupled architecture (separating the machine learning, data engineering, conversational AI, and presentation layers) and supports one-click deployment via Docker containerization.

System Architecture

The platform is described as a four-layer architecture; the layers listed are:

  1. Client Layer: Streamlit Web Interface
  2. Intelligence & Agent Layer: Core components include Groq LLaMA-3.3-70B (natural language understanding), NL-to-SQL translator, SHAP interpreter, inference engine (loads LightGBM/XGBoost models)
  3. Data & Persistence Layer: SQLite database, CSV data sources, pre-trained model files

Section 03

Predictive Model and Method Design

Predictive Model & Feature Engineering

  • Core Models: LightGBM and XGBoost
  • Data Cleaning: Drop columns whose missing rate exceeds 40% (except TARGET and the EXT_SOURCE scores); label-encode categorical features, with unknown labels falling back to "Missing"
  • Domain Features: Engineered financial health indicators such as CREDIT_INCOME_RATIO, ANNUITY_INCOME_RATIO, DEBT_SERVICE_RATIO, and CREDIT_STRESS
  • Class Imbalance Handling: Class weighting (scale_pos_weight=5) gives defaulters more weight during training, improving recall for high-risk applicants
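
The cleaning and feature steps above can be sketched in plain Python, kept dependency-free so it runs as-is. This is an illustration under stated assumptions, not the project's code: the helper names are hypothetical, the input columns AMT_CREDIT, AMT_INCOME_TOTAL and AMT_ANNUITY are assumed, and so are the two ratio definitions, since the article names the features but not their formulas. DEBT_SERVICE_RATIO and CREDIT_STRESS are left out because their definitions are not given.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[object]]

# Columns kept regardless of missing rate, per the article.
# The exact EXT_SOURCE column names are an assumption.
PROTECTED = {"TARGET", "EXT_SOURCE_1", "EXT_SOURCE_2", "EXT_SOURCE_3"}


def drop_sparse_columns(rows: List[Row], max_missing: float = 0.40) -> List[Row]:
    """Drop columns whose share of None values exceeds max_missing."""
    keep = [
        col for col in rows[0].keys()
        if col in PROTECTED
        or sum(r[col] is None for r in rows) / len(rows) <= max_missing
    ]
    return [{col: r[col] for col in keep} for r in rows]


def fit_label_encoder(values: List[Optional[str]]) -> Dict[str, int]:
    """Map each training category to an integer and reserve a 'Missing' code."""
    labels = sorted({v for v in values if v is not None} | {"Missing"})
    return {label: code for code, label in enumerate(labels)}


def encode(value: Optional[str], mapping: Dict[str, int]) -> int:
    """Unknown or absent labels fall back to the 'Missing' code."""
    return mapping.get(value, mapping["Missing"])


def add_ratio_features(row: Row) -> Row:
    """Assumed definitions: credit and annuity amounts relative to income."""
    income = row["AMT_INCOME_TOTAL"]
    out = dict(row)
    out["CREDIT_INCOME_RATIO"] = row["AMT_CREDIT"] / income
    out["ANNUITY_INCOME_RATIO"] = row["AMT_ANNUITY"] / income
    return out
```

Reserving the "Missing" code at fit time means a category first seen at inference never raises an error; it is simply treated like an absent value.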

Asymmetric Cost-Benefit Modeling

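  • False Negative Cost (approving an applicant who later defaults): 60% of loan principal, treated as the loss given default (LGD)
  • False Positive Cost (rejecting a good borrower): 10% of loan principal, treated as the forgone net interest margin (NIM)
  • The decision threshold is set to 0.30 rather than the default 0.5, so that the costlier error (a missed default) is made less often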
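
The source does not say how the 0.30 threshold was derived. One common approach, sketched here as an assumption, is to sweep candidate thresholds on a validation set and keep the one with the lowest total cost under the 60%/10% figures above. The function names and the toy inputs in the usage note are hypothetical.

```python
from typing import Iterable, Sequence, Tuple

LGD = 0.60  # false-negative cost: share of principal lost on an approved default
NIM = 0.10  # false-positive cost: share of principal forgone on a rejected good loan


def portfolio_cost(
    y_true: Sequence[int],
    p_default: Sequence[float],
    principal: Sequence[float],
    threshold: float,
) -> float:
    """Total cost when every applicant scoring below the threshold is approved."""
    cost = 0.0
    for y, p, amount in zip(y_true, p_default, principal):
        approved = p < threshold
        if approved and y == 1:
            cost += LGD * amount      # approved, then defaulted
        elif not approved and y == 0:
            cost += NIM * amount      # rejected a borrower who would have repaid
    return cost


def best_threshold(
    y_true: Sequence[int],
    p_default: Sequence[float],
    principal: Sequence[float],
    grid: Iterable[float],
) -> Tuple[float, float]:
    """Return (threshold, cost) for the cheapest threshold in the grid."""
    return min(
        ((t, portfolio_cost(y_true, p_default, principal, t)) for t in grid),
        key=lambda pair: pair[1],
    )
```

For example, calling `best_threshold(y, p, amounts, [i / 100 for i in range(5, 96, 5)])` on held-out data scans thresholds from 0.05 to 0.95. Because a missed default costs six times as much as a wrongly rejected loan, the cheapest threshold will usually sit below 0.5.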

Section 04

Model Performance and Evidence Support

Model Performance Evaluation

Both models were evaluated on an 80/20 stratified train/validation split. The KS statistic measures how well each model separates defaulters from non-defaulters:

Metric         LightGBM   XGBoost   Winner
ROC-AUC        0.7673     0.7649    LightGBM
PR-AUC         0.2608     0.2578    LightGBM
KS Statistic   0.4089     0.4016    LightGBM
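
For readers unfamiliar with the KS statistic, it is the largest gap between the cumulative score distributions of defaulters and non-defaulters, which equals the maximum of |TPR - FPR| over all thresholds. A minimal dependency-free sketch follows; the function name is hypothetical and this is not the project's evaluation code.

```python
from typing import Sequence


def ks_statistic(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Maximum |TPR - FPR| over all score thresholds (labels: 1 = default)."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    # Walk thresholds from the highest score down.
    pairs = sorted(zip(scores, y_true), reverse=True)
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        current = pairs[i][0]
        # Consume tied scores together so a threshold never splits a tie.
        while i < len(pairs) and pairs[i][0] == current:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        best = max(best, abs(tp / n_pos - fp / n_neg))
    return best
```

A value of 0 means the two score distributions coincide, and 1 means the model separates the classes perfectly.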

Explainable AI

  • Global Explanation: Beeswarm plots and mean-impact plots show that EXT_SOURCE_2 and EXT_SOURCE_3 dominate the global risk signal
  • Individual Explanation: Dynamic factor cards (color-coded feature impacts) and waterfall charts (showing how each feature shifts the prediction away from the baseline)
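
The individual explanations rest on the additive property of SHAP values: a baseline plus the per-feature contributions equals the model's raw output for that applicant. The sketch below shows how factor cards and waterfall steps could be derived from one applicant's contributions. It assumes the contributions are in log-odds units, as is typical for tree-based binary classifiers, and that positive values raise default risk. The function names and the red/green color rule are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def sigmoid(x: float) -> float:
    """Convert a log-odds value into a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def _by_impact(contributions: Dict[str, float]) -> List[Tuple[str, float]]:
    """Order features by absolute contribution, largest first."""
    return sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)


def factor_cards(contributions: Dict[str, float], top_k: int = 3) -> List[Tuple[str, float, str]]:
    """Top-k features with a color: red raises risk, green lowers it."""
    return [
        (name, value, "red" if value > 0 else "green")
        for name, value in _by_impact(contributions)[:top_k]
    ]


def waterfall(base_value: float, contributions: Dict[str, float]) -> List[Tuple[str, float]]:
    """Running total from the baseline to the final raw prediction."""
    running = base_value
    steps = [("base", running)]
    for name, value in _by_impact(contributions):
        running += value
        steps.append((name, running))
    return steps
```

Under the log-odds assumption, passing the last running total through `sigmoid` gives the predicted default probability.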

Natural Language Query

Users can explore the dataset with plain-English questions. The database schema is injected into the prompt to keep the generated SQL accurate, and a fallback mechanism handles hallucinated queries.
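
A minimal sketch of schema injection with a fallback, using only the standard-library sqlite3 module. The model call is a placeholder: `llm` is any function that takes a prompt and returns SQL text, standing in for the Groq-hosted model. The prompt wording, function names, and fallback message are assumptions, not the project's actual implementation.

```python
import sqlite3
from typing import Callable, List, Union

FALLBACK = "Sorry, I could not turn that question into a valid query."


def schema_text(conn: sqlite3.Connection) -> str:
    """Collect the CREATE TABLE statements so the model sees real table and column names."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
    ).fetchall()
    return "\n".join(row[0] for row in rows)


def build_prompt(question: str, schema: str) -> str:
    """Inject the schema ahead of the user's question."""
    return (
        "Translate the question into one SQLite SELECT statement.\n"
        "Use only these tables and columns:\n"
        f"{schema}\n"
        f"Question: {question}\nSQL:"
    )


def answer(
    conn: sqlite3.Connection, question: str, llm: Callable[[str], str]
) -> Union[str, List[tuple]]:
    """Run the generated SQL, falling back if it is unsafe or invalid."""
    sql = llm(build_prompt(question, schema_text(conn))).strip().rstrip(";")
    # Accept a single read-only statement only.
    if not sql.lower().startswith("select") or ";" in sql:
        return FALLBACK
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.Error:
        # A hallucinated table or column surfaces here as an SQLite error.
        return FALLBACK
```

Letting SQLite reject unknown tables and columns is a cheap way to catch hallucinations: the query either runs against the real schema or the user gets the fallback message.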

Section 05

Project Conclusions and Technical Highlights

Technical Highlights

  1. Domain Knowledge Integration: Financial theory is built into engineered features such as CREDIT_STRESS
  2. Business Goal Alignment: Asymmetric cost modeling optimizes actual business metrics
  3. Explainability Priority: SHAP integration meets regulatory and user trust requirements
  4. Natural Language Interaction: Reduces the analysis barrier for non-technical users
  5. Engineering Mindset: Layered architecture and containerization ensure maintainability

Project Insights

The project offers a reference for AI applications in the financial sector, showing how a lab prototype can be turned into a production-grade banking solution.

Section 06

Deployment and Usage Recommendations

Deployment Steps

  1. Clone the repository: git clone <repository> && cd credit_risk_model
  2. Configure environment variables: Create a .env file and set GROQ_API_KEY, DATA_PATH, MODEL_PATH, DB_PATH, ACTIVE_MODEL
  3. Start the container: docker-compose up
  4. Access: http://localhost:9200
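
Before starting the container, it can help to confirm that the variables from step 2 are actually set. The check below is a hypothetical helper, not part of the repository. It only tests that each variable is present and non-empty, since the article does not specify valid values.

```python
import os
from typing import List, Mapping

# Variable names as listed in the deployment steps.
REQUIRED = ("GROQ_API_KEY", "DATA_PATH", "MODEL_PATH", "DB_PATH", "ACTIVE_MODEL")


def missing_settings(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or empty."""
    return [name for name in REQUIRED if not env.get(name)]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All required settings are present.")
```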