Zing Forum

FraudShield: A Real-Time Financial Fraud Detection System Integrating ML, RAG, and LLM

A financial transaction monitoring system built during a three-day hackathon. It performs real-time fraud detection and produces interpretable analyses through a three-layer architecture that combines an Isolation Forest model, RAG retrieval, and the Gemini large language model.

Tags: fraud detection, isolation forest, RAG, LLM, financial security, real-time, fastapi, react
Published 2026-06-05 00:44 | Recent activity 2026-06-05 00:50 | Estimated read 5 min

Section 03

Project Background and Motivation

Financial fraud detection has long been a core challenge for banks and payment systems. Traditional rule-based systems struggle to keep up with increasingly sophisticated fraud techniques, while purely machine-learning models lack interpretability, making it hard for risk control staff to understand "why was this transaction flagged as fraudulent?". FraudShield was born from a three-day hackathon with a goal that went beyond detecting fraud: to build an intelligent system that understands context and explains its decisions clearly.

Section 04

System Architecture: Three-Layer Intelligent Integration

The core idea of FraudShield is to combine three different artificial intelligence techniques into a single end-to-end, real-time detection pipeline.
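
The post does not show how the three layers are wired together. Below is a minimal Python sketch of one plausible orchestration (Python because the backend is FastAPI). It assumes each layer is a plain function and, as an assumption beyond the post, that both retrieval and the LLM call run only when the risk score crosses a cut-off. The `Transaction` fields follow the inputs named in the Layer 1 description; `analyze`, `Verdict`, the `action` key, and the 0.6 threshold are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Transaction:
    amount: float
    hour: int        # local hour of day, 0-23
    merchant: str
    category: str
    location: str


@dataclass
class Verdict:
    risk_score: float
    matched_profiles: List[str] = field(default_factory=list)
    label: str = "LEGIT"
    action: str = "approve"


def analyze(
    txn: Transaction,
    score: Callable[[Transaction], float],                        # layer 1: anomaly score in [0, 1]
    retrieve: Callable[[Transaction], List[str]],                 # layer 2: similar attacker profiles
    synthesize: Callable[[Transaction, float, List[str]], dict],  # layer 3: LLM verdict
    threshold: float = 0.6,                                       # hypothetical high-risk cut-off
) -> Verdict:
    risk = score(txn)
    if risk < threshold:
        # Low-risk transactions skip retrieval and the LLM call entirely (assumption).
        return Verdict(risk_score=risk)
    profiles = retrieve(txn)
    result = synthesize(txn, risk, profiles)
    return Verdict(risk, profiles, result["label"], result["action"])
```

Passing the layers in as functions keeps each stage replaceable and lets the pipeline be tested with stubs instead of a trained model, a vector index, and a live LLM.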

Section 05

Layer 1: Anomaly Detection Engine (Isolation Forest)

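When transaction data (amount, time, merchant, category, geographic location) reaches the backend /api/analyze endpoint, it is first screened by the Isolation Forest model. The model is fitted on historical transaction data covering both normal and fraudulent behavior, and it outputs a risk score between 0.00 and 1.00. Isolation Forest is an unsupervised method: rather than learning from fraud labels, it isolates points with random splits, and anomalous transactions, being rare and different, are isolated in fewer splits than normal ones. Because it works on small random subsamples and needs no distance or density calculations, it scales efficiently to large, high-dimensional data, which makes it well suited to flagging unusual transaction patterns.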
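
The post does not say how the 0.00 to 1.00 score is derived. One standard option is the anomaly score from the original Isolation Forest paper, s(x, n) = 2^(-E[h(x)] / c(n)), where E[h(x)] is the average path length of x across the trees and c(n) normalises by the average path length of an unsuccessful search in a binary search tree of n points. The minimal pure-Python sketch below implements that formula. It assumes only two numeric features (amount and hour of day), since merchant, category, and location would first need a numeric encoding; all names (`IsolationForest`, `build_tree`, `path_length`, `c`) are illustrative, not FraudShield's code.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence

Point = Sequence[float]
EULER_GAMMA = 0.5772156649


def c(n: int) -> float:
    """Average path length of an unsuccessful BST search among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximates H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


@dataclass
class Node:
    size: int                      # training points that reached this node
    feature: int = -1              # split feature; -1 marks a leaf
    split: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_tree(points: List[Point], depth: int, max_depth: int, rng: random.Random) -> Node:
    """Grow one isolation tree by splitting on a random feature at a random value."""
    if depth >= max_depth or len(points) <= 1:
        return Node(size=len(points))
    feature = rng.randrange(len(points[0]))
    lo = min(p[feature] for p in points)
    hi = max(p[feature] for p in points)
    if lo == hi:  # a constant feature cannot be split; stop here
        return Node(size=len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[feature] < split]
    right = [p for p in points if p[feature] >= split]
    return Node(len(points), feature, split,
                build_tree(left, depth + 1, max_depth, rng),
                build_tree(right, depth + 1, max_depth, rng))


def path_length(x: Point, node: Node, depth: int = 0) -> float:
    if node.feature == -1:  # leaf: add the expected depth of the unbuilt subtree
        return depth + c(node.size)
    child = node.left if x[node.feature] < node.split else node.right
    return path_length(x, child, depth + 1)


class IsolationForest:
    def __init__(self, n_trees: int = 100, sample_size: int = 256, seed: int = 0):
        self.n_trees, self.sample_size, self.seed = n_trees, sample_size, seed

    def fit(self, data: List[Point]) -> "IsolationForest":
        if len(data) < 2:
            raise ValueError("need at least two training points")
        rng = random.Random(self.seed)
        self.psi = min(self.sample_size, len(data))
        max_depth = math.ceil(math.log2(self.psi))  # height limit from the paper
        self.trees = [build_tree(rng.sample(data, self.psi), 0, max_depth, rng)
                      for _ in range(self.n_trees)]
        return self

    def score(self, x: Point) -> float:
        """Anomaly score in (0, 1]; values near 1 indicate a likely anomaly."""
        mean_depth = sum(path_length(x, t) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_depth / c(self.psi))
```

After `forest = IsolationForest(seed=1).fit(history)`, a call such as `forest.score((4999.0, 3.0))` returns the risk score for a new transaction. Scores close to 1 suggest an anomaly, while scores well below 0.5 suggest a normal point. For reference, scikit-learn's `IsolationForest.score_samples` returns the opposite (negative) of this paper-defined score, so negating its output gives the same 0 to 1 scale.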

Section 06

Layer 2: Context Enhancement (RAG Retrieval)

The system maintains a local FAISS vector index containing 20 known financial attacker profiles. When a transaction is marked as high-risk, the RAG module searches this index for the profiles most similar to the transaction's parameters in order to identify the specific attack method involved, such as "Velocity Testing" or "Dark Web Credentials". This retrieval step lets the system go beyond "this transaction is suspicious" and state "this matches a known attack pattern".
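
To make the retrieval step concrete, here is a minimal sketch that swaps FAISS for a brute-force search so it runs with the standard library alone. It ranks profiles by the inner product of L2-normalised vectors (cosine similarity), which is the same ranking a flat FAISS inner-product index such as `IndexFlatIP` produces over normalised vectors. The toy bag-of-words `embed` function stands in for a real embedding model, the transaction is assumed to have been rendered as a short text description, and only the first two profile names come from the post; their descriptions and the third profile are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical profile texts: the post names the first two patterns but does
# not publish the contents of its 20 profiles.
PROFILES: Dict[str, str] = {
    "Velocity Testing": "many small rapid transactions testing a card within a short interval",
    "Dark Web Credentials": "login from new device in foreign location using stolen credentials",
    "Account Takeover": "password reset and changed shipping address followed by large purchase",
}


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())


def embed(text: str, vocab: List[str]) -> List[float]:
    """Toy bag-of-words vector, L2-normalised so a dot product equals cosine similarity."""
    counts = Counter(tokenize(text))
    vec = [float(counts[word]) for word in vocab]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def retrieve(query: str, k: int = 1) -> List[Tuple[str, float]]:
    """Return the k profiles most similar to the query (exhaustive inner-product search)."""
    vocab = sorted({w for text in PROFILES.values() for w in tokenize(text)})
    q = embed(query, vocab)
    scored = []
    for name, text in PROFILES.items():
        p = embed(text, vocab)
        scored.append((sum(a * b for a, b in zip(q, p)), name))
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:k]]
```

For example, `retrieve("ten small rapid transactions within a short interval")` ranks "Velocity Testing" first. With only 20 profiles an exhaustive scan is cheap; an approximate FAISS index mainly pays off at much larger collection sizes.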

Section 07

Layer 3: Insight Synthesis (Gemini LLM)

Google Gemini 1.5 Flash receives the outputs of the first two layers, together with the retrieved attacker-profile context, and generates a structured JSON result. The output includes a risk label (FRAUD, REVIEW, or LEGIT) and a recommended action. Adding a large language model lets the system explain each decision in clear, actionable natural language.
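
Because the LLM's reply drives downstream actions, it helps to validate it before trusting it. The sketch below shows one way to assemble the layer 1 and layer 2 context into a prompt and to check the returned JSON against the three labels named above. It deliberately leaves out the actual Gemini API call; the prompt wording, the JSON keys (`label`, `recommended_action`, `explanation`), and the choice to fall back to REVIEW on malformed output are assumptions, not FraudShield's published schema.

```python
import json
from dataclasses import dataclass
from typing import List

ALLOWED_LABELS = ("FRAUD", "REVIEW", "LEGIT")


@dataclass
class Insight:
    label: str
    recommended_action: str
    explanation: str


def build_prompt(txn_summary: str, risk_score: float, profiles: List[str]) -> str:
    """Assemble the outputs of layers 1 and 2 into one instruction for the LLM."""
    return (
        "You are a fraud analyst. Respond with JSON only, using the keys "
        '"label" (one of FRAUD, REVIEW, LEGIT), "recommended_action" and "explanation".\n'
        f"Transaction: {txn_summary}\n"
        f"Isolation Forest risk score: {risk_score:.2f}\n"
        f"Matching attacker profiles: {', '.join(profiles) or 'none'}\n"
    )


def parse_insight(raw: str) -> Insight:
    """Validate the model's reply; anything malformed is routed to manual review."""
    fallback = Insight("REVIEW", "route to an analyst", "model output could not be validated")
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return fallback
    # A tuple membership test compares with ==, so unhashable values cannot raise here.
    if not isinstance(data, dict) or data.get("label") not in ALLOWED_LABELS:
        return fallback
    return Insight(
        label=data["label"],
        recommended_action=str(data.get("recommended_action", "")),
        explanation=str(data.get("explanation", "")),
    )
```

Falling back to REVIEW rather than LEGIT means a garbled model response never silently approves a transaction that layer 1 already flagged.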

Section 08

Real-Time Notification Mechanism

Detection results are persisted in a PostgreSQL database and, at the same time, pushed over WebSocket to every connected React frontend client. Risk control staff therefore see an alert as soon as a transaction is flagged, instead of waiting for a batch job to finish.
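
The persist-then-push pattern can be sketched as a small broadcaster that stores each result and fans it out to every connected client, dropping clients whose connection has failed. To stay self-contained, the sketch accepts any object with an async `send_json` method (FastAPI's `WebSocket` provides one) and takes the database write as an injected `save` coroutine; `AlertBroadcaster`, `save`, and the result fields are hypothetical, and no PostgreSQL driver code is shown.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Protocol


class JsonSocket(Protocol):
    async def send_json(self, data: Any) -> None: ...


class AlertBroadcaster:
    """Tracks connected clients and pushes each detection result to all of them."""

    def __init__(self, save: Callable[[Dict[str, Any]], Awaitable[None]]):
        self._save = save  # hypothetical persistence hook, e.g. an INSERT into PostgreSQL
        self._clients: List[JsonSocket] = []

    def connect(self, ws: JsonSocket) -> None:
        self._clients.append(ws)

    def disconnect(self, ws: JsonSocket) -> None:
        if ws in self._clients:
            self._clients.remove(ws)

    async def publish(self, result: Dict[str, Any]) -> int:
        """Persist first, then push concurrently; returns how many clients got the alert."""
        await self._save(result)
        clients = list(self._clients)
        outcomes = await asyncio.gather(
            *(ws.send_json(result) for ws in clients), return_exceptions=True
        )
        delivered = 0
        for ws, outcome in zip(clients, outcomes):
            if isinstance(outcome, Exception):
                self.disconnect(ws)  # drop clients whose connection has gone away
            else:
                delivered += 1
        return delivered
```

In a FastAPI app, a WebSocket endpoint would call `await websocket.accept()`, register the socket with `connect`, and call `disconnect` when `WebSocketDisconnect` is raised. Writing to the database before broadcasting means every alert a client sees already exists in storage.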