Zing Forum

FinAgent-8B: A QLoRA-Fine-Tuned Agent Model for Real-Time Financial Reasoning

FinAgent-8B demonstrates how a 7B-parameter model can approach the performance of much larger models in the financial domain through QLoRA fine-tuning and the ReAct agent architecture, covering a complete workflow of data synthesis, training, and evaluation.

Tags: FinAgent-8B · Financial Agent · QLoRA Fine-Tuning · ReAct · Mistral · Domain-Specialized Model · Alpha Vantage · Tool Calling
Published 2026-05-11 20:35 · Recent activity 2026-05-11 21:24 · Estimated read 8 min

Section 01

Core Introduction to the FinAgent-8B Project

FinAgent-8B is an end-to-end agent project for real-time financial reasoning. Through QLoRA fine-tuning and the ReAct architecture, it enables an open-source 7B-parameter model (based on Mistral) to approach the performance of much larger models in the financial domain. The project comprises four core modules: data synthesis, QLoRA fine-tuning, ReAct agent implementation, and an evaluation framework, providing a reproducible end-to-end example for financial AI application development. Its core value: a properly fine-tuned small model can rival much larger ones, cutting deployment costs and offering a local-deployment option for enterprises with strict data-privacy requirements.

Section 02

Project Background and Motivation

The financial domain places unusual demands on AI models: it requires high-precision reasoning while also keeping deployment costs and data privacy under control. The core proposition of the FinAgent-8B project is that small models (e.g., 7B parameters) with domain-specific fine-tuning can rival general-purpose models far larger than themselves in focused scenarios. This proposition offers a practical path around the cost and privacy obstacles in financial AI and supports the adoption of small models in vertical domains.

Section 03

Data Synthesis Pipeline: Construction of High-Quality Training Samples

Data quality is key to successful fine-tuning. The project uses the Distilabel framework to build a data pipeline, generating approximately 2400 training samples with GPT-4o as the teacher model. These samples are split into training and validation sets at an 80/20 ratio and stored in Mistral's dialogue format. The sample types include:

  1. CoT reasoning samples: requiring the model to demonstrate a chain of thought, decompose complex problems, and provide structured answers;
  2. Tool call trajectories: simulating multi-turn interactions (assistant → tool → assistant), including valid [TOOL_CALLS] format and integration of tool returns;
  3. Safety guardrail examples: training the model to identify inappropriate requests (e.g., concentration risk, unrealistic return expectations) and steer users toward safer framing.
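The tool-call trajectory samples above can be pictured as multi-turn dialogues serialized in a Mistral-style message format. The sketch below is illustrative only: the `get_stock_quote` tool name, the field layout, and the example values are assumptions, not the project's actual schema.

```python
import json

# Hypothetical tool-call trajectory sample in a Mistral-style dialogue format.
# The tool name "get_stock_quote" and all values are illustrative assumptions.
sample = {
    "messages": [
        {"role": "user", "content": "What is NVDA trading at right now?"},
        {
            "role": "assistant",
            "content": "",
            # Mistral models emit tool invocations under a [TOOL_CALLS] token;
            # here the parsed call is stored as structured JSON.
            "tool_calls": [
                {"name": "get_stock_quote", "arguments": {"symbol": "NVDA"}}
            ],
        },
        {"role": "tool", "name": "get_stock_quote",
         "content": json.dumps({"symbol": "NVDA", "price": 121.35})},
        {"role": "assistant",
         "content": "NVDA is currently trading at $121.35."},
    ]
}

# Samples like this would be serialized one per line (JSONL) and split
# 80/20 into training and validation sets.
line = json.dumps(sample)
print(len(sample["messages"]))
```

Each trajectory thus captures the full assistant → tool → assistant round trip, so the model learns both to emit a valid call and to ground its answer in the tool's return value.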

Section 04

Efficient QLoRA Fine-Tuning: Technology Selection and Configuration

The project uses QLoRA technology for efficient fine-tuning, which takes about 45 minutes to complete training on a single L40S GPU.

  • Base Model Selection: Mistral-7B-Instruct-v0.3 (native support for parallel tool calls, matching financial scenarios that often require calling several tools at once);
  • Core Configuration: 4-bit NF4 quantization; LoRA r=16/α=32 with target modules covering the q/k/v/o attention projections and the gate/up/down MLP projections; bf16 mixed precision; effective batch size of 16; cosine learning-rate schedule peaking at 2e-4; roughly 3 epochs of training;
  • Model Release: The fine-tuned model has been uploaded to the Hugging Face Hub (danab17/finagent-7b-merged).
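As a rough illustration, the reported hyperparameters map onto Hugging Face `transformers`/`peft` configuration objects roughly as follows. This is a sketch, not the project's actual training script; in particular, the micro-batch/accumulation split and the `output_dir` name are assumptions (only the product of 16 is stated).

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# 4-bit NF4 quantization with bf16 compute, per the reported setup.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA r=16, alpha=32, targeting the attention and MLP projection layers.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

# Effective batch size 16 (here assumed as micro-batch 2 × 8 accumulation
# steps), cosine schedule peaking at 2e-4, ~3 epochs.
training_args = TrainingArguments(
    output_dir="finagent-qlora",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    num_train_epochs=3,
    bf16=True,
)
```

With this combination the frozen base weights stay in 4-bit NF4 while only the small LoRA adapters train in bf16, which is what makes a single-GPU 45-minute run plausible.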
Section 05

ReAct Agent Implementation: From Zero to Production-Grade

The project provides two agent implementations:

  1. Zero-to-One Implementation: A handwritten ReAct loop that demonstrates the essential structure of the agent (generate thoughts and actions → parse tool calls → execute tools → integrate observation results), suitable for learning and understanding;
  2. LangGraph Version: A production-grade implementation that supports state-machine flow control, conditional branching, streaming output, and human-in-the-loop intervention, sharing the tool registry with the zero-to-one version to ensure consistent behavior.

The agent integrates seven Alpha Vantage financial tools (real-time stock quotes, fundamentals, financial statements, etc.) and implements a 60-minute TTL file cache to avoid exhausting the free API quota.
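The essential structure of the handwritten ReAct loop can be sketched as below. The model is stubbed out and the toy tool registry is an assumption; only the generate → parse `[TOOL_CALLS]` → execute → integrate-observation cycle mirrors the description above.

```python
import json

# Toy tool registry; "get_stock_quote" and its return value are illustrative.
TOOLS = {
    "get_stock_quote": lambda symbol: {"symbol": symbol, "price": 121.35},
}

def fake_model(messages):
    """Stand-in for the fine-tuned model: emits a tool call on the first
    turn, then answers from the tool observation on the second."""
    if not any(m["role"] == "tool" for m in messages):
        return ('[TOOL_CALLS] '
                '[{"name": "get_stock_quote", "arguments": {"symbol": "NVDA"}}]')
    obs = json.loads([m for m in messages if m["role"] == "tool"][-1]["content"])
    return f"{obs['symbol']} is trading at ${obs['price']}."

def react_loop(question, model=fake_model, max_steps=5):
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        reply = model(messages)
        if reply.startswith("[TOOL_CALLS]"):
            # Parse the tool calls, execute each tool, append observations.
            calls = json.loads(reply[len("[TOOL_CALLS]"):])
            messages.append({"role": "assistant", "content": reply})
            for call in calls:
                result = TOOLS[call["name"]](**call["arguments"])
                messages.append({"role": "tool", "content": json.dumps(result)})
        else:
            return reply  # No more tool calls: this is the final answer.
    return "Step limit reached."

print(react_loop("What is NVDA trading at?"))
# → NVDA is trading at $121.35.
```

A production version (the LangGraph variant) replaces this `for` loop with an explicit state machine, but the per-step contract, including the shared tool registry, stays the same.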
Section 06

Evaluation Framework: Multi-Dimensional Validation of Model Performance

The project designs 20 test questions covering 5 scenarios: single_tool (correct tool selection), parallel_tools (batch parallel calls), multi_turn (tool call ordering), cot_only (direct reasoning), and guardrail (safety protection). Evaluation metrics include tool recall/precision, exact set matching, parameter JSON validity, and safety guardrail pass rate, with optional GPT-4o-mini scoring. The framework supports two modes:

  • Mock Mode: No GPU/API key required, used for CI logic validation;
  • GPU Mode: Runs a complete evaluation with the real model.
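The set-based tool metrics above can be computed with a few lines of Python. This is a sketch of the idea, not the project's actual harness; the function name and the example tool names are assumptions.

```python
def tool_metrics(expected, called):
    """Recall, precision, and exact set match over tool-name sets."""
    expected, called = set(expected), set(called)
    tp = len(expected & called)  # tools both required and actually called
    recall = tp / len(expected) if expected else 1.0
    precision = tp / len(called) if called else 1.0
    return {"recall": recall, "precision": precision,
            "exact_match": expected == called}

# Example: a parallel_tools case where the model called one spurious tool.
m = tool_metrics(
    expected=["get_stock_quote", "get_fundamentals"],
    called=["get_stock_quote", "get_fundamentals", "get_news"],
)
print(m)  # recall 1.0, precision ~0.67, exact_match False
```

Parameter JSON validity would then be a separate per-call check (does `json.loads` succeed on the arguments string), and the guardrail pass rate a simple fraction over the guardrail test cases.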
Section 07

Project Highlights and Application Insights

Key insights from FinAgent-8B:

  1. Synthetic data plus fine-tuning can shrink the required model size, letting a 7B model challenge much larger models;
  2. A complete closed loop (data → training → evaluation) is the project's core strength;
  3. Parallel tool calling is a key capability for financial scenarios;
  4. Financial safety requires specialized training; base-model alignment alone is not sufficient.

The project also provides quick-start commands (clone, install, configure .env, run tests/agent/eval) and a Gradio interactive demo, offering a reusable blueprint for domain-agent development.