# FinAgent-8B: A QLoRA-Fine-Tuned Agent Model for Real-Time Financial Reasoning

> FinAgent-8B demonstrates how a 7B-parameter model, through QLoRA fine-tuning and the ReAct agent architecture, can approach the performance of much larger models in the financial domain, and ships a complete workflow covering data synthesis, training, and evaluation.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-05-11T12:35:24.000Z
- Last activity: 2026-05-11T13:24:30.021Z
- Popularity: 150.2
- Keywords: FinAgent-8B, financial agent, QLoRA fine-tuning, ReAct, Mistral, domain-specialized model, Alpha Vantage, tool calling
- Page link: https://www.zingnex.cn/en/forum/thread/finagent-8b-qlora
- Canonical: https://www.zingnex.cn/forum/thread/finagent-8b-qlora
- Markdown source: floors_fallback

---

## Core Introduction to the FinAgent-8B Project

FinAgent-8B is an end-to-end agent project for real-time financial reasoning. Through QLoRA fine-tuning and the ReAct architecture, it enables an open-source 7B-parameter model (based on Mistral) to approach the performance of much larger models in the financial domain. The project comprises four core modules: data synthesis, QLoRA fine-tuning, a ReAct agent implementation, and an evaluation framework, providing a reproducible end-to-end example for financial AI development. Its core value: a properly fine-tuned small model can rival large models, which reduces deployment costs and offers a local-deployment option for enterprises sensitive to data privacy.

## Project Background and Motivation

The financial domain places special demands on AI models: it requires high-precision reasoning while also keeping deployment costs and data privacy in check. The core proposition of the FinAgent-8B project is that a small model (e.g., 7B parameters) with domain-specific fine-tuning can rival general-purpose models many times its size in focused scenarios. This proposition offers a feasible path around the cost and privacy obstacles of financial AI applications and promotes the adoption of small models in vertical domains.

## Data Synthesis Pipeline: Construction of High-Quality Training Samples

Data quality is key to successful fine-tuning. The project uses the Distilabel framework to build a data pipeline, generating approximately 2400 training samples with GPT-4o as the teacher model. These samples are split into training and validation sets at an 80/20 ratio and stored in Mistral's dialogue format. The sample types include: 
1. CoT reasoning samples: requiring the model to demonstrate a chain of thought, decompose complex problems, and provide structured answers;
2. Tool call trajectories: simulating multi-turn interactions (assistant → tool → assistant), including valid `[TOOL_CALLS]` format and integration of tool returns;
3. Safety guardrail examples: training the model to identify and guide inappropriate requests (such as concentration risks, unrealistic return expectations, etc.).
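To make the sample formats concrete, here is a hypothetical sketch of a single tool-call trajectory in Mistral's dialogue format, together with the kind of JSON-validity check the evaluation section later applies. The field names, tool name (`get_quote`), and values are illustrative assumptions, not the project's exact schema:

```python
import json

# Hypothetical sketch of one tool-call trajectory sample in Mistral's
# dialogue format (assistant -> tool -> assistant); the schema and the
# tool name are illustrative assumptions, not the project's exact data.
sample = {
    "messages": [
        {"role": "user", "content": "What is AAPL trading at right now?"},
        {
            "role": "assistant",
            # Mistral v0.3 emits tool calls after a [TOOL_CALLS] token;
            # the arguments payload must be valid JSON.
            "content": '[TOOL_CALLS] [{"name": "get_quote", '
                       '"arguments": {"symbol": "AAPL"}}]',
        },
        {"role": "tool", "name": "get_quote",
         "content": '{"symbol": "AAPL", "price": 213.55}'},
        {"role": "assistant",
         "content": "AAPL is currently trading at $213.55."},
    ]
}

def arguments_are_valid_json(msg: dict) -> bool:
    """Check that the [TOOL_CALLS] payload parses as JSON, mirroring the
    'parameter JSON validity' metric described later in the evaluation."""
    _, _, payload = msg["content"].partition("[TOOL_CALLS]")
    try:
        json.loads(payload)
        return True
    except json.JSONDecodeError:
        return False

print(arguments_are_valid_json(sample["messages"][1]))  # True
```

A sample whose `[TOOL_CALLS]` payload fails this check would be discarded or regenerated during synthesis, which is one reason teacher-generated data needs programmatic validation.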

## Efficient QLoRA Fine-Tuning: Technology Selection and Configuration

The project uses QLoRA for efficient fine-tuning; training completes in about 45 minutes on a single L40S GPU.
- **Base Model Selection**: Mistral-7B-Instruct-v0.3 (natively supports parallel tool calls, adapting to the need for simultaneous multi-tool calls in financial scenarios);
- **Core Configuration**: 4-bit NF4 quantization, LoRA r=16/α=32, target layers covering q/k/v/o projections and gate/up/down projection layers, bf16 mixed precision, effective batch size of 16, cosine scheduling learning rate (peak at 2e-4), training for approximately 3 epochs;
- **Model Release**: The fine-tuned model has been uploaded to the Hugging Face Hub (`danab17/finagent-7b-merged`).
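A back-of-the-envelope sketch of what the r=16 configuration above implies for trainable parameter count, assuming Mistral-7B's published shapes (hidden size 4096, grouped-query attention with a 1024-dim K/V projection, MLP width 14336, 32 layers). The per-device batch/gradient-accumulation split is an assumed example; only the effective batch size of 16 comes from the text:

```python
# Rough estimate of trainable LoRA parameters for the configuration above.
# Dimensions assume Mistral-7B-Instruct-v0.3's published shapes; the
# project's exact config may differ.
r, alpha = 16, 32
hidden, kv_dim, mlp, layers = 4096, 1024, 14336, 32

# (d_in, d_out) for each targeted projection in one transformer block
targets = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, kv_dim),   # grouped-query attention: narrower K/V
    "v_proj": (hidden, kv_dim),
    "o_proj": (hidden, hidden),
    "gate_proj": (hidden, mlp),
    "up_proj": (hidden, mlp),
    "down_proj": (mlp, hidden),
}

# Each LoRA adapter adds r*(d_in + d_out) params (A: d_in x r, B: r x d_out)
per_layer = sum(r * (d_in + d_out) for d_in, d_out in targets.values())
trainable = per_layer * layers
scaling = alpha / r       # LoRA output is scaled by alpha/r = 2.0
effective_batch = 2 * 8   # one possible split yielding the stated batch of 16

print(f"trainable LoRA params: {trainable/1e6:.1f}M, scaling {scaling}")
# → trainable LoRA params: 41.9M, scaling 2.0
```

Roughly 42M trainable parameters against a frozen 4-bit base is what makes single-GPU training in under an hour plausible.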

## ReAct Agent Implementation: From Zero to Production-Grade

The project provides two agent implementations:
1. **Zero-to-One Implementation**: A handwritten ReAct loop that demonstrates the essential structure of the agent (generate thoughts and actions → parse tool calls → execute tools → integrate observation results), suitable for learning and understanding;
2. **LangGraph Version**: A production-grade implementation that supports state machine flow control, conditional branching, streaming output, and human intervention, sharing the tool registry with the zero-to-one version to ensure consistent behavior.

The agent integrates 7 Alpha Vantage financial tools (real-time stock prices, fundamentals, financial statements, etc.) and implements a file cache with a 60-minute TTL to avoid exhausting the free API quota.
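The zero-to-one loop described above can be sketched in a few dozen lines. This is a minimal illustration with a mocked model and one illustrative tool; the tool name (`get_quote`), the mock's replies, and the parsing details are assumptions based on Mistral's `[TOOL_CALLS]` convention, not the project's actual code:

```python
import json

# One illustrative tool standing in for the Alpha Vantage registry;
# the name and the returned price are hypothetical.
TOOLS = {
    "get_quote": lambda symbol: {"symbol": symbol, "price": 213.55},
}

def mock_model(messages):
    """Stand-in for the fine-tuned model: first turn emits a tool call,
    second turn answers from the tool observation."""
    if not any(m["role"] == "tool" for m in messages):
        return ('[TOOL_CALLS] [{"name": "get_quote", '
                '"arguments": {"symbol": "AAPL"}}]')
    obs = json.loads([m for m in messages if m["role"] == "tool"][-1]["content"])
    return f'AAPL is trading at ${obs["price"]}.'

def react_loop(question, model=mock_model, max_steps=5):
    """Generate -> parse tool calls -> execute -> integrate observations."""
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        reply = model(messages)
        messages.append({"role": "assistant", "content": reply})
        if "[TOOL_CALLS]" not in reply:
            return reply  # plain text means a final answer: stop the loop
        # Parse the JSON tool-call list, run each tool, feed results back
        calls = json.loads(reply.split("[TOOL_CALLS]", 1)[1])
        for call in calls:
            result = TOOLS[call["name"]](**call["arguments"])
            messages.append({"role": "tool", "name": call["name"],
                             "content": json.dumps(result)})
    return "max steps reached"

print(react_loop("What is AAPL trading at?"))  # → AAPL is trading at $213.55.
```

The `max_steps` bound is the usual guard against a model that keeps calling tools without converging; the LangGraph version replaces this hand-rolled loop with explicit graph nodes and conditional edges.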

## Evaluation Framework: Multi-Dimensional Validation of Model Performance

The project designs 20 test questions covering 5 scenarios: single_tool (correct tool selection), parallel_tools (batch parallel calls), multi_turn (tool call ordering), cot_only (direct reasoning), and guardrail (safety protection). Evaluation metrics include tool recall/precision, exact set matching, parameter JSON validity, and safety guardrail pass rate, with optional GPT-4o-mini scoring. The framework supports two modes:
- **Mock Mode**: No GPU/API key required, used for CI logic validation;
- **GPU Mode**: Runs a complete evaluation with the real model.
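The tool-selection metrics named above (recall, precision, exact set matching) reduce to simple set arithmetic over expected versus predicted tool names. A hedged sketch, since the project's actual scorer is not shown; the tool names in the example are hypothetical:

```python
# Sketch of set-based tool-selection metrics: compare the tools a test
# question expects against the tools the model actually called.
def tool_set_metrics(expected: set, predicted: set) -> dict:
    tp = len(expected & predicted)  # correctly selected tools
    precision = tp / len(predicted) if predicted else (1.0 if not expected else 0.0)
    recall = tp / len(expected) if expected else 1.0
    return {
        "precision": precision,
        "recall": recall,
        "exact_match": expected == predicted,  # strict set equality
    }

# e.g. a parallel_tools question expecting two calls, one of them missed
m = tool_set_metrics({"get_quote", "get_overview"}, {"get_quote"})
print(m)  # {'precision': 1.0, 'recall': 0.5, 'exact_match': False}
```

Exact set matching is deliberately stricter than precision/recall: it only passes when the model selects all and only the expected tools, which is what the parallel_tools scenario is probing.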

## Project Highlights and Application Insights

Key insights from FinAgent-8B:
1. Synthetic data plus fine-tuning can shrink the model size required, allowing a 7B model to challenge much larger ones;
2. A complete closed loop (data → training → evaluation) is the project's core strength;
3. Parallel tool calls are a key capability for financial scenarios;
4. Financial safety requires specialized training—base model alignment is not sufficient.

The project provides quick-start commands (clone, install, configure `.env`, run tests/agent/eval) and a Gradio interactive demo, offering a blueprint for domain-agent development.
