Section 01
FinReason Project Overview: Boosting Small Model Financial Numerical Reasoning with Verifiable Reward RL
FinReason trains Qwen2.5-1.5B, a 1.5B-parameter language model, to answer numerical questions about financial statements accurately. Training proceeds in two stages: Supervised Fine-Tuning (SFT), followed by reinforcement learning with Group Relative Policy Optimization (GRPO), where the reward signal is the verifiable numerical correctness of the answer. The project aims to address two drawbacks of large models, hallucination and high deployment cost, by bringing a small model to professional-level performance on specific financial tasks while keeping it practical to run on resource-constrained hardware.
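To make the idea of a verifiable reward concrete, here is a minimal sketch in Python. It is not FinReason's actual implementation: the function names, the rule of taking the last number in the completion as the answer, and the relative tolerance are all assumptions made for illustration. The second helper shows the group-relative step that gives GRPO its name, where rewards for several sampled answers to the same question are standardized against the group's mean and standard deviation.

```python
import math
import re
import statistics

# Hypothetical helpers for illustration only; names, the "last number wins"
# extraction rule, and the tolerance are assumptions, not FinReason's code.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_final_number(text):
    """Return the last number found in text as a float, or None if absent."""
    matches = _NUMBER.findall(text)
    if not matches:
        return None
    # Drop thousands separators such as the comma in "2,000".
    return float(matches[-1].replace(",", ""))


def numeric_reward(completion, gold, rel_tol=1e-3):
    """Return 1.0 if the final number matches gold within rel_tol, else 0.0."""
    pred = extract_final_number(completion)
    if pred is None:
        return 0.0
    return 1.0 if math.isclose(pred, gold, rel_tol=rel_tol, abs_tol=1e-9) else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group of samples for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps keeps the division defined when every sample gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled answers to one question whose correct answer is 0.125.
completions = [
    "Net margin = 250 / 2,000 = 0.125",
    "Net margin = 2,000 / 250 = 8",
    "I am not sure.",
    "The ratio is 0.125",
]
rewards = [numeric_reward(c, 0.125) for c in completions]
advantages = group_relative_advantages(rewards)
```

In this sketch, correct answers receive a positive advantage and incorrect ones a negative advantage, so the policy update favors completions that end in the right number without needing a learned reward model. A real pipeline would likely need stricter answer parsing (units, percentages, rounding conventions) than this example provides.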