
RSAT: Reinforcement Learning-based Table Reasoning and Fine-grained Citation Generation for Small Language Models

An in-depth analysis of the RSAT project, exploring how to train small language models to achieve faithful and reliable table reasoning and generate cell-level precise citations through a combination of SFT and GRPO reinforcement learning methods.

Tags: Table Reasoning, Reinforcement Learning, GRPO, Small Language Models, Fine-grained Citation, Interpretable AI, SFT
Published 2026-05-10 01:23 · Recent activity 2026-05-10 01:54 · Estimated read 7 min

Section 01

[Introduction] Core Highlights of the RSAT Project: Small Models + Reinforcement Learning for Interpretable Table Reasoning

The RSAT (Reasoning with Small models on Tables) project focuses on enabling small language models (e.g., 7B parameter scale) to achieve high-quality table reasoning and generate cell-level fine-grained citations. Its core innovation lies in adopting a training strategy that combines Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO) reinforcement learning, balancing reasoning faithfulness, citation accuracy, and model efficiency, thus providing solutions for interpretable AI applications in high-risk scenarios such as finance and healthcare.


Section 02

Research Background and Problem Definition

Table data is an important carrier of structured information, but table reasoning poses several challenges: understanding cell content and row-column relationships, performing numerical calculations, and producing credible conclusions. To address this, the RSAT project aims to enable small language models to achieve high-quality table reasoning while providing fine-grained cell citations as evidence, meeting the interpretability requirements of high-risk domains such as finance, healthcare, and law.


Section 03

Technical Architecture: Collaborative Training Strategy of SFT and GRPO

RSAT adopts a two-stage training approach:

  1. SFT Stage: high-quality datasets of question-table-answer triples with cell citation annotations teach the model basic table understanding and citation-generation patterns;
  2. GRPO Stage: Group Relative Policy Optimization (which, unlike PPO, requires no separate value model) optimizes the policy against reward functions covering both answer correctness and citation accuracy: accurate citations earn positive feedback, while hallucinated or omitted citations are penalized.
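The GRPO-stage reward described above can be sketched as follows. Note this is an illustrative assumption, not RSAT's actual design: the `[cite: RxCy]` citation tag format, the 0.7/0.3 weighting, and the substring-based answer check are all hypothetical stand-ins.

```python
import re

# Hypothetical citation tag format; RSAT's real output format may differ.
CITE_PATTERN = re.compile(r"\[cite:\s*R(\d+)C(\d+)\]")

def citation_f1(predicted: set, gold: set) -> float:
    """F1 overlap between predicted and gold cited cells."""
    if not predicted and not gold:
        return 1.0
    if not predicted or not gold:
        return 0.0
    tp = len(predicted & gold)
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

def reward(output: str, gold_answer: str, gold_cells: set) -> float:
    """Combined reward: answer correctness plus citation accuracy.

    Accurate citations raise the reward; hallucinated citations lower
    precision and omitted citations lower recall, so both are penalized.
    """
    cited = {(int(r), int(c)) for r, c in CITE_PATTERN.findall(output)}
    answer_ok = 1.0 if gold_answer.lower() in output.lower() else 0.0
    return 0.7 * answer_ok + 0.3 * citation_f1(cited, gold_cells)
```

In GRPO, a scalar reward like this is computed per sampled response; no learned value model is needed because advantages are derived from the group of samples itself.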

Section 04

Cell-level Fine-grained Citation Mechanism

A distinctive feature of RSAT is cell-level citation—answers generated by the model are accompanied by cited cell coordinates (e.g., row X, column Y). This relies on a special output format design to ensure conclusions have clear data sources. This mechanism enhances verifiability: users can quickly check the correctness of the model's reasoning, lowering the trust barrier for AI applications.
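The verifiability claim can be illustrated with a minimal checker that resolves cited coordinates back to table values, so a reviewer sees exactly which cells support an answer. The `(row X, column Y)` syntax follows the example in the text; the function itself is an assumed sketch, not RSAT's implementation.

```python
import re

# Parse citations of the form "(row 1, column 2)" from an answer string.
CITATION = re.compile(r"\(row\s+(\d+),\s*column\s+(\d+)\)")

def resolve_citations(answer: str, table: list) -> list:
    """Return [((row, col), cell_value), ...] for each citation in the answer.

    Out-of-range coordinates resolve to None so a human reviewer can
    flag them instead of silently trusting the model.
    """
    resolved = []
    for r, c in CITATION.findall(answer):
        r, c = int(r), int(c)
        if 0 <= r < len(table) and 0 <= c < len(table[r]):
            resolved.append(((r, c), table[r][c]))
        else:
            resolved.append(((r, c), None))
    return resolved
```

A human-in-the-loop workflow would run this over every generated answer and surface the resolved cells alongside the conclusion for review.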


Section 05

Efficiency Advantages of Small Models

RSAT uses small models at the 7B parameter scale, whose low inference cost allows deployment in resource-constrained environments. After training, these small models perform strongly on table reasoning benchmarks in both faithfulness and citation accuracy, even comparable to large models. Finally, GRPO is more stable and efficient to train than PPO, keeping costs low enough for academic researchers and small teams to reproduce and build on the work.
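One reason GRPO is cheaper than PPO is that it needs no learned value model: the baseline comes from the group of sampled responses for the same prompt. A simplified illustration of that group-relative advantage computation (not RSAT's exact implementation):

```python
def group_relative_advantages(rewards):
    """Normalize each reward against its group's mean and std (GRPO-style).

    Each of the `rewards` scores one sampled response to the same prompt;
    responses better than the group average get positive advantage.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    if std == 0:
        return [0.0] * n  # identical rewards carry no learning signal
    return [(r - mean) / std for r in rewards]
```

Because the baseline is recomputed per prompt from the samples themselves, the memory and training instability of a separate critic network are avoided.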


Section 06

Application Scenarios and Potential Impact

RSAT can be applied in scenarios such as financial analysis (assisting in extracting and verifying financial report indicators), scientific research (extracting insights from experimental data tables), and enterprise management (intelligent Q&A with data source display). The fine-grained citation mechanism supports human-machine collaboration: AI provides analysis and citation evidence, while human experts review and verify—balancing efficiency and decision reliability.


Section 07

Limitations and Future Directions

Current limitations of RSAT include insufficient support for complex nested tables and cross-table relational reasoning, and a focus mainly on English scenarios. Future directions include combining visual models to process scanned table images, extending to multi-turn conversational table exploration, applying the citation mechanism to broader reasoning tasks, and supporting multilingual table reasoning.


Section 08

Summary

The RSAT project provides an innovative and practical solution for the table reasoning field through collaborative training of SFT and GRPO. It enables small models to achieve high-quality table understanding and fine-grained citations, balancing performance and efficiency, and provides a reference for the inclusive application of AI. As the importance of structured data increases, the interpretable table reasoning technology represented by RSAT has broad application prospects.