# RSAT: Reinforcement Learning-based Table Reasoning and Fine-grained Citation Generation for Small Language Models

> An in-depth look at the RSAT project, which combines supervised fine-tuning (SFT) with GRPO reinforcement learning to train small language models for faithful, reliable table reasoning with precise, cell-level citations.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-05-09T17:23:39.000Z
- Last activity: 2026-05-09T17:54:00.366Z
- Popularity: 157.5
- Keywords: table reasoning, reinforcement learning, GRPO, small language models, fine-grained citation, explainable AI, SFT
- Page link: https://www.zingnex.cn/en/forum/thread/rsat
- Canonical: https://www.zingnex.cn/forum/thread/rsat

---

## [Introduction] Core Highlights of the RSAT Project: Small Models + Reinforcement Learning for Interpretable Table Reasoning

The RSAT (Reasoning with Small models on Tables) project aims to let small language models (around the 7B parameter scale) perform high-quality table reasoning while citing the specific cells that support each answer. Its core idea is a training strategy that combines Supervised Fine-Tuning (SFT) with Group Relative Policy Optimization (GRPO), a reinforcement learning method, to balance reasoning faithfulness, citation accuracy, and model efficiency. The result targets interpretable AI in high-stakes domains such as finance and healthcare.

## Research Background and Problem Definition

Tables are an important carrier of structured information, but reasoning over them is hard: a model must interpret cell contents, follow row and column relationships, perform numerical calculations, and still reach conclusions that can be trusted. RSAT addresses this by training small language models to reason over tables and to back each answer with fine-grained cell citations, meeting the interpretability requirements of high-stakes domains such as finance, healthcare, and law.

## Technical Architecture: Collaborative Training Strategy of SFT and GRPO

RSAT adopts a two-stage training approach:

1. SFT Stage: The model is fine-tuned on high-quality question-table-answer triples annotated with cell citations, learning basic table understanding and the citation output pattern.
2. GRPO Stage: The model is further optimized with Group Relative Policy Optimization, which does not require a separate value model. Reward functions target answer correctness and citation accuracy, rewarding accurate citations and penalizing hallucinations and omissions.
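The GRPO stage can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not RSAT's actual reward: the weighting (answer correctness plus citation F1), the `hallucination_penalty` value, the 0-based `(row, column)` coordinates, and all function names are hypothetical. It shows the two ideas described above: a reward that favors correct answers with precise and complete citations, and GRPO's group-relative advantage, where each sampled completion is scored against the mean and standard deviation of its own group instead of against a learned value model.

```python
import statistics
from typing import List, Set, Tuple

Cell = Tuple[int, int]  # hypothetical 0-based (row, column) coordinates


def citation_f1(predicted: Set[Cell], gold: Set[Cell]) -> float:
    """F1 between the cells the model cited and the annotated gold cells."""
    if not predicted and not gold:
        return 1.0
    if not predicted or not gold:
        return 0.0
    hits = len(predicted & gold)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)


def reward(pred_answer: str, gold_answer: str,
           pred_cells: Set[Cell], gold_cells: Set[Cell],
           n_rows: int, n_cols: int,
           hallucination_penalty: float = 0.5) -> float:
    """Hypothetical reward: correctness + citation F1 - penalty per nonexistent cell."""
    # Answer correctness via normalized exact match (an assumption; numeric answers
    # may need a tolerance-based comparison instead).
    correct = 1.0 if pred_answer.strip().lower() == gold_answer.strip().lower() else 0.0
    # Citation accuracy: F1 drops for imprecise citations (low precision)
    # and for omitted evidence cells (low recall).
    cite = citation_f1(pred_cells, gold_cells)
    # Hallucinated citations: coordinates that do not exist in the table.
    invalid = sum(1 for r, c in pred_cells if not (0 <= r < n_rows and 0 <= c < n_cols))
    return correct + cite - hallucination_penalty * invalid


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: normalize each reward by its group's mean and std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std of the group
    return [(r - mean) / (std + eps) for r in rewards]


# Toy group of G = 4 sampled completions for one question on a 5x4 table (illustrative values).
gold_answer, gold_cells = "42", {(1, 2), (3, 2)}
group = [
    ("42", {(1, 2), (3, 2)}),  # correct answer, complete citations
    ("42", {(1, 2)}),          # correct answer, one evidence cell omitted
    ("40", {(1, 2), (9, 9)}),  # wrong answer, one cited cell does not exist
    ("42", set()),             # correct answer, no citations at all
]
rewards = [reward(a, gold_answer, cells, gold_cells, n_rows=5, n_cols=4) for a, cells in group]
advantages = group_relative_advantages(rewards)
print([round(r, 3) for r in rewards])
print([round(a, 3) for a in advantages])
```

In this toy group the completion with a correct answer and complete citations receives the largest advantage, while the one that cites a nonexistent cell is pushed below the group mean. An out-of-range cell also lowers citation precision, so it is penalized twice here; whether that is desirable is a design choice. The sketch uses the population standard deviation; some GRPO implementations use the sample standard deviation instead.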

## Cell-level Fine-grained Citation Mechanism

A distinctive feature of RSAT is cell-level citation: each answer the model generates is accompanied by the coordinates of the cells it relies on (e.g., row X, column Y). A dedicated output format ties every conclusion to a clear data source. This makes answers verifiable: users can quickly check the model's reasoning against the cited cells, which lowers the barrier to trusting AI applications.
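To make the verification step concrete, here is a minimal Python sketch. RSAT's actual output format is not specified here, so the `[row r, col c]` citation syntax, the 1-based indexing, the toy table, and the helper names `parse_citations` and `resolve_citations` are all assumptions for illustration. The code extracts citations from an answer, looks up each cited cell so a reviewer can compare it with the claim, and flags coordinates that fall outside the table.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical citation syntax: "[row 2, col 3]", 1-based data-row and column indices.
CITATION_RE = re.compile(r"\[row\s+(\d+),\s*col\s+(\d+)\]")


def parse_citations(output: str) -> List[Tuple[int, int]]:
    """Extract (row, col) pairs from a model answer, in order of appearance."""
    return [(int(r), int(c)) for r, c in CITATION_RE.findall(output)]


def resolve_citations(output: str, header: List[str],
                      rows: List[List[str]]) -> Dict[str, list]:
    """Look up every cited cell for review; collect coordinates outside the table."""
    resolved, invalid = [], []
    for r, c in parse_citations(output):
        if 1 <= r <= len(rows) and 1 <= c <= len(header):
            # Keep the coordinate, the column name, and the cell value side by side.
            resolved.append(((r, c), header[c - 1], rows[r - 1][c - 1]))
        else:
            invalid.append((r, c))
    return {"resolved": resolved, "invalid": invalid}


# Toy table and model answer (illustrative only).
header = ["Year", "Revenue", "Net income"]
rows = [["2022", "120", "15"], ["2023", "150", "21"]]
answer = "Revenue grew from 120 [row 1, col 2] to 150 [row 2, col 2]; see also [row 5, col 1]."

report = resolve_citations(answer, header, rows)
for (r, c), column, value in report["resolved"]:
    print(f"row {r}, col {c} ({column}) = {value}")
print("invalid:", report["invalid"])
```

Running it prints the two resolved Revenue cells (120 and 150) and reports `[(5, 1)]` as invalid: a citation to a cell that does not exist, which is exactly the kind of hallucination the GRPO stage penalizes.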

## Efficiency Advantages of Small Models

RSAT uses models at the 7B parameter scale, which are cheap to run at inference time and can be deployed in resource-constrained environments. According to the project, the trained small models perform strongly on table reasoning benchmarks in both faithfulness and citation accuracy, in some cases comparable to much larger models. On the training side, GRPO does not need PPO's separate value model, which the project credits with more stable and efficient training at lower cost. This makes the approach easy for academic researchers and small teams to reproduce and improve.
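A back-of-the-envelope calculation shows why the 7B scale fits on modest hardware. These figures are standard arithmetic for model weights alone, assuming 2 bytes per parameter in bf16 and 0.5 bytes per parameter with 4-bit quantization; they exclude the KV cache and activations and are not measurements reported by RSAT.

```latex
\begin{aligned}
\text{bf16:}\quad  & 7\times10^{9}\ \text{params}\times 2\ \text{bytes/param}   = 1.4\times10^{10}\ \text{bytes}\approx 14\ \text{GB}\\
\text{4-bit:}\quad & 7\times10^{9}\ \text{params}\times 0.5\ \text{bytes/param} = 3.5\times10^{9}\ \text{bytes}\approx 3.5\ \text{GB}
\end{aligned}
```

The same arithmetic applies to training: PPO's value (critic) model is typically comparable in size to the policy, so dropping it, as GRPO does, saves roughly one more model's worth of weights, gradients, and optimizer state.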

## Application Scenarios and Potential Impact

RSAT can be applied to financial analysis (extracting and verifying indicators in financial reports), scientific research (drawing insights from experimental data tables), and enterprise management (question answering over business data with its sources shown). The fine-grained citation mechanism also supports human-machine collaboration: the AI provides analysis along with citation evidence, and human experts review and verify it, balancing efficiency with decision reliability.

## Limitations and Future Directions

Current limitations: RSAT offers limited support for complex nested tables and for reasoning across multiple related tables, and it primarily targets English-language data. Future directions include combining it with vision models to process scanned table images, supporting multi-turn conversational exploration of tables, applying the citation mechanism to a broader range of reasoning tasks, and extending to multilingual table reasoning.

## Summary

Through collaborative SFT and GRPO training, RSAT offers a practical approach to table reasoning: small models achieve high-quality table understanding with fine-grained citations while keeping costs low, which helps make interpretable AI accessible to a wider range of users. As structured data grows in importance, interpretable table reasoning of the kind RSAT represents has broad application prospects.