Section 01
[Introduction] Exploration of RL Tuning for Multi-Hop Reasoning in Limited-Memory Models
The Multi-Hop-Reasoning project explores whether reinforcement learning tuning (RL-tuning) can improve the performance of limited-memory language models on multi-hop compositional reasoning tasks, with the aim of offering a feasible path to complex reasoning in resource-constrained scenarios. The research focuses on uncovering the reasoning potential of small models under these constraints, and it is of both practical engineering interest and research value.