Section 01
Introduction: Fine-tuning Llama's Reasoning Ability with Rule-based Reinforcement Learning
This project demonstrates how to fine-tune a Llama model with rule-based reinforcement learning (rule-based RL) so that it follows a prescribed XML output format on the GSM8K mathematical reasoning task, with training and evaluation carried out on the Leonardo supercomputer. To show that the method generalizes beyond GSM8K, the project also applies it to the CartPole-v1 benchmark and to chess self-play, providing a practical reference for improving model reasoning ability.
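The core idea of rule-based RL is that the reward comes from simple programmatic checks rather than a learned reward model. The sketch below illustrates one plausible shape of such a reward function: partial credit for emitting well-formed XML, full credit when the extracted answer also matches the gold label. The tag names, weights, and function name here are illustrative assumptions, not the project's exact configuration.

```python
import re

# Expected output shape (assumed for illustration):
# <reasoning>...</reasoning><answer>...</answer>
FORMAT_RE = re.compile(
    r"^<reasoning>.+?</reasoning>\s*<answer>(.+?)</answer>\s*$",
    re.DOTALL,
)

def rule_based_reward(completion: str, gold_answer: str) -> float:
    """Score a completion with fixed rules: no learned reward model.

    0.0  -> output does not follow the XML format
    0.5  -> correct format, wrong final answer
    1.0  -> correct format and correct final answer
    """
    match = FORMAT_RE.match(completion.strip())
    if match is None:
        return 0.0                      # malformed output earns nothing
    reward = 0.5                        # partial credit for format compliance
    if match.group(1).strip() == gold_answer.strip():
        reward += 0.5                   # full credit for a correct answer
    return reward
```

A reward like this can be plugged directly into a policy-gradient fine-tuning loop (e.g. GRPO or PPO-style training), since it only needs the generated text and the reference answer.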