Section 01
Qwen-RBI-RL: A Domain Expert Model for RBI Regulatory Docs with Three-Stage Training
This project introduces Qwen-RBI-RL, a domain-specific model trained on regulatory documents from the Reserve Bank of India (RBI). Starting from the Qwen3-4B model, it is trained in three stages: continued pre-training on the regulatory corpus, cold-start supervised fine-tuning (SFT), and reinforcement learning with GRPO (Group Relative Policy Optimization). The resulting model achieves verifiable reasoning in the financial regulatory domain while remaining small enough to deploy on consumer hardware.
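To make the GRPO stage more concrete, the sketch below shows the group-relative advantage that gives the method its name: several answers are sampled for the same prompt, each is scored, and every score is normalized against the mean and standard deviation of its own group. This is a minimal illustration and not the project's training code. The function name `group_relative_advantages`, the `eps` constant, the use of the population standard deviation, and the example 0/1 rewards are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own sampling group.

    rewards: scores for several answers sampled from the same prompt.
    eps: small constant (assumed) that avoids division by zero when
         every answer in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four sampled answers scored by a checkable
# reward (1.0 = answer verified as correct, 0.0 = not).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# A group in which every answer scores the same carries no learning signal.
uniform = group_relative_advantages([1.0, 1.0, 1.0, 1.0])
```

Answers that score above their group's mean receive a positive advantage and the rest a negative one, so the policy update needs no separately trained value model. This pairs naturally with rewards that can be checked automatically.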