Section 01
[Introduction] Fine-tuning LLMs with Reinforcement Learning: A Comparative Study of PPO and GRPO in Insider Threat Detection
This study compares two reinforcement learning methods for fine-tuning large language models, Proximal Policy Optimization (PPO) and Group Relative Policy Optimization (GRPO), on an insider threat detection task. The comparison covers three dimensions: training efficiency, memory usage, and output quality. Experiments use the CERT Insider Threat Dataset R4.2, with a pragmatic choice of models (e.g., the Qwen series) and a practical engineering implementation. The results support the advantages of GRPO in resource-constrained environments and offer a reference for applying LLMs in the security domain.
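For background on the memory comparison: GRPO does not train a separate value (critic) model as PPO does. Instead, it scores each sampled completion relative to the other completions drawn for the same prompt. The sketch below illustrates that group-relative advantage in plain Python. The function name, the `eps` constant, the use of the population standard deviation, and the reward values are illustrative assumptions, not details taken from the study.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its own group.

    `rewards` holds one scalar reward per completion sampled for a single
    prompt. No learned value model is involved; the group itself serves
    as the baseline.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    # eps avoids division by zero when every completion gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions of one prompt, e.g. 1.0 when the
# model labels a user session correctly and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that score above their group's mean receive a positive advantage and those below receive a negative one. When every completion in a group gets the same reward, all advantages are zero and that group contributes no learning signal.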