Section 01
LaRA: A New Method for Detecting Data Contamination in RL-Finetuned Large Models (Introduction)
LaRA is a framework based on hierarchical representation analysis. It combines three complementary metrics (perturbation sensitivity, direction collapse, and local rigidity) to detect data contamination in RL-finetuned LLMs. Whereas traditional approaches rely on output-layer signals, LaRA examines the model's internal representation space and analyzes how its geometric properties change, offering a new tool for assessing data quality in AI models.
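To make the three metric names more concrete, here is a minimal sketch of how geometric quantities of this kind could be computed from hidden representations. This introduction does not give LaRA's actual definitions, so every formula below is an illustrative assumption: the function names, the toy linear "encoders", and the choice of proxies (relative displacement, top singular-direction energy share, and pairwise-distance correlation) are all hypothetical stand-ins, not the authors' method.

```python
import numpy as np

# All three functions are hypothetical proxies, NOT LaRA's published definitions.
# h0: representation of a clean input, shape (d,)
# hp: representations of n perturbed copies of that input, shape (n, d)
# dx: the n input perturbations that produced hp, shape (n, d_in)

def perturbation_sensitivity(h0, hp):
    """Mean displacement of the representation, relative to the norm of h0."""
    displacement = np.linalg.norm(hp - h0, axis=1)
    return float(displacement.mean() / (np.linalg.norm(h0) + 1e-12))

def direction_collapse(h0, hp):
    """Share of displacement energy along the top singular direction (near 1.0 = collapsed)."""
    singular_values = np.linalg.svd(hp - h0, compute_uv=False)
    energy = singular_values ** 2
    return float(energy[0] / (energy.sum() + 1e-12))

def _pairwise_distances(points):
    """Upper-triangular pairwise Euclidean distances as a flat vector."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return dist[np.triu_indices(len(points), k=1)]

def local_rigidity(dx, hp):
    """Correlation between input-space and representation-space pairwise distances."""
    return float(np.corrcoef(_pairwise_distances(dx), _pairwise_distances(hp))[0, 1])

# Toy comparison: an orthogonal (distance-preserving) map vs. a rank-one map.
rng = np.random.default_rng(0)
x0 = rng.normal(size=8)
dx = 0.01 * rng.normal(size=(32, 8))
orthogonal, _ = np.linalg.qr(rng.normal(size=(8, 8)))
u = rng.normal(size=8)
u /= np.linalg.norm(u)
rank_one = np.outer(u, u)

def report(weights):
    """Apply a linear toy encoder and return the three proxy scores."""
    h0 = x0 @ weights
    hp = (x0 + dx) @ weights
    return (
        perturbation_sensitivity(h0, hp),
        direction_collapse(h0, hp),
        local_rigidity(dx, hp),
    )

isometric_scores = report(orthogonal)
collapsed_scores = report(rank_one)
print("orthogonal map:", isometric_scores)
print("rank-one map:  ", collapsed_scores)
```

In this toy setup the rank-one map pushes every perturbation onto a single direction, so its collapse score approaches 1, while the orthogonal map preserves pairwise distances and scores close to 1 on the rigidity proxy. With a real model, `h0` and `hp` would presumably come from hidden states at a chosen layer, and how the scores map to a contamination decision is left open here.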