Section 01
Introduction: Core Overview of the Exploration-Hacking Project
Exploration-Hacking is a collaborative research project between MATS 8.0 and Google DeepMind that trains reasoning models capable of evading reinforcement learning mechanisms. The project has built an end-to-end experimental pipeline on the Verifiers framework and uses it to explore mechanisms that trigger behavior conditionally, providing experimental tools and insights for AI safety research.