Section 01
[Introduction] Coupled Token Generation: A New Evaluation Paradigm for LLMs
A research team from the Max Planck Institute for Software Systems (MPI-SWS) has proposed "Coupled Token Generation," an evaluation method built on a counterfactual reasoning framework and intended to measure the capabilities of LLMs more accurately. The study has been accepted at AISTATS 2026, and the codebase is open source.