Section 01
[Main Post/Introduction] Diagnosing the Formal Reasoning Capabilities of Large Language Models: Regular Language Tests Reveal 11 Failure Modes and Motivate an Intervention Framework
This study systematically evaluates the symbolic reasoning capabilities of GPT-5.2, Grok-4.1, Gemini-2.5, and the Qwen2.5 model series using regular languages, a fully verifiable formal domain. It identifies 11 failure modes and proposes VGNS (Vector-Guided Neuron Selection), an intervention framework. The results offer a reference point for mapping the limits of LLM formal reasoning and for improving it.
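To illustrate why regular languages count as fully verifiable, here is a minimal sketch of grading a model's membership answer against a deterministic finite automaton (DFA). It is not the study's actual test harness: the `DFA` class, the `grade` helper, and the example language (strings over {a, b} with an even number of `a`s) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class DFA:
    """Hypothetical minimal DFA; not taken from the study."""
    start: str
    accepting: set
    transitions: dict  # maps (state, symbol) -> next state

    def accepts(self, word: str) -> bool:
        state = self.start
        for symbol in word:
            key = (state, symbol)
            if key not in self.transitions:
                # Symbol outside the alphabet: reject the word.
                return False
            state = self.transitions[key]
        return state in self.accepting


# Illustrative language: strings over {a, b} with an even number of 'a's.
EVEN_AS = DFA(
    start="even",
    accepting={"even"},
    transitions={
        ("even", "a"): "odd",
        ("even", "b"): "even",
        ("odd", "a"): "even",
        ("odd", "b"): "odd",
    },
)


def grade(dfa: DFA, word: str, model_answer: bool) -> bool:
    """Return True when the model's yes/no answer matches the DFA's verdict."""
    return dfa.accepts(word) == model_answer
```

Because the automaton decides membership exactly, a call such as `grade(EVEN_AS, "aab", True)` scores an answer with no human judgment involved, which is the property that makes this kind of domain attractive for diagnosing failure modes.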