Section 01
CauliBench: Testing LLM's Instruction Following and Reasoning Stability with 'Cauliflower' (Introduction)
CauliBench is an open-source benchmark tool developed and maintained by CookieShualon (Source: GitHub, Link: https://github.com/CookieShualon/caulibench, Release Date: 2026-06-12). Wrapped in a humorous 'cauliflower' theme with serious technical goals, it tests large language models' instruction following ability, reasoning stability, and context retention through designed conflicting instructions. The project emphasizes reproducibility and LLM evaluation mechanisms, providing references for model selection, improvement feedback, and behavioral research.