Section 01
[Introduction] Chaotic Intern Env: A Benchmark Framework for AI Agents in Chaotic Workplace Environments
This article introduces chaotic-intern-env, an OpenEnv environment for evaluating how AI agents perform in ambiguous and contradictory workplace workflows. The project addresses a gap in existing AI agent benchmarks, which tend to be overly idealized. It simulates the chaotic conditions of a tech startup and uses three progressive tasks to test an agent's information filtering, conflict resolution, and decision-making. Scoring is deterministic, which gives a consistent basis for judging whether an agent can move from 'toy demonstration' to 'production tool'.