Section 01
In-depth Evaluation of the OpenAI o1 Models' Planning Capabilities: Key Findings and Research Significance
The VITA research team at the University of Texas at Austin presented a study at the NeurIPS'24 LanGame workshop that systematically evaluates GPT-4 and the o1 series models (o1-mini, o1-preview) on planning tasks along three dimensions: feasibility, optimality, and generalization. The study finds that the o1 models excel at problem understanding and parse complex domain definitions more accurately. However, they show clear limitations in spatial reasoning, where they make execution errors during multi-step reasoning, and in generalization, where performance degrades when symbolic representations change. The study offers empirical reference points for applying LLM planning capabilities and for follow-up research.