IDEAFix: An Evaluation Framework for Prompt Strategies to Break Cognitive Fixation in Large Language Models

This article introduces the IDEAFix evaluation framework, explores the problem of cognitive fixation in large language models (LLMs) during creative generation tasks, and discusses methods to stimulate innovative thinking in models through systematic prompt strategies.

Tags: large language models · creative generation · de-fixation · prompt engineering · evaluation framework · cognitive fixation · innovative thinking
Published 2026-04-29 22:50 · Recent activity 2026-04-29 22:56 · Estimated read: 6 min

Section 01

IDEAFix Framework Overview: Evaluating LLMs' Ability to Break Cognitive Fixation in Creativity

IDEAFix is a systematic evaluation framework addressing cognitive fixation in creative generation by large language models. This article introduces the framework's design goals, core components (dataset, evaluation dimensions, de-fixation strategies), experimental results, and application value; the framework aims to provide a standardized benchmark for evaluating and enhancing AI's creative capabilities.


Section 02

Research Background: The Challenge of Cognitive Fixation in LLMs

Large language models excel at text generation, but in creative generation scenarios they exhibit cognitive fixation: a tendency to produce conventional, patterned responses, analogous to human functional fixedness. The IDEAFix project aims to provide a standardized benchmark for measuring LLMs' ability to break this fixation, promoting the development of more innovative AI generation techniques.


Section 03

IDEAFix Framework Design: Dataset, Evaluation Dimensions, and De-Fixation Strategies

Dataset Scale and Structure

IDEAFix constructs a large-scale evaluation dataset comprising 14,350 prompt samples and 567 creative briefs, spanning 81 categories, with multi-dimensional annotations.
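
The article gives only the dataset's scale, not its schema. As a hedged illustration of what one record might look like (every field name below is an assumption for illustration, not part of IDEAFix):

```python
# Hypothetical sketch of one dataset record. All field names are
# assumptions; this summary does not reproduce the actual IDEAFix schema.
sample = {
    "sample_id": "sample-00042",       # one of the 14,350 prompt samples
    "brief_id": "brief-0117",          # one of the 567 creative briefs
    "category": "product-design",      # one of the 81 categories
    "prompt": "Propose an unconventional use for a paperclip.",
    "strategy": "baseline",            # or a de-fixation strategy label
    "annotations": {                   # multi-dimensional annotations
        "difficulty": "medium",
        "domain": "engineering",
    },
}
```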

Evaluation Dimensions

Evaluation is conducted across four dimensions: Originality (divergence from conventional solutions), Fluency (quantity and speed of idea generation), Flexibility (ability to move across domains), and Elaboration (richness of detail).
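
As a minimal sketch of how these four dimensions could be carried per response (the 0-1 scale and the equal-weight aggregate are assumptions; the summary does not say how IDEAFix combines dimensions):

```python
from dataclasses import dataclass

@dataclass
class CreativityScores:
    """Four-dimension creativity score for one model response.

    The 0-1 normalization and unweighted mean are assumptions made
    for illustration, not IDEAFix's stated aggregation rule.
    """
    originality: float  # divergence from conventional solutions
    fluency: float      # quantity/speed of distinct ideas
    flexibility: float  # spread across domains or perspectives
    elaboration: float  # richness of detail

    def overall(self) -> float:
        # Unweighted mean; real benchmarks often weight dimensions.
        return (self.originality + self.fluency
                + self.flexibility + self.elaboration) / 4
```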

Prompt Strategy Classification

The framework evaluates several classes of de-fixation strategies: the SCAMPER method, TRIZ principles, analogical thinking, attribute listing, counterfactual thinking, and random stimulation.
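
A hedged sketch of how such strategies might be wrapped around a creative brief as prompt templates; the template wording below is invented for illustration and is not IDEAFix's actual prompt set:

```python
# Hypothetical strategy-to-template mapping. The wording is an
# assumption; IDEAFix's real prompts are not reproduced in this summary.
STRATEGY_TEMPLATES = {
    "scamper": (
        "Take the brief below and apply SCAMPER: Substitute, Combine, "
        "Adapt, Modify, Put to another use, Eliminate, Reverse.\n\n{brief}"
    ),
    "counterfactual": (
        "Before answering the brief below, list three assumptions a "
        "typical answer would make, then deliberately violate each one."
        "\n\n{brief}"
    ),
    "random_stimulation": (
        "Incorporate the unrelated concept '{stimulus}' into your "
        "answer to the brief below.\n\n{brief}"
    ),
}

def build_prompt(strategy: str, brief: str, stimulus: str = "lighthouse") -> str:
    # str.format ignores keyword arguments a template does not use.
    return STRATEGY_TEMPLATES[strategy].format(brief=brief, stimulus=stimulus)
```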


Section 04

Experimental Design: Model Coverage and Evaluation Process

Model Coverage

The evaluation covers mainstream LLMs: GPT-4o, the Claude series, Gemini-2.5-Flash, Llama-3.1-70B, Qwen3-30B, and Grok-4.1-Fast-Reasoning.

Evaluation Process

  1. Baseline test: Obtain default creative performance using standard prompts
  2. Strategy test: Apply specific de-fixation strategies
  3. Comparative analysis: Quantify indicator improvements
  4. Cross-model comparison: Analyze differences in strategy sensitivity
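
The summary does not include implementation details; a minimal sketch of this four-step loop, with `generate` and `score_response` as caller-supplied placeholders (assumptions, not IDEAFix APIs), might look like:

```python
# Minimal sketch of the four-step evaluation loop described above.
from statistics import mean

def evaluate_model(generate, prompts, strategies, score_response):
    """generate(prompt) -> str; score_response(text) -> float in [0, 1]."""
    # 1. Baseline test: default creative performance with standard prompts.
    baseline = mean(score_response(generate(p)) for p in prompts)

    results = {}
    for name, wrap in strategies.items():
        # 2. Strategy test: re-run with a de-fixation wrapper applied.
        score = mean(score_response(generate(wrap(p))) for p in prompts)
        # 3. Comparative analysis: quantify improvement over baseline.
        results[name] = {"score": score, "delta": score - baseline}
    return baseline, results

# 4. Cross-model comparison: call evaluate_model() once per model and
#    compare the per-strategy deltas to analyze strategy sensitivity.
```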

Evaluation Methods

The framework combines automatic metrics (semantic similarity, diversity), human evaluation (subjective scoring by experts), and LLM-based evaluation (whose feasibility is still being explored).
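
As one plausible instance of an automatic diversity metric (the embedding model and the exact distance are assumptions; the summary does not name IDEAFix's metrics), consider mean pairwise cosine distance over response embeddings:

```python
# Sketch of a semantic-diversity metric: mean pairwise cosine distance
# between sentence embeddings of a model's responses. Higher = more
# spread out. The embedding model choice is an assumption.
from itertools import combinations
import numpy as np
from sentence_transformers import SentenceTransformer

_model = SentenceTransformer("all-MiniLM-L6-v2")

def semantic_diversity(responses: list[str]) -> float:
    if len(responses) < 2:
        return 0.0
    emb = _model.encode(responses, normalize_embeddings=True)
    # With unit-normalized embeddings, cosine similarity is a dot product.
    dists = [1.0 - float(np.dot(emb[i], emb[j]))
             for i, j in combinations(range(len(responses)), 2)]
    return float(np.mean(dists))
```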


Section 05

Research Findings: Strategy Effectiveness and Insights into Model Creative Capabilities

  1. Differences in Strategy Effectiveness: Strategies vary significantly in how much they improve each model, so prompt engineering needs to be customized to the target model.
  2. Model Scale and Creativity: The relationship between scale and creative performance is non-linear; medium-scale models can outperform larger ones on specific tasks.
  3. Domain Specificity: Structured methods (e.g., TRIZ) suit technical tasks, while random-stimulation strategies suit artistic ones.

Section 06

Application Scenarios: Practical Value in Creative Industry, Education, and AI R&D

Creative Industry

Helps select appropriate models, optimize prompt strategies, and establish quantitative standards for creative quality.

Education Sector

Serves as a teaching tool to understand creative thinking methods, evaluate creative performance, and compare the effects of traditional and AI-enhanced methods.

AI R&D

Helps diagnose model weaknesses, guide specialized training, and track how version iterations affect creative capabilities.


Section 07

Limitations and Future Directions: Challenges and Expansion

Current Limitations

  • Subjectivity challenge: Creative quality judgments vary widely between raters
  • Cultural bias: The dataset and scoring standards implicitly reflect specific cultural backgrounds
  • Limited dynamism: The benchmark needs continuous updates to keep pace with model iterations

Future Directions

  • Multimodal creativity: Expand to images, videos, etc.
  • Real-time interaction: Conversational creative generation strategies
  • Personalized creativity: Customize strategies based on user preferences
  • Creative collaboration: Evaluate the effect of human-AI collaboration