Zing Forum

BloomBench: A Bilingual Multimodal VLM Evaluation Benchmark Based on Bloom's Taxonomy

BloomBench is a cognition-driven bilingual (English-Arabic) multimodal benchmark that organizes tasks according to Bloom's Revised Taxonomy, evaluating the multimodal reasoning capabilities of vision-language models (VLMs) across six cognitive levels from Remember to Create.

Tags: VLM · Benchmarking · Multimodal · Cognitive Evaluation · Bloom's Taxonomy · Bilingual · Arabic · Evaluation
Published 2026-03-30 07:46 · Recent activity 2026-03-30 07:58 · Estimated read: 3 min

Section 01

Introduction / Main Post

BloomBench is a cognition-driven bilingual (English-Arabic) multimodal benchmark that organizes tasks according to Bloom's Revised Taxonomy, evaluating the multimodal reasoning capabilities of vision-language models (VLMs) across six cognitive levels from Remember to Create.


Section 02

Project Background and Motivation

Most existing vision-language model (VLM) benchmarks focus on accuracy or headline scores for isolated tasks, making it difficult to reveal the true distribution of a model's capabilities across cognitive levels. BloomBench is designed to change this: grounded in Bloom's Revised Taxonomy, it systematically evaluates the multimodal reasoning capabilities of VLMs from a cognitive-science perspective.

Core Design Principles:

  • Diagnostic Cognitive Profiling: Not just "what can it do", but "how well does it perform at each cognitive level"
  • Cross-Language Stress Testing: Parallel bilingual support for English and Arabic, going beyond Anglocentrism
  • Balance Between Scalability and High Quality: Semi-automated construction process + hybrid validation mechanism
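Diagnostic cognitive profiling boils down to reporting one accuracy per Bloom level rather than a single headline number. The sketch below illustrates the idea under assumed names: the `items` schema, the `cognitive_profile` function, and the field names are all hypothetical, not BloomBench's actual data format.

```python
from collections import defaultdict

# Hypothetical item format: each benchmark item is tagged with its Bloom level
# and whether the model answered it correctly.
items = [
    {"level": "Remember", "correct": True},
    {"level": "Remember", "correct": False},
    {"level": "Analyze",  "correct": True},
]

def cognitive_profile(items):
    """Aggregate per-level accuracy instead of one headline score."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for item in items:
        totals[item["level"]] += 1
        hits[item["level"]] += int(item["correct"])
    return {level: hits[level] / totals[level] for level in totals}

print(cognitive_profile(items))
```

A model with a strong headline score can still show a profile that collapses at the higher levels, which is exactly the failure mode a single aggregate number hides.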

Section 03

Six Cognitive Levels of Bloom's Taxonomy

BloomBench organizes evaluation tasks according to the six levels of Bloom's Taxonomy:
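Since the six levels form an ordered hierarchy from shallow recall to open-ended generation, they map naturally onto an ordered enumeration. This is an illustrative sketch only; the class name, task fields, and language codes are assumptions, not BloomBench's published schema.

```python
from enum import IntEnum

class BloomLevel(IntEnum):
    """Bloom's Revised Taxonomy, ordered by cognitive depth."""
    REMEMBER = 1
    UNDERSTAND = 2
    APPLY = 3
    ANALYZE = 4
    EVALUATE = 5
    CREATE = 6

# A hypothetical bilingual task record tagged with its level.
task = {
    "question": "What color is the car?",
    "language": "en",          # parallel "ar" version would share the level
    "level": BloomLevel.REMEMBER,
}

# IntEnum ordering lets evaluation code filter or sort by cognitive depth.
assert task["level"] < BloomLevel.ANALYZE
```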


Section 04

1. Remember

Capabilities at the recognition and recall level:

  • Object recognition in images
  • Attribute memory (color, shape, material)
  • Activity recognition
  • Symbol and text recognition

Section 05

2. Understand

Comprehension of combinations and relationships:

  • Semantic relationship understanding
  • Emotion understanding
  • Paraphrase style understanding
  • Vision-language alignment

Section 06

3. Apply

Applying knowledge in new visual contexts:

  • Multimodal logic (negation, structure)
  • Rule application
  • Context transfer

Section 07

4. Analyze

Decomposition and reasoning:

  • Logical/scientific reasoning
  • Context analysis
  • Chart/table interpretation
  • Atypical attribute analysis

Section 08

5. Evaluate

Judgment and decision-making:

  • Consistency/hallucination detection
  • Harmfulness and safety assessment
  • Quality evaluation