Zing Forum


PolyChartQA: A Multilingual Chart Question Answering Benchmark Dataset

PolyChartQA is a benchmark dataset designed specifically for multilingual chart question answering. It evaluates how well large vision-language models (LVLMs) understand and answer questions about charts across multiple languages.

Tags: PolyChartQA, chart question answering, vision-language models, multilingual, benchmarking, LVLM, data visualization understanding
Published 2026-04-19 11:57 | Recent activity 2026-04-19 12:23 | Estimated read 8 min

Section 01

Introduction: Core Overview of PolyChartQA, a Multilingual Chart QA Benchmark Dataset

PolyChartQA is a benchmark dataset for multilingual chart question answering, designed to evaluate the chart understanding ability of Large Vision-Language Models (LVLMs). It addresses two gaps in existing benchmarks: insufficient multilingual support and a narrow range of chart types. Its core contributions are multilingual coverage, diverse chart types, multi-level question design, and a standardized evaluation protocol, which together provide a practical tool for both academic research and real-world applications.


Section 02

Background: Shortcomings of LVLMs in Chart Understanding and the Birth of PolyChartQA

Challenges of Vision-Language Models in Chart Understanding

Large vision-language models (e.g., GPT-4V, Claude 3) still show clear weaknesses in chart understanding: they misread numerical values, misjudge trends, handle many languages poorly, and struggle with complex reasoning. These weaknesses limit their use in global, multilingual scenarios.

Purpose of PolyChartQA's Creation

To fill this gap, PolyChartQA provides a multilingual chart question answering benchmark. Its core contributions are:

  • Multilingual coverage (Chinese, Japanese, German, etc.)
  • Diverse chart types (bar charts, line charts, pie charts, etc.)
  • Multi-level question design (from data extraction to complex reasoning)
  • Standardized evaluation protocols

It helps evaluate and improve the chart understanding ability of LVLMs.


Section 03

Dataset Composition: Multi-dimensional Design Features

Diversity of Chart Types

The dataset includes bar charts, line charts, pie charts, scatter plots, and combined charts, covering a range of data visualization scenarios.

Hierarchical Question Types

Questions are organized from shallow to deep:

  • Level 1: Data Extraction
  • Level 2: Simple Comparison
  • Level 3: Trend Analysis
  • Level 4: Mathematical Operations
  • Level 5: Complex Reasoning

Multilingual Support

The dataset covers Chinese, Japanese, German, and other languages, so it can measure how well models transfer across languages and reveal systematic weaknesses in particular languages.
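To make these three design dimensions concrete, here is a minimal sketch of how a single benchmark item might be represented in Python. The source does not publish its annotation schema, so the class name `ChartQAItem`, every field name, and the chart-type labels are hypothetical; only the five question levels come from the list above.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """The five question levels, ordered from shallow to deep."""
    DATA_EXTRACTION = 1
    SIMPLE_COMPARISON = 2
    TREND_ANALYSIS = 3
    MATHEMATICAL_OPERATIONS = 4
    COMPLEX_REASONING = 5


# Hypothetical label set; the real dataset may use different codes.
CHART_TYPES = {"bar", "line", "pie", "scatter", "combined"}


@dataclass(frozen=True)
class ChartQAItem:
    """One question about one chart (hypothetical field names)."""
    item_id: str
    image_path: str
    chart_type: str
    language: str  # assumed ISO 639-1 code, e.g. "zh", "ja", "de"
    level: Level
    question: str
    answer: str

    def __post_init__(self) -> None:
        # Reject chart types outside the assumed label set.
        if self.chart_type not in CHART_TYPES:
            raise ValueError(f"unknown chart type: {self.chart_type!r}")


# A made-up example item, not taken from the dataset.
item = ChartQAItem(
    item_id="demo-0001",
    image_path="charts/demo-0001.png",
    chart_type="bar",
    language="de",
    level=Level.SIMPLE_COMPARISON,
    question="Welcher Balken ist höher, 2020 oder 2021?",
    answer="2021",
)
print(item.level.name, int(item.level))  # SIMPLE_COMPARISON 2
```

Using an `IntEnum` for the level keeps the shallow-to-deep ordering comparable (`Level.TREND_ANALYSIS > Level.DATA_EXTRACTION`), which is convenient when reporting accuracy by difficulty.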


Section 04

Evaluation Framework: Standardized Metrics and Fine-grained Analysis

Accuracy Metrics

  • Exact Match: The predicted answer is identical to the reference answer
  • Semantic Equivalence: The answer is worded differently but has the same meaning
  • Numerical Tolerance: A numeric answer within ±1% of the reference value counts as correct
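The exact-match and numerical-tolerance rules can be sketched as a single scoring function. This is an illustration under stated assumptions, not the benchmark's official scorer: the normalization steps, the helper names `normalize`, `parse_number`, and `is_correct`, and the reading of "±1%" as a relative tolerance are all assumptions. Semantic equivalence is not implemented here, since the source does not say how it is judged.

```python
import re
from typing import Optional


def normalize(text: str) -> str:
    """Lower-case, strip, and collapse internal whitespace for exact-match comparison."""
    return re.sub(r"\s+", " ", text.strip().lower())


def parse_number(text: str) -> Optional[float]:
    """Parse answers like '1,234.5' or '42%'; return None if the text is not numeric.

    Assumes '.' is the decimal separator and ',' is a thousands separator.
    """
    cleaned = text.strip().replace(",", "").rstrip("%")
    try:
        return float(cleaned)
    except ValueError:
        return None


def is_correct(pred: str, ref: str, rel_tol: float = 0.01) -> bool:
    """Exact match after normalization, else a numeric match within ±rel_tol of the reference."""
    if normalize(pred) == normalize(ref):
        return True  # exact match
    p, r = parse_number(pred), parse_number(ref)
    if p is None or r is None:
        return False  # non-numeric answers get no tolerance
    if r == 0:
        return p == 0  # a relative tolerance is undefined for a zero reference
    return abs(p - r) <= rel_tol * abs(r)  # numerical tolerance


print(is_correct("100.5", "100"))  # True
print(is_correct("102", "100"))    # False
```

One caveat for a multilingual benchmark: `parse_number` treats `,` as a thousands separator, so a German-style decimal such as `3,5` would be read as 35. A real scorer would need locale-aware number parsing.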

Fine-grained Analysis

  • Analyze performance by chart type, question difficulty, and language
  • Classify error types (numerical misreading, trend misjudgment, etc.)
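Once each answer has been scored, the fine-grained analysis amounts to grouping results along one dimension at a time. The sketch below assumes each scored result is a dict with hypothetical keys `chart_type`, `level`, `language`, `correct`, and `error_type`; the function names are illustrative, and the sample records are made up rather than taken from the dataset.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, List, Mapping


def accuracy_by(results: Iterable[Mapping], key: str) -> Dict[Hashable, float]:
    """Accuracy per group, grouping on one field such as 'chart_type', 'level', or 'language'."""
    tally: Dict[Hashable, List[int]] = defaultdict(lambda: [0, 0])  # group -> [correct, total]
    for r in results:
        counts = tally[r[key]]
        counts[0] += int(r["correct"])
        counts[1] += 1
    return {group: correct / total for group, (correct, total) in tally.items()}


def error_distribution(results: Iterable[Mapping]) -> Counter:
    """Count error categories among incorrect answers only."""
    return Counter(r["error_type"] for r in results if not r["correct"])


# Made-up scored results for illustration.
results = [
    {"chart_type": "bar", "level": 1, "language": "zh", "correct": True, "error_type": None},
    {"chart_type": "bar", "level": 3, "language": "de", "correct": False, "error_type": "trend_misjudgment"},
    {"chart_type": "line", "level": 3, "language": "ja", "correct": False, "error_type": "numerical_misreading"},
    {"chart_type": "line", "level": 1, "language": "zh", "correct": True, "error_type": None},
]

print(accuracy_by(results, "language"))  # {'zh': 1.0, 'de': 0.0, 'ja': 0.0}
print(error_distribution(results))
```

Grouping one key at a time keeps the helper generic; crossing dimensions (for example language by level) would only require grouping on a tuple of fields instead.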

Comparison Benchmarks

The benchmark includes results for mainstream models such as GPT-4V, Claude 3, and Gemini Pro Vision, making it easy to compare models directly.


Section 05

Value and Applications: Dual Significance in Academic and Practical Scenarios

Academic Research Value

  • Model Capability Diagnosis: Fine-grained error analysis
  • Cross-model Comparison: Standardized protocols support direct comparison
  • Multilingual Research: Provides cross-language transfer data
  • Training Data: Used for supervised learning or reinforcement learning

Practical Application Scenarios

  • Business Intelligence: Automatically extract insights from reports
  • Financial Analysis: Assist in interpreting financial report charts
  • Educational Assistance: Help students understand charts
  • Accessibility Services: Describe charts for visually impaired users
  • Content Audit: Check consistency between charts and text

Section 06

Usage Guide and Expansion Directions

Quick Start

  1. Data Download: Obtain chart images and annotations via scripts
  2. Environment Configuration: Install dependencies and configure model APIs
  3. Inference Execution: Use sample code to run model inference
  4. Result Evaluation: Generate detailed performance reports
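Steps 3 and 4 can be sketched as a small driver loop. The source does not describe the actual scripts or file layout, so this assumes a hypothetical JSON Lines annotation file with `image_path`, `question`, and `answer` fields, and a model exposed as a plain Python callable. The function names `run_inference` and `summarize` are illustrative, and the summary uses case-insensitive exact match only.

```python
import json
from typing import Callable, Dict, Iterable, List

# Hypothetical model interface: (image_path, question) -> answer string.
# In practice this would wrap an LVLM API client; here it is any callable.
ModelFn = Callable[[str, str], str]


def run_inference(annotation_lines: Iterable[str], model: ModelFn) -> List[Dict]:
    """Step 3: query the model once per annotated question (JSON Lines input)."""
    predictions = []
    for line in annotation_lines:
        if not line.strip():
            continue  # skip blank lines
        item = json.loads(line)
        answer = model(item["image_path"], item["question"])
        predictions.append({**item, "prediction": answer})
    return predictions


def summarize(predictions: List[Dict]) -> Dict[str, float]:
    """Step 4 (simplified): overall accuracy using case-insensitive exact match only."""
    n = len(predictions)
    correct = sum(
        p["prediction"].strip().lower() == p["answer"].strip().lower()
        for p in predictions
    )
    return {"n": n, "accuracy": correct / n if n else 0.0}


def dummy_model(image_path: str, question: str) -> str:
    """Stand-in for a real LVLM call; always answers '2021'."""
    return "2021"


if __name__ == "__main__":
    # Two made-up annotation records in the hypothetical schema.
    annotations = [
        '{"image_path": "charts/a.png", "question": "Which year has the tallest bar?", "answer": "2021"}',
        '{"image_path": "charts/b.png", "question": "What is the trend?", "answer": "increasing"}',
    ]
    preds = run_inference(annotations, dummy_model)
    print(summarize(preds))  # {'n': 2, 'accuracy': 0.5}
```

Keeping the model behind a simple callable makes it easy to swap in different API clients without changing the loop, and the scoring step can be replaced with a tolerance-aware scorer.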

Custom Expansion

  • Add New Languages: Translate the dataset into more languages
  • Add New Chart Types: Add radar charts, tree maps, etc.
  • Add New Question Types: Design complex reasoning questions
  • Adversarial Samples: Generate misleading charts to test robustness

Section 07

Limitations and Future Outlook

Limitations

  • Static Charts: Only static images are included; interactive and dynamic charts are not covered
  • Synthetic Data: Some charts are program-generated and may differ from real-world charts
  • Definite Answers: Every question has a definite answer; open-ended or subjective questions are not included

Future Directions

  • Introduce real-world chart data
  • Support interactive chart understanding
  • Add open-ended question types
  • Analyze charts in combination with document context
  • Support chart analysis in videos

Section 08

Summary: Core Significance of PolyChartQA

PolyChartQA provides a comprehensive, multilingual, multi-level evaluation platform for the chart understanding ability of LVLMs. It helps researchers diagnose where current models break down and points the way toward stronger chart understanding systems. As data visualization becomes more widespread and AI is deployed globally, chart understanding will become a core competency of LVLMs, and PolyChartQA is a significant contribution to this area.