PolyChartQA: A Multilingual Chart Question Answering Benchmark Dataset

PolyChartQA is a benchmark dataset designed for multilingual chart question answering. It evaluates how well large vision-language models (LVLMs) understand charts and answer questions about them across multiple languages.

Tags: PolyChartQA, chart question answering, vision-language models, multilingual benchmark, LVLM, data visualization understanding

Published 2026-04-19 11:57 | Recent activity 2026-04-19 12:23 | Estimated read 8 min

Section 01

Introduction: Core Overview of PolyChartQA, a Multilingual Chart QA Benchmark Dataset

PolyChartQA is a benchmark dataset designed for multilingual chart question answering, aiming to evaluate the chart understanding ability of Large Vision-Language Models (LVLMs). It addresses shortcomings of existing benchmarks, such as insufficient multilingual support and limited chart-type variety. Its core contributions are multilingual coverage, diverse chart types, multi-level question design, and standardized evaluation protocols, which together provide tools for both academic research and practical applications.

Section 02

Background: Shortcomings of LVLMs in Chart Understanding and the Birth of PolyChartQA

Challenges of Vision-Language Models in Chart Understanding

Large vision-language models (e.g., GPT-4V, Claude 3) show clear shortcomings in chart understanding: numerical extraction errors, trend misjudgments, insufficient multilingual support, and difficulty with complex reasoning. These weaknesses limit their use in global scenarios.

Purpose of PolyChartQA's Creation

To fill this gap, PolyChartQA provides a multilingual chart question answering benchmark. Its core contributions are:

Multilingual coverage (Chinese, Japanese, German, etc.)
Diverse chart types (bar charts, line charts, pie charts, etc.)
Multi-level question design (from data extraction to complex reasoning)
Standardized evaluation protocols

It helps evaluate and improve the chart understanding ability of LVLMs.

Section 03

Dataset Composition: Multi-dimensional Design Features

Diversity of Chart Types

Includes bar charts, line charts, pie charts, scatter plots, and combined charts, covering different data visualization scenarios.

Hierarchical Question Types

Questions progress from shallow to deep:

Level 1: Data Extraction
Level 2: Simple Comparison
Level 3: Trend Analysis
Level 4: Mathematical Operations
Level 5: Complex Reasoning

Multilingual Support

Covers Chinese, Japanese, German, etc., to evaluate models' cross-language transfer ability and to expose systematic weaknesses in particular languages.
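
The three design dimensions above (chart type, question level, language) can be pictured as fields on a single sample record. The following is only a sketch: the class and field names are hypothetical, chosen for illustration, and the dataset's actual annotation format may differ.

```python
from dataclasses import dataclass
from enum import IntEnum


class QuestionLevel(IntEnum):
    """The five question levels listed in the article, shallow to deep."""
    DATA_EXTRACTION = 1
    SIMPLE_COMPARISON = 2
    TREND_ANALYSIS = 3
    MATHEMATICAL_OPERATIONS = 4
    COMPLEX_REASONING = 5


@dataclass(frozen=True)
class ChartQASample:
    # All field names are hypothetical; the real annotation schema may differ.
    image_path: str        # path to the chart image
    chart_type: str        # e.g. "bar", "line", "pie", "scatter", "combined"
    language: str          # e.g. "zh", "ja", "de"
    level: QuestionLevel   # difficulty level of the question
    question: str
    answer: str


# An invented example record, not taken from the dataset.
sample = ChartQASample(
    image_path="charts/example_0001.png",  # placeholder path
    chart_type="bar",
    language="de",
    level=QuestionLevel.DATA_EXTRACTION,
    question="Wie hoch ist der Umsatz im Jahr 2020?",
    answer="42",
)
print(sample.level.name, int(sample.level))
```

Keeping the level as an integer-valued enum makes it easy to sort or filter samples by difficulty later.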

Section 04

Evaluation Framework: Standardized Metrics and Fine-grained Analysis

Accuracy Metrics

Exact Match: Answers are completely consistent
Semantic Equivalence: Different expressions but same meaning
Numerical Tolerance: Allows ±1% error
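
As a rough illustration of two of these metrics, the sketch below implements exact match and a ±1% numerical tolerance. It assumes the tolerance is relative to the reference value, that answers are normalized by lowercasing and collapsing whitespace, and that commas are thousands separators; none of these details are confirmed by the source, and all function names are hypothetical. Semantic equivalence is left out because it needs a separate judgment mechanism that the article does not describe.

```python
from typing import Optional


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (an assumed normalization)."""
    return " ".join(text.strip().lower().split())


def exact_match(prediction: str, reference: str) -> bool:
    """True when the normalized strings are identical."""
    return normalize(prediction) == normalize(reference)


def parse_number(text: str) -> Optional[float]:
    """Parse a plain number, dropping commas and a trailing percent sign.

    Assumption: commas are thousands separators. That is not true for every
    language (e.g. German decimals), so a real multilingual scorer would
    need locale-aware parsing.
    """
    cleaned = text.strip().replace(",", "").rstrip("%").strip()
    try:
        return float(cleaned)
    except ValueError:
        return None


def within_tolerance(prediction: str, reference: str, rel_tol: float = 0.01) -> bool:
    """True when both answers are numeric and differ by at most rel_tol (relative)."""
    pred, ref = parse_number(prediction), parse_number(reference)
    if pred is None or ref is None:
        return False
    if ref == 0:
        return pred == 0
    return abs(pred - ref) <= rel_tol * abs(ref)


def is_correct(prediction: str, reference: str) -> bool:
    """Accept an answer if it matches exactly or falls within the tolerance."""
    return exact_match(prediction, reference) or within_tolerance(prediction, reference)


print(is_correct("100.9", "100"))   # 0.9% off the reference
print(is_correct("101.5", "100"))   # 1.5% off the reference
```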

Fine-grained Analysis

Analyze performance by chart type, question difficulty, and language
Classify error types (numerical misreading, trend misjudgment, etc.)
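
The per-dimension breakdown can be sketched as a simple group-by over scored results. The record keys (`chart_type`, `level`, `language`, `correct`) are assumed for illustration, and the rows below are invented placeholders, not benchmark results.

```python
from collections import defaultdict


def accuracy_by(results, key):
    """Return {group value: accuracy} for the given grouping key."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for row in results:
        totals[row[key]] += 1
        hits[row[key]] += int(row["correct"])
    return {group: hits[group] / totals[group] for group in totals}


# Invented rows for illustration only; these are not real benchmark results.
results = [
    {"chart_type": "bar", "level": 1, "language": "zh", "correct": True},
    {"chart_type": "bar", "level": 3, "language": "de", "correct": False},
    {"chart_type": "line", "level": 3, "language": "de", "correct": True},
    {"chart_type": "pie", "level": 5, "language": "ja", "correct": False},
]

for key in ("chart_type", "level", "language"):
    print(key, accuracy_by(results, key))
```

The same function could group by an error-type label if each scored row carried one.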

Comparison Benchmarks

Includes test results for mainstream models such as GPT-4V, Claude 3, and Gemini Pro Vision, making it easier to compare models.

Section 05

Value and Applications: Dual Significance in Academic and Practical Scenarios

Academic Research Value

Model Capability Diagnosis: Fine-grained error analysis
Cross-model Comparison: Standardized protocols support direct comparison
Multilingual Research: Provides cross-language transfer data
Training Data: Used for supervised learning or reinforcement learning

Practical Application Scenarios

Business Intelligence: Automatically extract insights from reports
Financial Analysis: Assist in interpreting financial report charts
Educational Assistance: Help students understand charts
Accessibility Services: Describe charts for visually impaired users
Content Audit: Check consistency between charts and text

Section 06

Usage Guide and Expansion Directions

Quick Start

Data Download: Obtain chart images and annotations via scripts
Environment Configuration: Install dependencies and configure model APIs
Inference Execution: Use sample code to run model inference
Result Evaluation: Generate detailed performance reports
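
The four steps above can be pictured as a small load, infer, and report loop. This is a sketch under stated assumptions, not the project's actual scripts: the JSON Lines annotation format, the `image`/`question`/`answer` keys, and every function name here are hypothetical, and `dummy_model` stands in for a real model API call.

```python
import io
import json
from typing import Callable, Dict, Iterable, Iterator, List


def load_annotations(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield one annotation dict per non-empty JSON Lines row (assumed format)."""
    for line in lines:
        if line.strip():
            yield json.loads(line)


def run_inference(samples: Iterable[Dict], answer_fn: Callable[[str, str], str]) -> List[Dict]:
    """Call the model on every (image, question) pair and record its answer."""
    outputs = []
    for sample in samples:
        prediction = answer_fn(sample["image"], sample["question"])
        outputs.append({**sample, "prediction": prediction})
    return outputs


def report(outputs: List[Dict]) -> Dict:
    """Summarize results using plain string equality as the scoring rule."""
    correct = sum(o["prediction"].strip() == o["answer"].strip() for o in outputs)
    accuracy = correct / len(outputs) if outputs else 0.0
    return {"total": len(outputs), "correct": correct, "accuracy": accuracy}


def dummy_model(image_path: str, question: str) -> str:
    """Placeholder for a real LVLM call; always answers "42"."""
    return "42"


# In-memory stand-in for a downloaded annotation file (invented rows).
fake_file = io.StringIO(
    '{"image": "charts/0001.png", "question": "What is the value for 2020?", "answer": "42"}\n'
    '{"image": "charts/0002.png", "question": "Which category is largest?", "answer": "B"}\n'
)

outputs = run_inference(load_annotations(fake_file), dummy_model)
summary = report(outputs)
print(summary)
```

Passing the model as a plain callable keeps the loop independent of any particular API client, so swapping models only means swapping `answer_fn`.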

Custom Expansion

Add New Languages: Translate the dataset into more languages
Add New Chart Types: Add radar charts, tree maps, etc.
Add New Question Types: Design complex reasoning questions
Adversarial Samples: Generate misleading charts to test robustness

Section 07

Limitations and Future Outlook

Limitations

Static Charts: Only includes static images, no interactive/dynamic charts
Synthetic Data: Some charts are program-generated, differing from real-world scenarios
Definite Answers: No open-ended or subjective questions

Future Directions

Introduce real-world chart data
Support interactive chart understanding
Add open-ended question types
Analyze charts in combination with document context
Support chart analysis in videos

Section 08

Summary: Core Significance of PolyChartQA

PolyChartQA provides a comprehensive, multilingual, multi-level evaluation platform for the chart understanding ability of LVLMs. It helps researchers diagnose where models fail and points the way toward stronger chart understanding systems. As data visualization spreads and AI is deployed globally, chart understanding is likely to become a core competency of LVLMs, and PolyChartQA contributes a benchmark for measuring it.