ChartQA

Overview

ChartQA is a benchmark designed to evaluate question-answering capabilities over charts and data visualizations. It tests both visual reasoning and logical understanding of various chart types including bar charts, line graphs, and pie charts.

Task Description

  • Task Type: Chart Question Answering

  • Input: Chart image + natural language question

  • Output: Single word or numerical answer

  • Domains: Data visualization, visual reasoning, numerical reasoning

Key Features

  • Covers diverse chart types (bar, line, pie, scatter plots)

  • Includes both human-written and augmented test questions

  • Requires understanding of chart structure and data relationships

  • Tests both visual extraction and logical reasoning abilities

  • Questions range from simple data lookup to complex reasoning

Evaluation Notes

  • Default evaluation uses the test split with two subsets:

    • human_test: Human-written questions

    • augmented_test: Automatically generated questions

  • Primary metric: Relaxed Accuracy (relaxed_acc), which tolerates minor variations between the predicted and reference answers

  • Responses should end with a line of the form “ANSWER: [ANSWER]”

  • Numerical answers may be matched within a small tolerance, so minor rounding differences need not count as errors
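As a rough illustration of the notes above, the following sketch pulls the final answer out of a response and compares it with the target. The helper names (`extract_answer`, `relaxed_match`) and the 5% relative tolerance are assumptions made for this example; the 5% figure is the convention commonly associated with relaxed accuracy on ChartQA, and the actual `relaxed_acc` implementation may differ in its details.

```python
import math
import re


def extract_answer(response: str) -> str:
    """Return the text after the last 'ANSWER:' marker, or '' if none is found."""
    matches = re.findall(r"ANSWER:\s*(.+)", response)
    return matches[-1].strip() if matches else ""


def _to_float(text: str):
    """Parse a finite number, ignoring a trailing '%' and thousands separators."""
    cleaned = text.strip().rstrip("%").replace(",", "")
    try:
        value = float(cleaned)
    except ValueError:
        return None
    return value if math.isfinite(value) else None


def relaxed_match(prediction: str, target: str, tolerance: float = 0.05) -> bool:
    """Hypothetical relaxed comparison (assumed 5% tolerance, not the official scorer)."""
    pred_num, target_num = _to_float(prediction), _to_float(target)
    if pred_num is not None and target_num is not None:
        if target_num == 0:
            return pred_num == 0
        # Numeric answers: accept a small relative error.
        return abs(pred_num - target_num) / abs(target_num) <= tolerance
    # Non-numeric answers: case-insensitive exact match.
    return prediction.strip().lower() == target.strip().lower()


response = "The bar graph lists fourteen items.\nANSWER: 14"
print(relaxed_match(extract_answer(response), "14"))
```

Under these assumptions a prediction of 14.5 against a target of 14 would still count as correct (about 3.6% off), while 15 would not (about 7.1% off).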

Properties

| Property | Value |
|---|---|
| Benchmark Name | chartqa |
| Dataset ID | lmms-lab/ChartQA |
| Paper | N/A |
| Tags | Knowledge, MultiModal, QA |
| Metrics | relaxed_acc |
| Default Shots | 0-shot |
| Evaluation Split | test |

Data Statistics

| Metric | Value |
|---|---|
| Total Samples | 2,500 |
| Prompt Length (Mean) | 224.33 chars |
| Prompt Length (Min/Max) | 178 / 352 chars |

Per-Subset Statistics:

| Subset | Samples | Prompt Mean (chars) | Prompt Min (chars) | Prompt Max (chars) |
|---|---|---|---|---|
| human_test | 1,250 | 220.17 | 178 | 352 |
| augmented_test | 1,250 | 228.49 | 186 | 293 |
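The overall prompt statistics can be recovered from the per-subset figures. The short check below is only a sketch: the dictionary layout and variable names are chosen for illustration, and the numbers are copied from the statistics above.

```python
# Per-subset prompt statistics, copied from the figures above.
subsets = {
    "human_test": {"samples": 1250, "mean": 220.17, "min": 178, "max": 352},
    "augmented_test": {"samples": 1250, "mean": 228.49, "min": 186, "max": 293},
}

total_samples = sum(s["samples"] for s in subsets.values())

# The overall mean is the sample-weighted average of the subset means.
overall_mean = sum(s["samples"] * s["mean"] for s in subsets.values()) / total_samples

overall_min = min(s["min"] for s in subsets.values())
overall_max = max(s["max"] for s in subsets.values())

print(total_samples, round(overall_mean, 2), overall_min, overall_max)
```

Because both subsets have the same size, the weighted mean reduces to the plain average of 220.17 and 228.49, which matches the reported 224.33 chars.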

Image Statistics:

| Metric | Value |
|---|---|
| Total Images | 2,500 |
| Images per Sample | min: 1, max: 1, mean: 1 |
| Resolution Range | 184x326 - 800x1796 |
| Formats | png |

Sample Example

Subset: human_test

{
  "input": [
    {
      "id": "c75439e0",
      "content": [
        {
          "text": "\nHow many food item is shown in the bar graph?\n\nThe last line of your response should be of the form \"ANSWER: [ANSWER]\" (without quotes) where [ANSWER] is the a single word answer or number to the problem.\n"
        },
        {
          "image": "[BASE64_IMAGE: png, ~42.9KB]"
        }
      ]
    }
  ],
  "target": "14",
  "id": 0,
  "group_id": 0,
  "subset_key": "human_test"
}
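To make the record layout concrete, here is a minimal sketch that walks a sample shaped like the one above and separates its text and image parts. The `split_content` helper is hypothetical and not part of any library; the record is copied from the example, with the image left as the same placeholder string.

```python
sample = {
    "input": [
        {
            "id": "c75439e0",
            "content": [
                {
                    "text": (
                        "\nHow many food item is shown in the bar graph?\n\n"
                        'The last line of your response should be of the form "ANSWER: [ANSWER]" '
                        "(without quotes) where [ANSWER] is the a single word answer or number "
                        "to the problem.\n"
                    )
                },
                {"image": "[BASE64_IMAGE: png, ~42.9KB]"},
            ],
        }
    ],
    "target": "14",
    "id": 0,
    "group_id": 0,
    "subset_key": "human_test",
}


def split_content(record: dict):
    """Collect the text parts and image parts from every message in a record."""
    texts, images = [], []
    for message in record["input"]:
        for part in message["content"]:
            if "text" in part:
                texts.append(part["text"])
            elif "image" in part:
                images.append(part["image"])
    return texts, images


texts, images = split_content(sample)
question = texts[0].strip().splitlines()[0]  # first non-empty line is the question
print(question, "| target:", sample["target"], "| images:", len(images))
```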

Prompt Template

Prompt Template:

{question}

The last line of your response should be of the form "ANSWER: [ANSWER]" (without quotes) where [ANSWER] is the a single word answer or number to the problem.
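The template can be filled with ordinary string formatting. The sketch below is illustrative only: `PROMPT_TEMPLATE` and `build_prompt` are names chosen for this example, and the leading and trailing newlines are an assumption inferred from the sample record rather than from the framework's source. The wording of the template is reproduced verbatim.

```python
# Template text copied verbatim; surrounding newlines are assumed from the sample record.
PROMPT_TEMPLATE = (
    "\n{question}\n\n"
    'The last line of your response should be of the form "ANSWER: [ANSWER]" '
    "(without quotes) where [ANSWER] is the a single word answer or number to the problem.\n"
)


def build_prompt(question: str) -> str:
    """Substitute the question into the template (hypothetical helper)."""
    return PROMPT_TEMPLATE.format(question=question)


prompt = build_prompt("How many food item is shown in the bar graph?")
print(prompt)
```

Since `[ANSWER]` uses square brackets rather than braces, `str.format` leaves it untouched and only `{question}` is replaced.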

Usage

Using CLI

evalscope eval \
    --model YOUR_MODEL \
    --api-url OPENAI_API_COMPAT_URL \
    --api-key EMPTY_TOKEN \
    --datasets chartqa \
    --limit 10  # Remove this line for formal evaluation

Using Python

from evalscope import run_task
from evalscope.config import TaskConfig

task_cfg = TaskConfig(
    model='YOUR_MODEL',
    api_url='OPENAI_API_COMPAT_URL',
    api_key='EMPTY_TOKEN',
    datasets=['chartqa'],
    dataset_args={
        'chartqa': {
            # 'subset_list': ['human_test', 'augmented_test'],  # optional: uncomment to evaluate specific subsets
        }
    },
    limit=10,  # Remove this line for formal evaluation
)

run_task(task_cfg=task_cfg)