ChartQA
Overview
ChartQA is a benchmark designed to evaluate question-answering capabilities over charts and data visualizations. It tests both visual reasoning and logical understanding of various chart types including bar charts, line graphs, and pie charts.
Task Description
Task Type: Chart Question Answering
Input: Chart image + natural language question
Output: Single word or numerical answer
Domains: Data visualization, visual reasoning, numerical reasoning
Key Features
Covers diverse chart types (bar, line, pie, scatter plots)
Includes both human-written and augmented test questions
Requires understanding of chart structure and data relationships
Tests both visual extraction and logical reasoning abilities
Questions range from simple data lookup to complex reasoning
Evaluation Notes
Default evaluation uses the test split with two subsets:
human_test: Human-written questions
augmented_test: Automatically generated questions
Primary metric: Relaxed Accuracy (allows minor variations in answers)
Answers should be in the format “ANSWER: [ANSWER]”
Numerical answers may have tolerance for rounding differences
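The relaxed matching described in these notes can be sketched roughly as follows. This is a minimal illustration, not evalscope's actual scorer: the helper names `to_float` and `relaxed_match` are hypothetical, and the 5% relative tolerance is an assumption based on the relaxed-accuracy convention commonly used for ChartQA. The tolerance and text normalization that evalscope applies may differ.

```python
from typing import Optional


def to_float(text: str) -> Optional[float]:
    """Parse a number, tolerating thousands separators and a trailing percent sign."""
    cleaned = text.strip().replace(",", "").rstrip("%").strip()
    try:
        return float(cleaned)
    except ValueError:
        return None


def relaxed_match(prediction: str, target: str, tolerance: float = 0.05) -> bool:
    """Hypothetical relaxed comparison (the 5% tolerance is an assumption).

    Numeric answers match when the relative error is within `tolerance`;
    all other answers fall back to a case-insensitive exact match.
    """
    pred_num, target_num = to_float(prediction), to_float(target)
    if pred_num is not None and target_num is not None:
        if target_num == 0:
            # Relative error is undefined for a zero target, so require equality.
            return pred_num == 0
        return abs(pred_num - target_num) / abs(target_num) <= tolerance
    return prediction.strip().lower() == target.strip().lower()
```

Under these assumptions, a prediction of `14.5` against a target of `14` would count as correct (relative error of about 3.6%), while `15` would not (about 7.1%).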
Properties
| Property | Value |
|---|---|
| Benchmark Name | |
| Dataset ID | |
| Paper | N/A |
| Tags | |
| Metrics | |
| Default Shots | 0-shot |
| Evaluation Split | |
Data Statistics
| Metric | Value |
|---|---|
| Total Samples | 2,500 |
| Prompt Length (Mean) | 224.33 chars |
| Prompt Length (Min/Max) | 178 / 352 chars |
Per-Subset Statistics:
| Subset | Samples | Prompt Mean | Prompt Min | Prompt Max |
|---|---|---|---|---|
| | 1,250 | 220.17 | 178 | 352 |
| | 1,250 | 228.49 | 186 | 293 |
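As a quick consistency check, the overall prompt-length mean reported under Data Statistics should equal the sample-weighted mean of the two per-subset rows. The short sketch below recomputes it from the numbers in the table; the variable names are illustrative only.

```python
# (samples, mean prompt length in chars) for the two rows of the per-subset table.
subset_stats = [(1250, 220.17), (1250, 228.49)]

total_samples = sum(count for count, _ in subset_stats)
overall_mean = sum(count * mean for count, mean in subset_stats) / total_samples

print(total_samples, round(overall_mean, 2))  # 2500 224.33
```

Both values agree with the totals listed in the Data Statistics table.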
Image Statistics:
| Metric | Value |
|---|---|
| Total Images | 2,500 |
| Images per Sample | min: 1, max: 1, mean: 1 |
| Resolution Range | 184x326 - 800x1796 |
| Formats | png |
Sample Example
Subset: human_test
{
  "input": [
    {
      "id": "c75439e0",
      "content": [
        {
          "text": "\nHow many food item is shown in the bar graph?\n\nThe last line of your response should be of the form \"ANSWER: [ANSWER]\" (without quotes) where [ANSWER] is the a single word answer or number to the problem.\n"
        },
        {
          "image": "[BASE64_IMAGE: png, ~42.9KB]"
        }
      ]
    }
  ],
  "target": "14",
  "id": 0,
  "group_id": 0,
  "subset_key": "human_test"
}
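To make the record layout concrete, here is a minimal sketch that separates the text and image parts of a sample shaped like the one above. The `split_parts` helper is hypothetical, and the record is abbreviated: the question text is shortened and the image stays a placeholder string rather than real base64 data.

```python
from typing import Any, Dict, List, Tuple

# Abbreviated copy of the sample record; the image value is a placeholder.
sample: Dict[str, Any] = {
    "input": [
        {
            "id": "c75439e0",
            "content": [
                {"text": "\nHow many food item is shown in the bar graph?\n"},
                {"image": "[BASE64_IMAGE: png, ~42.9KB]"},
            ],
        }
    ],
    "target": "14",
    "subset_key": "human_test",
}


def split_parts(record: Dict[str, Any]) -> Tuple[List[str], List[str]]:
    """Collect the text parts and image parts from every message in the record."""
    texts: List[str] = []
    images: List[str] = []
    for message in record["input"]:
        for part in message["content"]:
            if "text" in part:
                texts.append(part["text"])
            elif "image" in part:
                images.append(part["image"])
    return texts, images


texts, images = split_parts(sample)
```

Each sample is expected to yield exactly one image, in line with the "Images per Sample" row in the image statistics.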
Prompt Template
Prompt Template:
{question}
The last line of your response should be of the form "ANSWER: [ANSWER]" (without quotes) where [ANSWER] is the a single word answer or number to the problem.
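The template above implies two small steps around each model call: filling in the question and reading the answer back from the final `ANSWER:` line. The sketch below illustrates both. `build_prompt` and `extract_answer` are hypothetical helpers rather than evalscope functions, and the surrounding newlines in the template are assumed from the sample record's `text` field.

```python
import re
from typing import Optional

# Template wording copied verbatim from this section; whitespace follows the sample record.
PROMPT_TEMPLATE = (
    "\n{question}\n\n"
    'The last line of your response should be of the form "ANSWER: [ANSWER]" '
    "(without quotes) where [ANSWER] is the a single word answer or number to the problem.\n"
)


def build_prompt(question: str) -> str:
    """Fill the {question} placeholder in the template."""
    return PROMPT_TEMPLATE.format(question=question)


def extract_answer(response: str) -> Optional[str]:
    """Return the text after the last line starting with 'ANSWER:', or None if absent."""
    matches = re.findall(r"^ANSWER:\s*(.+)$", response, flags=re.MULTILINE)
    return matches[-1].strip() if matches else None


prompt = build_prompt("How many food item is shown in the bar graph?")
answer = extract_answer("The chart lists fourteen items.\nANSWER: 14")
```

Taking the last matching line is a deliberate choice here: it tolerates a model that mentions `ANSWER:` at the start of an earlier line in its reasoning before giving its final answer.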
Usage
Using CLI
evalscope eval \
    --model YOUR_MODEL \
    --api-url OPENAI_API_COMPAT_URL \
    --api-key EMPTY_TOKEN \
    --datasets chartqa \
    --limit 10  # Remove this line for formal evaluation
Using Python
from evalscope import run_task
from evalscope.config import TaskConfig

task_cfg = TaskConfig(
    model='YOUR_MODEL',
    api_url='OPENAI_API_COMPAT_URL',
    api_key='EMPTY_TOKEN',
    datasets=['chartqa'],
    dataset_args={
        'chartqa': {
            # Optional: uncomment to evaluate specific subsets only
            # 'subset_list': ['human_test', 'augmented_test'],
        }
    },
    limit=10,  # Remove this line for formal evaluation
)

run_task(task_cfg=task_cfg)