InfoVQA#

Overview#

InfoVQA (Infographic Visual Question Answering) is a benchmark designed to evaluate AI models’ ability to answer questions based on information-dense images such as charts, graphs, diagrams, maps, and infographics. It focuses on understanding complex visual information presentations.

Task Description#

Task Type: Infographic Question Answering
Input: Infographic image + natural language question
Output: Single word or phrase answer
Domains: Data visualization, information graphics, visual reasoning

Key Features#

Focuses on information-dense visual content
Covers charts, graphs, diagrams, maps, and infographics
Requires understanding visual layouts and data representations
Tests information extraction and reasoning abilities
Questions vary in complexity from direct lookup to inference

Evaluation Notes#

Default evaluation uses the validation split
Primary metric: ANLS (Average Normalized Levenshtein Similarity)
Answers should be in format “ANSWER: [ANSWER]”
Includes OCR text extraction as metadata for analysis
Uses same dataset source as DocVQA (InfographicVQA subset)

Properties#

Property	Value
Benchmark Name	`infovqa`
Dataset ID	lmms-lab/DocVQA
Paper	N/A
Tags	`Knowledge`, `MultiModal`, `QA`
Metrics	`anls`
Default Shots	0-shot
Evaluation Split	`validation`

Data Statistics#

Metric	Value
Total Samples	2,801
Prompt Length (Mean)	273.38 chars
Prompt Length (Min/Max)	222 / 390 chars

Image Statistics:

Metric	Value
Total Images	2,801
Images per Sample	min: 1, max: 1, mean: 1
Resolution Range	600x340 - 6250x9375
Formats	jpeg

Sample Example#

Subset: InfographicVQA

{
  "input": [
    {
      "id": "03ab3147",
      "content": [
        {
          "text": "Answer the question according to the image using a single word or phrase.\nWhich social platform has heavy female audience?\nThe last line of your response should be of the form \"ANSWER: [ANSWER]\" (without quotes) where [ANSWER] is the answer to the question."
        },
        {
          "image": "[BASE64_IMAGE: png, ~249.7KB]"
        }
      ]
    }
  ],
  "target": "[\"pinterest\"]",
  "id": 0,
  "group_id": 0,
  "metadata": {
    "questionId": "98313",
    "answer_type": [
      "single span"
    ],
    "image_url": "https://blogs.constantcontact.com/wp-content/uploads/2019/03/Social-Media-Infographic.png",
    "ocr": "['{\"PAGE\": [{\"BlockType\": \"PAGE\", \"Geometry\": {\"BoundingBox\": {\"Width\": 0.9994840025901794, \"Height\": 0.9997748732566833, \"Left\": 0.0, \"Top\": 0.0}, \"Polygon\": [{\"X\": 0.0, \"Y\": 0.0}, {\"X\": 0.9994840025901794, \"Y\": 0.0}, {\"X\": 0.999484002590179 ... [TRUNCATED] ... 184143, \"Y\": 0.9778721332550049}, {\"X\": 0.5701684951782227, \"Y\": 0.9778721332550049}, {\"X\": 0.5701684951782227, \"Y\": 0.9896419048309326}, {\"X\": 0.47732439637184143, \"Y\": 0.9896419048309326}]}, \"Id\": \"43af6e92-c2ef-483c-b947-8b2d2073d756\"}]}']"
  }
}

Note: Some content was truncated for display.

Prompt Template#

Prompt Template:

Answer the question according to the image using a single word or phrase.
{question}
The last line of your response should be of the form "ANSWER: [ANSWER]" (without quotes) where [ANSWER] is the answer to the question.

Usage#

Using CLI#

evalscope eval \
    --model YOUR_MODEL \
    --api-url OPENAI_API_COMPAT_URL \
    --api-key EMPTY_TOKEN \
    --datasets infovqa \
    --limit 10  # Remove this line for formal evaluation

Using Python#

from evalscope import run_task
from evalscope.config import TaskConfig

task_cfg = TaskConfig(
    model='YOUR_MODEL',
    api_url='OPENAI_API_COMPAT_URL',
    api_key='EMPTY_TOKEN',
    datasets=['infovqa'],
    limit=10,  # Remove this line for formal evaluation
)

run_task(task_cfg=task_cfg)