RealWorldQA#

Overview#

RealWorldQA is a benchmark contributed by XAI designed to evaluate multimodal AI models’ understanding of real-world spatial and physical environments. It uses authentic images from everyday scenarios to test practical visual comprehension.

Task Description#

Task Type: Real-World Visual Question Answering
Input: Real-world image with spatial/physical question
Output: Verifiable answer about the scene
Domain: Physical environments, driving scenarios, everyday scenes

Key Features#

700+ images from real-world scenarios
Includes vehicle-captured images (driving scenes)
Questions with verifiable ground-truth answers
Tests spatial understanding and physical reasoning
Evaluates practical AI understanding capabilities

Evaluation Notes#

Default configuration uses 0-shot evaluation
Answers should follow “ANSWER: [ANSWER]” format
Uses step-by-step reasoning prompting
Simple accuracy metric for evaluation
Tests models on practical, real-world scenarios

Properties#

Property	Value
Benchmark Name	`real_world_qa`
Dataset ID	lmms-lab/RealWorldQA
Paper	N/A
Tags	`Knowledge`, `MultiModal`, `QA`
Metrics	`acc`
Default Shots	0-shot
Evaluation Split	`test`

Data Statistics#

Metric	Value
Total Samples	765
Prompt Length (Mean)	554.79 chars
Prompt Length (Min/Max)	459 / 904 chars

Image Statistics:

Metric	Value
Total Images	765
Images per Sample	min: 1, max: 1, mean: 1
Resolution Range	626x418 - 1536x1405
Formats	webp

Sample Example#

Subset: default

{
  "input": [
    {
      "id": "6492d8ea",
      "content": [
        {
          "text": "Read the picture and solve the following problem step by step.The last line of your response should be of the form \"ANSWER: [ANSWER]\" (without quotes) where [ANSWER] is the answer to the problem.\n\nIn which direction is the front wheel of the  ... [TRUNCATED] ... e letter of the correct option and nothing else.\n\nRemember to put your answer on its own line at the end in the form \"ANSWER: [ANSWER]\" (without quotes) where [ANSWER] is the answer to the problem, and you do not need to use a \\boxed command."
        },
        {
          "image": "[BASE64_IMAGE: webp, ~810.4KB]"
        }
      ]
    }
  ],
  "target": "C",
  "id": 0,
  "group_id": 0,
  "metadata": {
    "image_path": "0.webp"
  }
}

Note: Some content was truncated for display.

Prompt Template#

Prompt Template:

Read the picture and solve the following problem step by step.The last line of your response should be of the form "ANSWER: [ANSWER]" (without quotes) where [ANSWER] is the answer to the problem.

{question}

Remember to put your answer on its own line at the end in the form "ANSWER: [ANSWER]" (without quotes) where [ANSWER] is the answer to the problem, and you do not need to use a \boxed command.

Usage#

Using CLI#

evalscope eval \
    --model YOUR_MODEL \
    --api-url OPENAI_API_COMPAT_URL \
    --api-key EMPTY_TOKEN \
    --datasets real_world_qa \
    --limit 10  # Remove this line for formal evaluation

Using Python#

from evalscope import run_task
from evalscope.config import TaskConfig

task_cfg = TaskConfig(
    model='YOUR_MODEL',
    api_url='OPENAI_API_COMPAT_URL',
    api_key='EMPTY_TOKEN',
    datasets=['real_world_qa'],
    limit=10,  # Remove this line for formal evaluation
)

run_task(task_cfg=task_cfg)