V*Bench#

Overview#

V*Bench is a benchmark for evaluating the visual search capabilities of multimodal reasoning systems. It tests whether a model can actively locate and identify specific visual information in high-resolution images, a capability that fine-grained visual understanding depends on.

Task Description#

  • Task Type: Visual Search and Reasoning (Multiple-Choice)

  • Input: High-resolution image + targeted visual query

  • Output: Answer letter (A/B/C/D)

  • Domains: Visual search, fine-grained recognition, visual grounding

Key Features#

  • Tests targeted visual query capabilities

  • Focuses on high-resolution image understanding

  • Requires finding and reasoning about specific visual elements

  • Questions guided by natural language instructions

  • Evaluates fine-grained visual understanding in complex scenes

Evaluation Notes#

  • Default evaluation uses the test split

  • Primary metric: Accuracy on multiple-choice questions

  • Uses Chain-of-Thought (CoT) prompting; the last line of the response must follow the “ANSWER: [LETTER]” format

  • Metadata includes category and question ID for analysis
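The scoring described in these notes can be sketched as follows. This is a minimal illustration under stated assumptions, not EvalScope's actual implementation: the names `extract_answer` and `score`, the regular expression, and the result fields `response`, `target`, and `category` are all hypothetical and chosen only for this example.

```python
import re
from collections import defaultdict
from typing import Optional

# Assumed pattern: "ANSWER: X" or "ANSWER: (X)" with X in A-D, not followed
# by another letter (so "ANSWER: Apple" is rejected).
ANSWER_RE = re.compile(r"ANSWER:\s*\(?([A-D])\)?(?![A-Za-z])")


def extract_answer(response: str) -> Optional[str]:
    """Return the last answer letter found in a response, or None."""
    matches = ANSWER_RE.findall(response)
    return matches[-1] if matches else None


def score(results: list[dict]) -> dict[str, float]:
    """Compute overall and per-category accuracy.

    Each result is a dict with hypothetical keys 'response', 'target',
    and 'category' (the latter taken from the sample metadata).
    """
    totals, hits = defaultdict(int), defaultdict(int)
    for result in results:
        correct = extract_answer(result["response"]) == result["target"]
        for key in ("overall", result["category"]):
            totals[key] += 1
            hits[key] += int(correct)
    return {key: hits[key] / totals[key] for key in totals}


# Toy data for illustration only; "other_category" is a made-up label.
results = [
    {"response": "It looks glossy.\nANSWER: A", "target": "A",
     "category": "direct_attributes"},
    {"response": "It seems woven.\nANSWER: B", "target": "A",
     "category": "direct_attributes"},
    {"response": "I am not sure.", "target": "C",
     "category": "other_category"},
]
print(score(results))
```

A response with no parseable answer line is counted as incorrect here; a real harness may handle such cases differently.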

Properties#

| Property | Value |
|---|---|
| Benchmark Name | vstar_bench |
| Dataset ID | lmms-lab/vstar-bench |
| Paper | N/A |
| Tags | Grounding, MCQ, MultiModal |
| Metrics | acc |
| Default Shots | 0-shot |
| Evaluation Split | test |

Data Statistics#

| Metric | Value |
|---|---|
| Total Samples | 191 |
| Prompt Length (Mean) | 366.07 chars |
| Prompt Length (Min/Max) | 332 / 402 chars |

Image Statistics:

| Metric | Value |
|---|---|
| Total Images | 191 |
| Images per Sample | min: 1, max: 1, mean: 1 |
| Resolution Range | 1690x1500 - 5759x1440 |
| Formats | jpeg |

Sample Example#

Subset: default

{
  "input": [
    {
      "id": "d0b9c304",
      "content": [
        {
          "text": "Answer the following multiple choice question. The last line of your response should be of the following format: 'ANSWER: [LETTER]' (without quotes) where [LETTER] is one of A, B, C, D. Think step by step before answering.\n\nWhat is the material of the glove?\n(A) rubber\n(B) cotton\n(C) kevlar\n(D) leather\nAnswer with the option's letter from the given choices directly."
        },
        {
          "image": "[BASE64_IMAGE: jpeg, ~1.2MB]"
        }
      ]
    }
  ],
  "choices": [
    "A",
    "B",
    "C",
    "D"
  ],
  "target": "A",
  "id": 0,
  "group_id": 0,
  "metadata": {
    "category": "direct_attributes",
    "question_id": "0"
  }
}
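To show how a record of this shape might be consumed, the sketch below parses an abbreviated copy of the sample and separates its text and image parts. The helper name `split_content` is hypothetical, and the prompt text is shortened for readability; only the field names are taken from the sample above.

```python
import json

# Abbreviated copy of the sample record; the prompt text is shortened here.
sample_json = """
{
  "input": [
    {
      "id": "d0b9c304",
      "content": [
        {"text": "What is the material of the glove?"},
        {"image": "[BASE64_IMAGE: jpeg, ~1.2MB]"}
      ]
    }
  ],
  "choices": ["A", "B", "C", "D"],
  "target": "A",
  "metadata": {"category": "direct_attributes", "question_id": "0"}
}
"""


def split_content(sample: dict) -> tuple[list[str], list[str]]:
    """Collect the text parts and image parts from every message in `input`."""
    texts, images = [], []
    for message in sample["input"]:
        for part in message["content"]:
            if "text" in part:
                texts.append(part["text"])
            elif "image" in part:
                images.append(part["image"])
    return texts, images


sample = json.loads(sample_json)
texts, images = split_content(sample)
print(len(texts), len(images), sample["target"], sample["metadata"]["category"])
```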

Prompt Template#

Answer the following multiple choice question. The last line of your response should be of the following format: 'ANSWER: [LETTER]' (without quotes) where [LETTER] is one of A, B, C, D. Think step by step before answering.

{question}
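As an illustration of how the template is filled, the sketch below substitutes the question text from the sample record into the `{question}` placeholder using Python's `str.format`. The constant name `PROMPT_TEMPLATE` is an assumption for this example, and how EvalScope performs the substitution internally is not shown in this document.

```python
# Template text copied from this page; the variable name is hypothetical.
PROMPT_TEMPLATE = (
    "Answer the following multiple choice question. The last line of your "
    "response should be of the following format: 'ANSWER: [LETTER]' (without "
    "quotes) where [LETTER] is one of A, B, C, D. Think step by step before "
    "answering.\n\n{question}"
)

# Question text as it appears in the sample record, options included.
question = (
    "What is the material of the glove?\n"
    "(A) rubber\n(B) cotton\n(C) kevlar\n(D) leather\n"
    "Answer with the option's letter from the given choices directly."
)

prompt = PROMPT_TEMPLATE.format(question=question)
print(prompt)
```

The square brackets in `[LETTER]` are left untouched by `str.format`, which only interprets curly braces.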

Usage#

Using CLI#

evalscope eval \
    --model YOUR_MODEL \
    --api-url OPENAI_API_COMPAT_URL \
    --api-key EMPTY_TOKEN \
    --datasets vstar_bench \
    --limit 10  # Remove this line for formal evaluation

Using Python#

from evalscope import run_task
from evalscope.config import TaskConfig

task_cfg = TaskConfig(
    model='YOUR_MODEL',
    api_url='OPENAI_API_COMPAT_URL',
    api_key='EMPTY_TOKEN',
    datasets=['vstar_bench'],
    limit=10,  # Remove this line for formal evaluation
)

run_task(task_cfg=task_cfg)