MVBench#

Overview#

MVBench is a public multimodal video understanding benchmark covering temporal perception, attribute/state reasoning, symbolic ordering, and high-level cognition. This native adapter uses the ModelScope PKU-Alignment/MVBench mirror by default, which provides JSON annotations plus optimized video archives.

Task Description#

Task Type: Video multiple-choice question answering
Input: Video + question + answer choices
Output: Single correct answer letter
Subsets: 20 MVBench tasks; the default smoke-test subset is action_antonym

Evaluation Notes#

Default configuration uses 0-shot evaluation
Primary metric: Accuracy
The default action_antonym subset downloads a small public MP4 archive for quick validation
Full benchmark evaluation can be requested by setting subset_list to additional MVBench subsets
Time-bounded records keep start/end metadata and add a short segment instruction to the prompt

Properties#

Property	Value
Benchmark Name	`mvbench`
Dataset ID	PKU-Alignment/MVBench
Paper	Paper
Tags	`MCQ`, `MultiModal`
Metrics	`acc`
Default Shots	0-shot
Evaluation Split	`train`

Data Statistics#

Metric	Value
Total Samples	4,000

Per-Subset Statistics:

Subset	Samples	Prompt Mean	Prompt Min	Prompt Max
`action_antonym`	200	N/A	N/A	N/A
`action_count`	200	N/A	N/A	N/A
`action_localization`	200	N/A	N/A	N/A
`action_prediction`	200	N/A	N/A	N/A
`action_sequence`	200	N/A	N/A	N/A
`character_order`	200	N/A	N/A	N/A
`counterfactual_inference`	200	N/A	N/A	N/A
`egocentric_navigation`	200	N/A	N/A	N/A
`episodic_reasoning`	200	N/A	N/A	N/A
`fine_grained_action`	200	N/A	N/A	N/A
`fine_grained_pose`	200	N/A	N/A	N/A
`moving_attribute`	200	N/A	N/A	N/A
`moving_count`	200	N/A	N/A	N/A
`moving_direction`	200	N/A	N/A	N/A
`object_existence`	200	N/A	N/A	N/A
`object_interaction`	200	N/A	N/A	N/A
`object_shuffle`	200	N/A	N/A	N/A
`scene_transition`	200	N/A	N/A	N/A
`state_change`	200	N/A	N/A	N/A
`unexpected_action`	200	N/A	N/A	N/A

Sample Example#

Sample example not available.

Prompt Template#

Prompt Template:

Answer the following multiple choice question. The last line of your response should be of the following format: 'ANSWER: [LETTER]' (without quotes) where [LETTER] is one of {letters}. Think step by step before answering.

{question}

{choices}

Extra Parameters#

Parameter	Type	Default	Description
`dataset_id`	`str`	`PKU-Alignment/MVBench`	Dataset repository ID or local dataset root for MVBench annotations and videos.
`dataset_hub`	`str`	`modelscope`	Dataset hub used to load annotations and video archives. Choices: [‘huggingface’, ‘modelscope’, ‘local’]
`dataset_revision`	`str`	``	Optional dataset revision; leave empty to use the hub default.

Usage#

Using CLI#

evalscope eval \
    --model YOUR_MODEL \
    --api-url OPENAI_API_COMPAT_URL \
    --api-key EMPTY_TOKEN \
    --datasets mvbench \
    --limit 10  # Remove this line for formal evaluation

Using Python#

from evalscope import run_task
from evalscope.config import TaskConfig

task_cfg = TaskConfig(
    model='YOUR_MODEL',
    api_url='OPENAI_API_COMPAT_URL',
    api_key='EMPTY_TOKEN',
    datasets=['mvbench'],
    dataset_args={
        'mvbench': {
            # subset_list: ['action_antonym', 'action_count', 'action_localization']  # optional, evaluate specific subsets
            # extra_params: {}  # uses default extra parameters
        }
    },
    limit=10,  # Remove this line for formal evaluation
)

run_task(task_cfg=task_cfg)