Video-MME-v2#

Overview#

Video-MME-v2 is a public, comprehensive video-understanding benchmark. It contains 800 videos, 3,200 multiple-choice QA instances, and word-level subtitles with timestamps. The native adapter uses the shared DatasetHub abstraction both to load annotations and to download the optional media archives, so it exercises the same reusable video benchmark path as MVBench.

Task Description#

Task Type: Video multiple-choice question answering
Input: Video URL or archived MP4 + question + answer choices
Output: Single correct answer letter
Subsets: all, level_1, level_2, level_3, logic, relevance

Evaluation Notes#

Default configuration uses 0-shot evaluation
Primary metric: Accuracy
The default video source is the public `url` field, which suits lightweight smoke tests
Set `extra_params.video_source` to `archive` to download and use the official MP4 archives
Set `extra_params.use_subtitles` to `true` to include word-level subtitles in the prompt
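The two switches above can be collected into a `dataset_args` entry before building a task. The following is a minimal sketch, not part of the adapter: `build_videomme_v2_args` is a hypothetical helper introduced here for illustration, and only the keys `extra_params`, `video_source`, and `use_subtitles` come from this page.

```python
# Hypothetical helper (not part of evalscope): builds the dataset_args entry
# for videomme_v2 using the extra parameters documented on this page.
def build_videomme_v2_args(video_source="url", use_subtitles=False):
    # The page lists exactly two choices for video_source.
    if video_source not in ("url", "archive"):
        raise ValueError(f"unsupported video_source: {video_source!r}")
    return {
        "videomme_v2": {
            "extra_params": {
                "video_source": video_source,
                "use_subtitles": bool(use_subtitles),
            }
        }
    }


# Use the official MP4 archives and include subtitles in the prompt.
dataset_args = build_videomme_v2_args(video_source="archive", use_subtitles=True)
print(dataset_args)
```

The resulting dictionary is intended to be passed as the `dataset_args` argument of `TaskConfig`, as in the Python usage section of this page.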

Properties#

Property	Value
Benchmark Name	`videomme_v2`
Dataset ID	MME-Benchmarks/Video-MME-v2
Paper	Paper
Tags	`MCQ`, `MultiModal`
Metrics	`acc`
Default Shots	0-shot
Evaluation Split	`test`

Data Statistics#

Metric	Value
Total Samples	3,200

Per-Subset Statistics:

Subset	Samples	Prompt Mean	Prompt Min	Prompt Max
`all`	3,200	N/A	N/A	N/A
`level_1`	686	N/A	N/A	N/A
`level_2`	834	N/A	N/A	N/A
`level_3`	837	N/A	N/A	N/A
`logic`	1,124	N/A	N/A	N/A
`relevance`	2,076	N/A	N/A	N/A

Sample Example#

Sample example not available.

Prompt Template#

Prompt Template:

Answer the following multiple choice question. The last line of your response should be of the following format: 'ANSWER: [LETTER]' (without quotes) where [LETTER] is one of {letters}. Think step by step before answering.

{question}

{choices}
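To make the placeholders concrete, here is a rough sketch of how the template might be filled and how the final `ANSWER: [LETTER]` line could be parsed. It is an illustration only and not the adapter's actual code: the function names, the `A) text` choice layout, and the comma-separated `{letters}` string are all assumptions.

```python
import re

# Template text copied from this page; {letters}, {question}, {choices} are its placeholders.
PROMPT_TEMPLATE = (
    "Answer the following multiple choice question. The last line of your "
    "response should be of the following format: 'ANSWER: [LETTER]' (without "
    "quotes) where [LETTER] is one of {letters}. Think step by step before "
    "answering.\n\n{question}\n\n{choices}"
)


def build_prompt(question, choices):
    """Fill the template; the choice layout used here is an assumption."""
    letters = [chr(ord("A") + i) for i in range(len(choices))]
    choice_block = "\n".join(f"{l}) {c}" for l, c in zip(letters, choices))
    return PROMPT_TEMPLATE.format(
        letters=",".join(letters), question=question, choices=choice_block
    )


def extract_answer(response, num_choices):
    """Return the last 'ANSWER: X' letter if it is a valid choice, else None."""
    valid = {chr(ord("A") + i) for i in range(num_choices)}
    matches = re.findall(r"ANSWER:\s*([A-Za-z])\b", response)
    if not matches:
        return None
    letter = matches[-1].upper()
    return letter if letter in valid else None


prompt = build_prompt("What color is the car?", ["Red", "Blue"])
print(prompt)
print(extract_answer("The car looks blue.\nANSWER: B", 2))
```

Taking the last match rather than the first keeps the parser tolerant of step-by-step reasoning that mentions `ANSWER:` before the final line.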

Extra Parameters#

Parameter	Type	Default	Description
`dataset_id`	`str`	`MME-Benchmarks/Video-MME-v2`	Dataset repository ID or local dataset root for Video-MME-v2.
`dataset_hub`	`str`	`modelscope`	Dataset hub used to load annotations, subtitles, and optional video archives. Choices: ['huggingface', 'modelscope', 'local']
`dataset_revision`	`str`	``	Optional dataset revision; leave empty to use the hub default.
`video_source`	`str`	`url`	Use public URL fields for lightweight tests or official archived MP4 files. Choices: ['url', 'archive']
`use_subtitles`	`bool`	`False`	Include Video-MME-v2 subtitle text in the prompt.
`subtitle_word_limit`	`int`	`512`	Maximum number of subtitle words included per sample when subtitles are enabled.
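
As an illustration of what `subtitle_word_limit` implies, the sketch below caps word-level subtitles at a fixed number of words before they are joined into prompt text. This is an assumption-laden example, not the adapter's implementation: the record schema (`word`, `start`, `end`) and the choice to keep the first words rather than sample across the video are both hypothetical.

```python
# Hypothetical sketch: cap word-level subtitles at `word_limit` words.
# Assumed record schema: {"word": str, "start": float, "end": float}.
def truncate_subtitles(words, word_limit=512):
    if word_limit <= 0:
        return ""
    # Keep the earliest words; the real adapter may select words differently.
    kept = words[:word_limit]
    return " ".join(record["word"] for record in kept)


# Five synthetic subtitle records with made-up timestamps.
sample_words = [
    {"word": f"w{i}", "start": i * 0.5, "end": i * 0.5 + 0.4} for i in range(5)
]
print(truncate_subtitles(sample_words, word_limit=3))
```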

Usage#

Using CLI#

evalscope eval \
    --model YOUR_MODEL \
    --api-url OPENAI_API_COMPAT_URL \
    --api-key EMPTY_TOKEN \
    --datasets videomme_v2 \
    --limit 10  # Remove this line for formal evaluation

Using Python#

from evalscope import run_task
from evalscope.config import TaskConfig

task_cfg = TaskConfig(
    model='YOUR_MODEL',
    api_url='OPENAI_API_COMPAT_URL',
    api_key='EMPTY_TOKEN',
    datasets=['videomme_v2'],
    dataset_args={
        'videomme_v2': {
            # 'subset_list': ['all', 'level_1', 'level_2'],  # optional, evaluate specific subsets
            # 'extra_params': {},  # uses default extra parameters
        }
    },
    limit=10,  # Remove this line for formal evaluation
)

run_task(task_cfg=task_cfg)