# Video-MME-v2

## Overview
Video-MME-v2 is a public, comprehensive video understanding benchmark. It contains 800 videos,
3,200 multiple-choice QA instances, and word-level subtitles with timestamps. The native adapter
uses the shared DatasetHub abstraction both to load annotations and to optionally download the
media archives, so it exercises the same reusable video benchmark path as MVBench.
## Task Description

- Task Type: Video multiple-choice question answering
- Input: Video URL or archived MP4 file + question + answer choices
- Output: Single correct answer letter
- Subsets: `all`, `level_1`, `level_2`, `level_3`, `logic`, `relevance`
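To make the input/output contract concrete, the record below models one question. It is a minimal sketch rather than the adapter's real schema: the class name `VideoMCQSample`, its field names, and the convention of labelling options A, B, C, ... in order are assumptions for illustration.

```python
from dataclasses import dataclass
from string import ascii_uppercase


@dataclass
class VideoMCQSample:
    """One multiple-choice video question (hypothetical schema)."""

    video: str          # public URL or path to an archived MP4
    question: str
    choices: list[str]  # answer options in display order
    answer: str         # gold answer letter, e.g. "C"

    def letters(self) -> list[str]:
        # Option i is labelled with the i-th uppercase letter: A, B, C, ...
        return list(ascii_uppercase[: len(self.choices)])

    def is_valid(self) -> bool:
        # The gold letter must name one of the available options.
        return self.answer in self.letters()


sample = VideoMCQSample(
    video="https://example.com/clip.mp4",  # placeholder URL
    question="What does the person pick up first?",
    choices=["A cup", "A book", "A phone", "A key"],
    answer="C",
)
print(sample.letters())   # ['A', 'B', 'C', 'D']
print(sample.is_valid())  # True
```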
## Evaluation Notes

- Default configuration uses 0-shot evaluation
- Primary metric: Accuracy
- The default video source is the public `url` field, intended for lightweight smoke tests
- Set `extra_params.video_source` to `archive` to download and use the official MP4 archives
- Set `extra_params.use_subtitles` to `true` to include word-level subtitles in the prompt
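Accuracy is the share of questions whose predicted letter matches the gold letter. The function below is a minimal sketch of that computation, assuming predictions have already been reduced to single letters; the name `accuracy` and the rule that an unparsable answer (`None`) counts as wrong are illustrative choices, not EvalScope's code.

```python
from typing import Optional, Sequence


def accuracy(predictions: Sequence[Optional[str]], targets: Sequence[str]) -> float:
    """Fraction of samples whose predicted letter equals the gold letter.

    A prediction of None (no parsable answer) is counted as wrong.
    Empty input returns 0.0.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    correct = sum(pred == gold for pred, gold in zip(predictions, targets))
    return correct / len(targets)


print(accuracy(["A", "C", None, "D"], ["A", "B", "C", "D"]))  # 0.5
```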
## Properties

| Property | Value |
|---|---|
| Benchmark Name | `videomme_v2` |
| Dataset ID | |
| Paper | |
| Tags | |
| Metrics | Accuracy |
| Default Shots | 0-shot |
| Evaluation Split | |
## Data Statistics

| Metric | Value |
|---|---|
| Total Samples | 3,200 |
Per-Subset Statistics:

| Subset | Samples | Prompt Mean | Prompt Min | Prompt Max |
|---|---|---|---|---|
| `all` | 3,200 | N/A | N/A | N/A |
| `level_1` | 686 | N/A | N/A | N/A |
| `level_2` | 834 | N/A | N/A | N/A |
| `level_3` | 837 | N/A | N/A | N/A |
| `logic` | 1,124 | N/A | N/A | N/A |
| `relevance` | 2,076 | N/A | N/A | N/A |
## Sample Example

Sample example not available.
## Prompt Template

```text
Answer the following multiple choice question. The last line of your response should be of the following format: 'ANSWER: [LETTER]' (without quotes) where [LETTER] is one of {letters}. Think step by step before answering.
{question}
{choices}
```
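The template is filled with the question, the answer choices, and the list of valid letters, and the reply is scored from its final `ANSWER: X` line. The sketch below shows both steps. How `{letters}` and `{choices}` are actually formatted is not specified on this page, so the comma-joined letters and the `A) option` lines are assumptions, as are the helper names `render_prompt` and `extract_answer`.

```python
import re
from string import ascii_uppercase
from typing import Optional

# Prompt template text as given in the Prompt Template section.
TEMPLATE = (
    "Answer the following multiple choice question. The last line of your response "
    "should be of the following format: 'ANSWER: [LETTER]' (without quotes) where "
    "[LETTER] is one of {letters}. Think step by step before answering.\n"
    "{question}\n"
    "{choices}"
)

# Matches "ANSWER: X" where X is a single letter standing on its own.
ANSWER_RE = re.compile(r"ANSWER:\s*([A-Za-z])\b")


def render_prompt(question: str, choices: list[str]) -> str:
    """Fill the template; assumes letters 'A,B,...' and one 'A) option' line per choice."""
    letters = ascii_uppercase[: len(choices)]
    lines = [f"{letter}) {text}" for letter, text in zip(letters, choices)]
    return TEMPLATE.format(letters=",".join(letters), question=question, choices="\n".join(lines))


def extract_answer(response: str, valid_letters: str) -> Optional[str]:
    """Return the last 'ANSWER: X' letter if it names a valid option, else None."""
    matches = ANSWER_RE.findall(response)
    if not matches:
        return None
    letter = matches[-1].upper()
    return letter if letter in valid_letters else None


prompt = render_prompt("What happens first?", ["A door opens", "A dog barks"])
print(prompt.splitlines()[-2:])  # ['A) A door opens', 'B) A dog barks']
print(extract_answer("The dog barks before the door opens.\nANSWER: B", "AB"))  # B
```

Taking the last match rather than the first is a deliberate choice: a model that revises its answer mid-reasoning is scored on its final line.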
## Extra Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| | `str` | | Dataset repository ID or local dataset root for Video-MME-v2. |
| | `str` | | Dataset hub used to load annotations, subtitles, and optional video archives. Choices: `huggingface`, `modelscope`, `local` |
| | `str` | `''` | Optional dataset revision; leave empty to use the hub default. |
| `video_source` | `str` | `url` | Use public URL fields for lightweight tests or official archived MP4 files. Choices: `url`, `archive` |
| `use_subtitles` | `bool` | `false` | Include Video-MME-v2 subtitle text in the prompt. |
| | `int` | | Maximum number of subtitle words included per sample when subtitles are enabled. |
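When subtitles are enabled, a per-sample word cap keeps long videos from flooding the prompt. The sketch below shows one way such a cap could work, assuming subtitles arrive as timestamped words; the `SubtitleWord` type, the `subtitle_block` helper, the `[0.4s]` timestamp format, and the keep-earliest-words policy are all hypothetical, not the adapter's documented behavior.

```python
from dataclasses import dataclass


@dataclass
class SubtitleWord:
    """One timestamped subtitle word (hypothetical shape; times in seconds)."""

    start: float
    end: float
    text: str


def subtitle_block(words: list[SubtitleWord], max_words: int, with_timestamps: bool = False) -> str:
    """Render at most `max_words` words, earliest first, as one prompt line."""
    if max_words <= 0:
        return ""
    # Sort by start time, then keep only the first `max_words` words.
    kept = sorted(words, key=lambda w: w.start)[:max_words]
    if with_timestamps:
        return " ".join(f"[{w.start:.1f}s] {w.text}" for w in kept)
    return " ".join(w.text for w in kept)


words = [SubtitleWord(1.2, 1.5, "hello"), SubtitleWord(0.4, 0.9, "well"), SubtitleWord(2.0, 2.3, "there")]
print(subtitle_block(words, max_words=2))                        # well hello
print(subtitle_block(words, max_words=2, with_timestamps=True))  # [0.4s] well [1.2s] hello
```

Keeping the earliest words is the simplest policy; sampling words evenly across the video would preserve coverage of later scenes at the cost of continuity.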
## Usage

### Using CLI

```bash
evalscope eval \
    --model YOUR_MODEL \
    --api-url OPENAI_API_COMPAT_URL \
    --api-key EMPTY_TOKEN \
    --datasets videomme_v2 \
    --limit 10  # Remove this line for formal evaluation
```
### Using Python

```python
from evalscope import run_task
from evalscope.config import TaskConfig

task_cfg = TaskConfig(
    model='YOUR_MODEL',
    api_url='OPENAI_API_COMPAT_URL',
    api_key='EMPTY_TOKEN',
    datasets=['videomme_v2'],
    dataset_args={
        'videomme_v2': {
            # 'subset_list': ['all', 'level_1', 'level_2'],  # optional, evaluate specific subsets
            # 'extra_params': {},  # omit or leave empty to use the default extra parameters
        }
    },
    limit=10,  # Remove this line for formal evaluation
)

run_task(task_cfg=task_cfg)
```