Supported Datasets
1. Natively Supported Datasets
Tip
The framework currently supports the datasets listed below. If the dataset you need is not listed, please submit an issue, use the OpenCompass backend for evaluation, or use the VLMEvalKit backend for multi-modal model evaluation.
| Dataset Name | Link | Status | Note |
|---|---|---|---|
| | | Active | |
| | | Active | |
| | | Active | |
| | | Active | |
| | | Active | |
| | | Active | |
| | | Active | |
| | | Active | |
| | | Active | |
| | | Active | |
| | | To be integrated | |
2. Datasets Supported by OpenCompass
Refer to the detailed explanation
| Language | Knowledge | Reasoning | Examination |
|---|---|---|---|
| Word Definition; Idiom Learning; Semantic Similarity; Coreference Resolution; Translation; Multi-language Question Answering; Multi-language Summary | Knowledge Question Answering | Textual Entailment; Commonsense Reasoning; Mathematical Reasoning; Theorem Application; Comprehensive Reasoning | Junior High, High School, University, Professional Examinations; Medical Examinations |

| Understanding | Long Context | Safety | Code |
|---|---|---|---|
| Reading Comprehension; Content Summary; Content Analysis | Long Context Understanding | Safety; Robustness | Code |
3. Datasets Supported by VLMEvalKit
Refer to the detailed explanation
Image Understanding Dataset
Abbreviations used:
MCQ: Multiple Choice Questions; Y/N: Yes/No Questions; MTT: Multi-turn Dialogue Evaluation; MTI: Multi-image Input Evaluation
| Dataset | Dataset Names | Task |
|---|---|---|
| MMBench Series: | MMBench_DEV_[EN/CN] | MCQ |
| | MMStar | MCQ |
| | MME | Y/N |
| | SEEDBench_IMG | MCQ |
| | MMVet | VQA |
| | MMMU_[DEV_VAL/TEST] | MCQ |
| | MathVista_MINI | VQA |
| | ScienceQA_[VAL/TEST] | MCQ |
| | COCO_VAL | Caption |
| | HallusionBench | Y/N |
| | OCRVQA_[TESTCORE/TEST] | VQA |
| | TextVQA_VAL | VQA |
| | ChartQA_TEST | VQA |
| | AI2D_[TEST/TEST_NO_MASK] | MCQ |
| | LLaVABench | VQA |
| | DocVQA_[VAL/TEST] | VQA |
| | InfoVQA_[VAL/TEST] | VQA |
| | OCRBench | VQA |
| | RealWorldQA | MCQ |
| | POPE | Y/N |
| | CORE_MM (MTI) | VQA |
| | MMT-Bench_[VAL/ALL] | MCQ (MTI) |
| | MLLMGuard_DS | VQA |
| | AesBench_[VAL/TEST] | MCQ |
| VCR-wiki+ | VCR_[EN/ZH]_[EASY/HARD]_[ALL/500/100] | VQA |
| | MMLongBench_DOC | VQA (MTI) |
| | BLINK | MCQ (MTI) |
| | MathVision | VQA |
| | MTVQA_TEST | VQA |
| MMDU+ | MMDU | VQA (MTT, MTI) |
| | Q-Bench1_[VAL/TEST] | MCQ |
| | A-Bench_[VAL/TEST] | MCQ |
| DUDE+ | DUDE | VQA (MTI) |
| | SLIDEVQA | VQA (MTI) |
| | TaskMeAnything_v1_imageqa_random | MCQ |
| | MMMB_[ar/cn/en/pt/ru/tr] | MCQ |
| | A-OKVQA | MCQ |
| | MUIRBench | MCQ |
| | GMAI-MMBench_VAL | MCQ |
| | TableVQABench | VQA |
Note
* Test results are provided for only some models; the remaining models cannot achieve reasonable accuracy under zero-shot conditions.
+ Test results for this evaluation set have not yet been provided.
- VLMEvalKit supports only inference for this evaluation set and cannot output a final accuracy.
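Several entries in the Dataset Names column use bracketed alternatives, for example `MMBench_DEV_[EN/CN]` or `VCR_[EN/ZH]_[EASY/HARD]_[ALL/500/100]`. The sketch below shows one way to read that notation, assuming each bracket group lists slash-separated alternatives and that every combination is a separate dataset name. The helper `expand_dataset_names` is hypothetical and is not part of VLMEvalKit; check the VLMEvalKit documentation for the names it actually accepts.

```python
import itertools
import re
from typing import List

# Matches one bracketed group of slash-separated alternatives, e.g. "[EN/CN]".
_GROUP = re.compile(r"\[([^\[\]]+)\]")


def expand_dataset_names(pattern: str) -> List[str]:
    """Expand every [A/B/...] group in `pattern` into concrete names.

    Assumption: each bracket group is a set of alternatives, and the
    concrete names are all combinations of those alternatives.
    """
    # Splitting on a pattern with one capturing group alternates
    # literal text (even indices) and group contents (odd indices).
    parts = _GROUP.split(pattern)
    literals = parts[0::2]
    choices = [group.split("/") for group in parts[1::2]]

    names = []
    for combo in itertools.product(*choices):
        pieces = [literals[0]]
        for choice, literal in zip(combo, literals[1:]):
            pieces.append(choice)
            pieces.append(literal)
        names.append("".join(pieces))
    return names


if __name__ == "__main__":
    for entry in ["MMBench_DEV_[EN/CN]", "MMMU_[DEV_VAL/TEST]", "MMStar"]:
        print(entry, "->", expand_dataset_names(entry))
```

A name without brackets, such as `MMStar`, is returned unchanged as a single-element list, so the same helper can be applied to every row of the table.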
Video Understanding Dataset
| Dataset | Dataset Name | Task |
|---|---|---|
| | MMBench-Video | VQA |
| | MVBench/MVBench_MP4 | MCQ |
| | Video-MME | MCQ |