VLM Evaluation Datasets

The following VLM evaluation datasets are supported. Click a dataset name to view its details.

| Dataset name | Standard name | Task categories |
| --- | --- | --- |
| a_okvqa | A-OKVQA | Knowledge, MCQ, MultiModal |
| ai2d | AI2D | Knowledge, MultiModal, QA |
| air_bench_chat | AIR-Bench-Chat | Audio, InstructionFollowing, QA |
| air_bench_foundation | AIR-Bench-Foundation | Audio, Knowledge, MCQ |
| blink | BLINK | Knowledge, MCQ, MultiModal |
| cc_bench | CCBench | Knowledge, MCQ, MultiModal |
| chartqa | ChartQA | Knowledge, MultiModal, QA |
| cmmmu | CMMMU | Chinese, Knowledge, MultiModal, QA |
| cmmu | CMMU | Knowledge, MCQ, MultiModal, QA |
| docvqa | DocVQA | Knowledge, MultiModal, QA |
| fleurs | FLEURS | Audio, MultiLingual, SpeechRecognition |
| general_vmcq | General-VMCQ | Custom, MCQ, MultiModal |
| general_vqa | General-VQA | Custom, MultiModal, QA |
| gsm8k_v | GSM8K-V | Math, MultiModal, Reasoning |
| hallusion_bench | HallusionBench | Hallucination, MultiModal, Yes/No |
| infovqa | InfoVQA | Knowledge, MultiModal, QA |
| librispeech | LibriSpeech | Audio, SpeechRecognition |
| math_verse | MathVerse | MCQ, Math, MultiModal, Reasoning |
| math_vision | MathVision | MCQ, Math, MultiModal, Reasoning |
| math_vista | MathVista | MCQ, Math, MultiModal, Reasoning |
| mia_bench | MIA-Bench | InstructionFollowing, MultiModal, QA |
| micro_vqa | MicroVQA | Knowledge, MCQ, Medical, MultiModal |
| mm_bench | MMBench | Knowledge, MultiModal, QA |
| mm_star | MMStar | Knowledge, MCQ, MultiModal |
| mmmu | MMMU | Knowledge, MultiModal, QA |
| mmmu_pro | MMMU-PRO | Knowledge, MCQ, MultiModal |
| mvbench | MVBench | MCQ, MultiModal |
| ocr_bench | OCRBench | Knowledge, MultiModal, QA |
| ocr_bench_v2 | OCRBench-v2 | Knowledge, MultiModal, QA |
| olympiad_bench | OlympiadBench | Math, Reasoning |
| omni_bench | OmniBench | Knowledge, MCQ, MultiModal |
| omni_doc_bench | OmniDocBench | Knowledge, MultiModal, QA |
| pope | POPE | Hallucination, MultiModal, Yes/No |
| real_world_qa | RealWorldQA | Knowledge, MultiModal, QA |
| science_qa | ScienceQA | Knowledge, MCQ, MultiModal |
| seed_bench_2_plus | SEED-Bench-2-Plus | Knowledge, MCQ, MultiModal, Reasoning |
| simple_vqa | SimpleVQA | MultiModal, QA, Reasoning |
| tir_bench | TIR-Bench | MultiModal, QA, Reasoning |
| torgo | TORGO | Audio, SpeechRecognition |
| videomme_v2 | Video-MME-v2 | MCQ, MultiModal |
| visulogic | VisuLogic | MCQ, Math, MultiModal, Reasoning |
| vstar_bench | V*Bench | Grounding, MCQ, MultiModal |
| zerobench | ZeroBench | Knowledge, MultiModal, QA |
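
The task categories listed above can be used to narrow down which datasets to run. The snippet below is a minimal sketch of that idea, not part of any official API: `DATASET_TAGS` and `select_datasets` are hypothetical names introduced here for illustration, and the mapping copies only a handful of the entries above.

```python
# Illustrative subset of the entries above: dataset name -> task categories.
# DATASET_TAGS and select_datasets are hypothetical helpers, not a library API.
DATASET_TAGS: dict[str, set[str]] = {
    "a_okvqa": {"Knowledge", "MCQ", "MultiModal"},
    "chartqa": {"Knowledge", "MultiModal", "QA"},
    "hallusion_bench": {"Hallucination", "MultiModal", "Yes/No"},
    "librispeech": {"Audio", "SpeechRecognition"},
    "math_vista": {"MCQ", "Math", "MultiModal", "Reasoning"},
    "pope": {"Hallucination", "MultiModal", "Yes/No"},
}


def select_datasets(required, tags_by_dataset=DATASET_TAGS):
    """Return the sorted dataset names whose categories include every required tag."""
    required = set(required)
    return sorted(
        name for name, tags in tags_by_dataset.items() if required <= tags
    )


if __name__ == "__main__":
    # Datasets tagged as hallucination benchmarks.
    print(select_datasets({"Hallucination"}))
    # Multiple-choice datasets that are also multimodal.
    print(select_datasets({"MCQ", "MultiModal"}))
```

The returned names are the identifiers from the first column, so they can presumably be passed wherever the evaluation tool expects dataset names. Note that a few entries (for example `librispeech` and `olympiad_bench`) do not carry the `MultiModal` tag, so filtering on it would exclude them.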