AIGC Benchmarks#
Below is the list of supported AIGC benchmarks. Click on a benchmark name to jump to details.
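| Benchmark Name | Pretty Name | Task Categories |
|---|---|---|
| evalmuse | EvalMuse | TextToImage |
| gedit | GEdit-Bench | ImageEditing |
| genai_bench | GenAI-Bench | TextToImage |
| general_t2i | general_t2i | Custom, TextToImage |
| hpdv2 | HPD-v2 | TextToImage |
| tifa160 | TIFA-160 | TextToImage |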
Benchmark Details#
EvalMuse#
Dataset Name: evalmuse
Dataset ID: AI-ModelScope/T2V-Eval-Prompts
Description: EvalMuse Text-to-Image Benchmark. Used for fine-grained evaluation of the quality and semantic alignment of generated images.
Task Categories: TextToImage
Evaluation Metrics: FGA_BLIP2Score
Aggregation Methods: mean
Requires LLM Judge: No
Default Shots: 0-shot
Subsets: EvalMuse
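As a rough illustration of the `mean` aggregation listed above, the sketch below averages per-sample scores within each subset. The helper name `aggregate_mean` and the score values are made up for this example: they are not real FGA_BLIP2Score outputs, and the evaluation framework may implement aggregation differently.

```python
from collections import defaultdict
from statistics import fmean


def aggregate_mean(records):
    """Average per-sample scores by subset (hypothetical helper)."""
    by_subset = defaultdict(list)
    for subset, score in records:
        by_subset[subset].append(score)
    return {subset: fmean(scores) for subset, scores in by_subset.items()}


# Placeholder scores for illustration only, not real metric outputs.
records = [("EvalMuse", 3.0), ("EvalMuse", 4.0), ("EvalMuse", 5.0)]
print(aggregate_mean(records))
```

For a benchmark with several subsets, the same grouping would yield one mean per subset.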
GEdit-Bench#
Dataset Name: gedit
Dataset ID: stepfun-ai/GEdit-Bench
Description: GEdit-Bench Image Editing Benchmark. Grounded in real-world usage, it is designed to support more authentic and comprehensive evaluation of image editing models.
Task Categories: ImageEditing
Evaluation Metrics: Perceptual Similarity, Semantic Consistency
Aggregation Methods: mean
Requires LLM Judge: Yes
Default Shots: 0-shot
Subsets: background_change, color_alter, material_alter, motion_change, ps_human, style_change, subject-add, subject-remove, subject-replace, text_change, tone_transfer
Extra Parameters:
{
  "language": {
    "type": "str",
    "description": "Language of the instruction. Choices: ['en', 'cn'].",
    "value": "en",
    "choices": [
      "en",
      "cn"
    ]
  }
}
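As a hedged illustration of how the `language` entry above might be consumed, the sketch below checks a user-supplied override against the declared `type` and `choices`, and falls back to `value` when nothing is supplied. The helper name `resolve_extra_params` is hypothetical and not part of any library API. Only the schema itself comes from the listing above.

```python
import json

# Schema copied from the Extra Parameters listing for GEdit-Bench.
EXTRA_PARAMS_SCHEMA = json.loads("""
{
  "language": {
    "type": "str",
    "description": "Language of the instruction. Choices: ['en', 'cn'].",
    "value": "en",
    "choices": ["en", "cn"]
  }
}
""")

# Maps the schema's type names to Python types (only "str" appears above).
_TYPES = {"str": str}


def resolve_extra_params(overrides=None, schema=EXTRA_PARAMS_SCHEMA):
    """Hypothetical helper: merge overrides with schema defaults and validate."""
    overrides = dict(overrides or {})
    unknown = set(overrides) - set(schema)
    if unknown:
        raise KeyError(f"Unknown extra parameter(s): {sorted(unknown)}")

    resolved = {}
    for name, spec in schema.items():
        # Fall back to the schema's default when no override is given.
        value = overrides.get(name, spec["value"])
        if not isinstance(value, _TYPES[spec["type"]]):
            raise TypeError(f"{name!r} must be of type {spec['type']}")
        choices = spec.get("choices")
        if choices is not None and value not in choices:
            raise ValueError(f"{name!r} must be one of {choices}, got {value!r}")
        resolved[name] = value
    return resolved


print(resolve_extra_params())                    # uses the default language
print(resolve_extra_params({"language": "cn"}))  # explicit override
```

An unsupported value such as `"fr"` would raise `ValueError` in this sketch rather than silently falling back to the default.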
GenAI-Bench#
Dataset Name: genai_bench
Dataset ID: AI-ModelScope/T2V-Eval-Prompts
Description: GenAI-Bench Text-to-Image Benchmark. Includes 1,600 prompts for the text-to-image task.
Task Categories: TextToImage
Evaluation Metrics: VQAScore
Aggregation Methods: mean
Requires LLM Judge: No
Default Shots: 0-shot
Subsets: GenAI-Bench-1600
general_t2i#
Dataset Name: general_t2i
Dataset ID: general_t2i
Description: General Text-to-Image Benchmark
Task Categories: Custom, TextToImage
Evaluation Metrics: PickScore
Aggregation Methods: mean
Requires LLM Judge: No
Default Shots: 0-shot
Subsets: default
HPD-v2#
Dataset Name: hpdv2
Dataset ID: AI-ModelScope/T2V-Eval-Prompts
Description: HPDv2 Text-to-Image Benchmark. The evaluation metric is based on human preferences and was trained on the Human Preference Dataset (HPD v2).
Task Categories: TextToImage
Evaluation Metrics: HPSv2.1Score
Aggregation Methods: mean
Requires LLM Judge: No
Default Shots: 0-shot
Subsets: HPDv2
TIFA-160#
Dataset Name: tifa160
Dataset ID: AI-ModelScope/T2V-Eval-Prompts
Description: TIFA-160 Text-to-Image Benchmark
Task Categories: TextToImage
Evaluation Metrics: PickScore
Aggregation Methods: mean
Requires LLM Judge: No
Default Shots: 0-shot
Subsets: TIFA-160