EvalMuse#

Overview#

EvalMuse is a text-to-image benchmark that scores generated images for overall quality and for fine-grained semantic alignment with the prompt, using the FGA-BLIP2Score metric.

Task Description#

Task Type: Text-to-Image Generation Evaluation
Input: Text prompt for image generation
Output: Generated image evaluated for quality and semantic fidelity
Metric: FGA-BLIP2Score (Fine-Grained Analysis with BLIP-2)

Key Features#

Fine-grained semantic alignment evaluation
Uses BLIP-2 vision-language model for scoring
Evaluates both image quality and prompt adherence
Supports diverse prompt categories
Objective, reproducible metrics

Evaluation Notes#

Default configuration uses 0-shot evaluation
Only the `FGA_BLIP2Score` metric is supported
Evaluates images from the test split
Can evaluate pre-generated images or generate new ones

Properties#

| Property | Value |
|---|---|
| Benchmark Name | `evalmuse` |
| Dataset ID | AI-ModelScope/T2V-Eval-Prompts |
| Paper | N/A |
| Tags | `TextToImage` |
| Metrics | `FGA_BLIP2Score` |
| Default Shots | 0-shot |
| Evaluation Split | `test` |

Data Statistics#

| Metric | Value |
|---|---|
| Total Samples | 199 |
| Prompt Length (Mean) | 61.05 chars |
| Prompt Length (Min/Max) | 8 / 347 chars |
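
The figures above can be recomputed from any list of prompt strings. The following is a minimal sketch, assuming that prompt length is measured in characters with Python's `len()`; the helper name `prompt_length_stats` and the three example prompts are illustrative and are not part of the benchmark or its dataset.

```python
from statistics import mean


def prompt_length_stats(prompts):
    """Return sample count plus mean/min/max prompt length in characters."""
    lengths = [len(p) for p in prompts]
    return {
        "total": len(lengths),
        "mean": round(mean(lengths), 2),
        "min": min(lengths),
        "max": max(lengths),
    }


# Hypothetical prompts, used only to show the shape of the result.
prompts = ["a red cube", "two cats on a sofa", "sunset"]
stats = prompt_length_stats(prompts)
print(stats)
```

Run over the 199 prompts of the `test` split, the same function would be expected to reproduce the table, provided the published statistics were computed the same way.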

Sample Example#

Subset: EvalMuse

{
  "input": [
    {
      "id": "f8b508f4",
      "content": "wide angle photograph at night, award winning interior design apartment, night outside, dark, moody dim faint lighting, wood panel walls, cozy and calm, fabrics, textiles, pillows, lamps, colorful copper brass accents, secluded, many light sources, lamps, hardwood floors, book shelf, couch, desk, plants"
    }
  ],
  "id": 0,
  "group_id": 0,
  "metadata": {
    "prompt": "wide angle photograph at night, award winning interior design apartment, night outside, dark, moody dim faint lighting, wood panel walls, cozy and calm, fabrics, textiles, pillows, lamps, colorful copper brass accents, secluded, many light sources, lamps, hardwood floors, book shelf, couch, desk, plants",
    "category": "",
    "tags": [
      "photograph (activity)",
      "night (attribute)",
      "interior design apartment (location)",
      "outside (location)",
      "dark (attribute)",
      "moody dim faint lighting (attribute)",
      "wood panel walls (object)",
      "cozy and calm (attribute)",
      "fabrics (material)",
      "textiles (material)",
      "pillows (object)",
      "lamps (object)",
      "colorful (attribute)",
      "copper brass accents (material)",
      "secluded (attribute)",
      "many light sources (attribute)",
      "hardwood floors (material)",
      "book shelf (object)",
      "couch (object)",
      "desk (object)",
      "plants (object)",
      "wide angle (attribute)"
    ],
    "id": "EvalMuse_0",
    "image_path": ""
  }
}
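
Each entry in `metadata.tags` appears to follow the pattern `element (type)`, for example `wood panel walls (object)`. The sketch below shows one way to split such tags into pairs and count the element types in a record. It assumes that pattern holds for every tag; `split_tag` and `TAG_PATTERN` are hypothetical names, and the record is an abbreviated copy of the sample above.

```python
import re
from collections import Counter
from typing import Tuple

# Abbreviated copy of the sample record (only a few tags kept).
sample = {
    "id": 0,
    "metadata": {
        "tags": [
            "photograph (activity)",
            "night (attribute)",
            "interior design apartment (location)",
            "wood panel walls (object)",
            "fabrics (material)",
            "plants (object)",
        ],
        "id": "EvalMuse_0",
        "image_path": "",
    },
}

# Assumed tag layout: free text followed by a parenthesised type label.
TAG_PATTERN = re.compile(r"^(?P<element>.+?)\s*\((?P<kind>[^()]+)\)$")


def split_tag(tag: str) -> Tuple[str, str]:
    """Split 'wood panel walls (object)' into ('wood panel walls', 'object')."""
    match = TAG_PATTERN.match(tag.strip())
    if match is None:
        # Keep tags without a type label instead of raising an error.
        return tag.strip(), ""
    return match.group("element"), match.group("kind")


pairs = [split_tag(t) for t in sample["metadata"]["tags"]]
kind_counts = Counter(kind for _, kind in pairs)
print(pairs)
print(kind_counts)
```

Grouping elements by type in this way makes it easier to see which parts of a prompt (objects, attributes, materials, and so on) a fine-grained alignment score has to account for.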

Prompt Template#

No prompt template defined.

Usage#

Using CLI#

evalscope eval \
    --model YOUR_MODEL \
    --api-url OPENAI_API_COMPAT_URL \
    --api-key EMPTY_TOKEN \
    --datasets evalmuse \
    --limit 10  # Remove this line for formal evaluation

Using Python#

from evalscope import run_task
from evalscope.config import TaskConfig

task_cfg = TaskConfig(
    model='YOUR_MODEL',
    api_url='OPENAI_API_COMPAT_URL',
    api_key='EMPTY_TOKEN',
    datasets=['evalmuse'],
    limit=10,  # Remove this line for formal evaluation
)

run_task(task_cfg=task_cfg)