RefCOCO#

Overview#

RefCOCO is a dataset for training and evaluating models on Referring Expression Comprehension (REC). It contains images, object bounding boxes, and free-form natural-language expressions that uniquely describe target objects within MSCOCO images.

Task Description#

  • Task Type: Referring Expression Comprehension / Image Captioning

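  • Input: Image with the target region visualized plus a captioning prompt (bbox, seg modes), or image plus referring expression (bbox_rec mode)

  • Output: Caption for the marked region (bbox, seg modes) or bounding box coordinates (bbox_rec mode)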

  • Domains: Visual grounding, object localization, image understanding
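When the output is a bounding box, the model's free-form reply has to be turned into four numbers before it can be scored. The sketch below shows one way to do that with a regular expression. It is an illustration only: the function name `parse_bbox` and the assumption that the reply contains the four coordinates as plain decimal numbers are not taken from this page, and the benchmark's own parsing may differ.

```python
import re
from typing import List, Optional

# Matches integers and decimals such as 0, 12, 0.58 or -3.5.
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def parse_bbox(text: str) -> Optional[List[float]]:
    """Return the first four numbers in `text` as [x1, y1, x2, y2].

    Hypothetical helper: returns None when fewer than four numbers are found.
    """
    numbers = [float(match) for match in _NUMBER.findall(text)]
    if len(numbers) < 4:
        return None
    return numbers[:4]


if __name__ == "__main__":
    print(parse_bbox("The box is [0.12, 0.30, 0.58, 0.91]."))
    print(parse_bbox("I cannot find the object."))
```

Returning `None` rather than raising keeps a malformed reply from aborting a whole evaluation run; the caller can then score it as a miss.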

Key Features#

  • Created via Amazon Mechanical Turk annotations

  • Three evaluation modes:

    • bbox: Image captioning with bounding box visualization

    • seg: Image captioning with segmentation visualization

    • bbox_rec: Grounding task that outputs normalized bounding box coordinates

  • Expressions uniquely identify target objects in complex scenes

  • Multiple subsets: test, val, testA, testB
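The bbox_rec mode listed above works with normalized coordinates, so a pixel-space box has to be divided by the image size first. The following is a minimal sketch of that step. The helper names are hypothetical, and this page does not state whether the dataset stores boxes as [x, y, width, height] (the usual COCO layout) or as corner coordinates, so both cases are covered and the choice is left to the caller.

```python
from typing import List, Sequence


def xywh_to_xyxy(box: Sequence[float]) -> List[float]:
    """Convert [x, y, width, height] to corner form [x1, y1, x2, y2]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


def normalize_xyxy(box: Sequence[float], width: float, height: float) -> List[float]:
    """Scale a pixel-space [x1, y1, x2, y2] box to [x1/W, y1/H, x2/W, y2/H]."""
    if width <= 0 or height <= 0:
        raise ValueError("image width and height must be positive")
    x1, y1, x2, y2 = box
    return [x1 / width, y1 / height, x2 / width, y2 / height]


if __name__ == "__main__":
    # Assumed example: a 640x480 image with a box given in COCO-style xywh.
    corners = xywh_to_xyxy([64, 48, 256, 192])
    print(normalize_xyxy(corners, width=640, height=480))
```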

Evaluation Notes#

  • Evaluation mode configurable via eval_mode parameter

  • Multiple metrics for comprehensive evaluation:

    • Grounding: IoU, ACC@0.1/0.3/0.5/0.7/0.9, Center_ACC

    • Captioning: BLEU (1-4), METEOR, ROUGE_L, CIDEr

  • Bounding boxes are output as normalized coordinates [x1/W, y1/H, x2/W, y2/H], where W and H are the image width and height

  • Requires pycocoevalcap for caption metrics
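To make the grounding metrics above concrete, here is a minimal sketch of how IoU and the thresholded accuracies could be computed for one prediction. It is not the benchmark's implementation. Two points are assumptions: that ACC@t counts a hit when IoU is at least t, and that Center_ACC checks whether the center of the predicted box falls inside the ground-truth box. All function names are hypothetical.

```python
from typing import Dict, Sequence

THRESHOLDS = (0.1, 0.3, 0.5, 0.7, 0.9)


def iou(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    inter_w = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    inter_h = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = inter_w * inter_h
    area_pred = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    union = area_pred + area_gt - inter
    return inter / union if union > 0 else 0.0


def center_hit(pred: Sequence[float], gt: Sequence[float]) -> bool:
    """Assumed Center_ACC rule: the predicted box center lies inside gt."""
    cx = (pred[0] + pred[2]) / 2
    cy = (pred[1] + pred[3]) / 2
    return gt[0] <= cx <= gt[2] and gt[1] <= cy <= gt[3]


def score(pred: Sequence[float], gt: Sequence[float]) -> Dict[str, float]:
    """Per-sample scores; averaging over samples gives dataset-level values."""
    value = iou(pred, gt)
    result = {"IoU": value, "Center_ACC": float(center_hit(pred, gt))}
    for t in THRESHOLDS:
        # Assumption: a prediction counts as correct when IoU >= threshold.
        result[f"ACC@{t}"] = float(value >= t)
    return result


if __name__ == "__main__":
    print(score([0.0, 0.0, 0.5, 0.5], [0.25, 0.25, 0.75, 0.75]))
```

Because both boxes are normalized by the same image size, the IoU is the same as it would be in pixel space.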

Properties#

| Property | Value |
|---|---|
| Benchmark Name | refcoco |
| Dataset ID | lmms-lab/RefCOCO |
| Paper | N/A |
| Tags | Grounding, ImageCaptioning, Knowledge, MultiModal |
| Metrics | IoU, ACC@0.1, ACC@0.3, ACC@0.5, ACC@0.7, ACC@0.9, Center_ACC, Bleu_1, Bleu_2, Bleu_3, Bleu_4, METEOR, ROUGE_L, CIDEr |
| Default Shots | 0-shot |
| Evaluation Split | N/A |

Data Statistics#

| Metric | Value |
|---|---|
| Total Samples | 17,596 |
| Prompt Length (Mean) | 146 chars |
| Prompt Length (Min/Max) | 146 / 146 chars |

Per-Subset Statistics:

| Subset | Samples | Prompt Mean | Prompt Min | Prompt Max |
|---|---|---|---|---|
| test | 5,000 | 146 | 146 | 146 |
| val | 8,811 | 146 | 146 | 146 |
| testA | 1,975 | 146 | 146 | 146 |
| testB | 1,810 | 146 | 146 | 146 |

Image Statistics:

| Metric | Value |
|---|---|
| Total Images | 13,785 |
| Images per Sample | min: 1, max: 1, mean: 1 |
| Resolution Range | 300x176 - 640x640 |
| Formats | jpeg |

Sample Example#

Subset: test

{
  "input": [
    {
      "id": "53a494fc",
      "content": [
        {
          "text": "Please carefully observe the area circled in the image and come up with a caption for the area.\nAnswer the question using a single word or phrase."
        },
        {
          "image": "[BASE64_IMAGE: jpeg, ~57.6KB]"
        }
      ]
    }
  ],
  "target": "['guy petting elephant', 'foremost person', 'green shirt']",
  "id": 0,
  "group_id": 0,
  "metadata": {
    "question_id": "469306",
    "iscrowd": 0,
    "file_name": "COCO_train2014_000000296747_0.jpg",
    "answer": [
      "guy petting elephant",
      "foremost person",
      "green shirt"
    ],
    "original_bbox": [
      59.04999923706055,
      93.23999786376953,
      375.0199890136719,
      362.5799865722656
    ],
    "bbox": [],
    "eval_mode": "bbox"
  }
}
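In the sample above, `target` is a single string that holds the text form of a Python list, while `metadata.answer` holds the same references as a real list. If only `target` is available, the string can be turned back into a list safely with `ast.literal_eval`, as in this small sketch. The helper name `parse_target` is hypothetical.

```python
import ast
from typing import List


def parse_target(target: str) -> List[str]:
    """Parse a stringified list of reference captions into a list of str."""
    value = ast.literal_eval(target)  # evaluates literals only, never code
    if not isinstance(value, list):
        raise ValueError("target does not encode a list")
    return [str(item) for item in value]


if __name__ == "__main__":
    target = "['guy petting elephant', 'foremost person', 'green shirt']"
    print(parse_target(target))
```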

Prompt Template#

No prompt template defined.

Extra Parameters#

| Parameter | Type | Default | Description |
|---|---|---|---|
| eval_mode | str | bbox | Controls the evaluation mode used by RefCOCO. bbox: image captioning task, the original image is shown with the bounding box drawn on it; seg: image captioning task, the original image is shown with the segmentation drawn on it; bbox_rec: grounding task, the model outputs bounding box coordinates. Choices: ['bbox', 'seg', 'bbox_rec'] |
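As an illustration of how `eval_mode` might be passed, the sketch below builds a `dataset_args` dictionary with a chosen mode. The keys `extra_params` and `subset_list` are taken from the comments in the Python usage example on this page; the helper `refcoco_dataset_args` itself is hypothetical and only validates the mode against the three documented choices.

```python
from typing import Dict, Optional, Sequence

VALID_EVAL_MODES = ("bbox", "seg", "bbox_rec")


def refcoco_dataset_args(
    eval_mode: str = "bbox",
    subset_list: Optional[Sequence[str]] = None,
) -> Dict[str, dict]:
    """Build a dataset_args dict for the refcoco benchmark (hypothetical helper)."""
    if eval_mode not in VALID_EVAL_MODES:
        raise ValueError(f"eval_mode must be one of {VALID_EVAL_MODES}")
    args: dict = {"extra_params": {"eval_mode": eval_mode}}
    if subset_list is not None:
        args["subset_list"] = list(subset_list)
    return {"refcoco": args}


if __name__ == "__main__":
    # The result is meant to be passed as TaskConfig(dataset_args=...).
    print(refcoco_dataset_args("bbox_rec", subset_list=["testA", "testB"]))
```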

Usage#

Using CLI#

evalscope eval \
    --model YOUR_MODEL \
    --api-url OPENAI_API_COMPAT_URL \
    --api-key EMPTY_TOKEN \
    --datasets refcoco \
    --limit 10  # Remove this line for formal evaluation

Using Python#

from evalscope import run_task
from evalscope.config import TaskConfig

task_cfg = TaskConfig(
    model='YOUR_MODEL',
    api_url='OPENAI_API_COMPAT_URL',
    api_key='EMPTY_TOKEN',
    datasets=['refcoco'],
    dataset_args={
        'refcoco': {
            # 'subset_list': ['test', 'val', 'testA'],  # optional, evaluate specific subsets
            # 'extra_params': {},  # optional, empty dict uses the default extra parameters
        }
    },
    limit=10,  # Remove this line for formal evaluation
)

run_task(task_cfg=task_cfg)