FLEURS#

Overview#

FLEURS (Few-shot Learning Evaluation of Universal Representations of Speech) is a massively multilingual benchmark covering 102 languages for evaluating automatic speech recognition (ASR), spoken language understanding, and speech translation.

Task Description#

  • Task Type: Automatic Speech Recognition (ASR)

  • Input: Audio recordings with speech in various languages

  • Output: Transcribed text in the corresponding language

  • Languages: 102 languages including Mandarin Chinese, Cantonese, English, and many more

Key Features#

  • Massive multilingual coverage (102 languages)

  • Derived from FLoRes-101 machine translation benchmark

  • Includes diverse language families and scripts

  • High-quality human recordings and transcriptions

  • Metadata includes gender, language group, and speaker information

Evaluation Notes#

  • Default configuration uses test split

  • Primary metric: Word Error Rate (WER)

  • Default subsets: cmn_hans_cn (Mandarin), en_us (English), yue_hant_hk (Cantonese)

  • Language-specific text normalization applied during evaluation

  • Prompt: “Please recognize the speech and only output the recognized content”

Properties#

Property

Value

Benchmark Name

fleurs

Dataset ID

lmms-lab/fleurs

Paper

N/A

Tags

Audio, MultiLingual, SpeechRecognition

Metrics

wer

Default Shots

0-shot

Evaluation Split

test

Data Statistics#

Metric

Value

Total Samples

2,411

Prompt Length (Mean)

67 chars

Prompt Length (Min/Max)

67 / 67 chars

Per-Subset Statistics:

Subset

Samples

Prompt Mean

Prompt Min

Prompt Max

cmn_hans_cn

945

67

67

67

en_us

647

67

67

67

yue_hant_hk

819

67

67

67

Audio Statistics:

Metric

Value

Total Audio Files

2,411

Audio per Sample

min: 1, max: 1, mean: 1

Formats

wav

Sample Example#

Subset: cmn_hans_cn

{
  "input": [
    {
      "id": "daf508c3",
      "content": [
        {
          "text": "Please recognize the speech and only output the recognized content:"
        },
        {
          "audio": "[BASE64_AUDIO: wav, ~648.8KB]",
          "format": "wav"
        }
      ]
    }
  ],
  "target": "这 并 不 是 告 别 这 是 一 个 篇 章 的 结 束 也 是 新 篇 章 的 开 始",
  "id": 0,
  "group_id": 0,
  "metadata": {
    "id": 1906,
    "num_samples": 166080,
    "raw_transcription": "“这并不是告别。这是一个篇章的结束,也是新篇章的开始。”",
    "language": "Mandarin Chinese",
    "gender": 0,
    "lang_id": "cmn_hans",
    "lang_group_id": 6
  }
}

Prompt Template#

Prompt Template:

Please recognize the speech and only output the recognized content:

Usage#

Using CLI#

evalscope eval \
    --model YOUR_MODEL \
    --api-url OPENAI_API_COMPAT_URL \
    --api-key EMPTY_TOKEN \
    --datasets fleurs \
    --limit 10  # Remove this line for formal evaluation

Using Python#

from evalscope import run_task
from evalscope.config import TaskConfig

task_cfg = TaskConfig(
    model='YOUR_MODEL',
    api_url='OPENAI_API_COMPAT_URL',
    api_key='EMPTY_TOKEN',
    datasets=['fleurs'],
    dataset_args={
        'fleurs': {
            # subset_list: ['cmn_hans_cn', 'en_us', 'yue_hant_hk']  # optional, evaluate specific subsets
        }
    },
    limit=10,  # Remove this line for formal evaluation
)

run_task(task_cfg=task_cfg)