LibriSpeech#

Overview#

LibriSpeech is a large-scale corpus of approximately 1,000 hours of read English speech derived from audiobooks. It is one of the most widely used benchmarks for evaluating automatic speech recognition (ASR) systems.

Task Description#

  • Task Type: Automatic Speech Recognition (ASR)

  • Input: Audio recordings of read English speech from audiobooks

  • Output: Transcribed text

  • Language: English

Key Features#

  • Approximately 1,000 hours of high-quality read speech

  • Derived from LibriVox audiobooks (public domain)

  • Separate “clean” and “other” test sets of differing difficulty

  • Widely used baseline for ASR research

  • Standardized evaluation protocol

Evaluation Notes#

  • The default configuration uses the test_clean split

  • Primary metric: Word Error Rate (WER)

  • Text normalization applied during evaluation

  • Prompt: “Please recognize the speech and only output the recognized content:”

  • Metadata includes audio ID and duration information
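As a rough illustration of the metric listed above, WER is the word-level edit distance between the normalized reference and the normalized hypothesis, divided by the number of reference words. The sketch below is not the benchmark's own implementation: the normalization rule (lowercasing and dropping punctuation other than apostrophes) is an assumption, and the function names `normalize`, `edit_distance`, and `wer` are hypothetical.

```python
import re
from typing import List


def normalize(text: str) -> List[str]:
    """Assumed normalization: lowercase, keep letters, digits, apostrophes."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9' ]+", " ", text)
    return text.split()


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: edit distance divided by reference length."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


# One substituted word out of four reference words.
score = wer("Eleven o'clock had struck", "eleven o'clock has struck.")
print(f"WER: {score:.2f}")
```

Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis contains many inserted words.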

Properties#

| Property | Value |
|---|---|
| Benchmark Name | librispeech |
| Dataset ID | lmms-lab/Librispeech-concat |
| Paper | N/A |
| Tags | Audio, SpeechRecognition |
| Metrics | wer |
| Default Shots | 0-shot |
| Evaluation Split | test_clean |

Data Statistics#

| Metric | Value |
|---|---|
| Total Samples | 87 |
| Prompt Length (Mean) | 67 chars |
| Prompt Length (Min/Max) | 67 / 67 chars |

Audio Statistics:

| Metric | Value |
|---|---|
| Total Audio Files | 87 |
| Audio per Sample | min: 1, max: 1, mean: 1 |
| Formats | wav |

Sample Example#

Subset: default

{
  "input": [
    {
      "id": "fd0309e6",
      "content": [
        {
          "text": "Please recognize the speech and only output the recognized content:"
        },
        {
          "audio": "[BASE64_AUDIO: wav, ~22.7MB]",
          "format": "wav"
        }
      ]
    }
  ],
  "target": "Eleven o'clock had struck it was a fine clear night they were the only persons on the road and they sauntered leisurely along to avoid paying the price of fatigue for the recreation provided for the toledans in their valley or on the banks of ... [TRUNCATED] ...  less surprised than they and the better to assure himself of so wonderful a fact he begged leocadia to give him some token which should make perfectly clear to him that which indeed he did not doubt since it was authenticated by his parents.",
  "id": 0,
  "group_id": 0,
  "metadata": {
    "audio_id": "5639-40744",
    "audio_duration": 496.6899719238281
  }
}

Note: Some content was truncated for display.
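The record layout shown above can be unpacked with a few lines of Python. This is a minimal sketch, assuming the "audio" field holds a base64-encoded string (the example only shows a placeholder), optionally prefixed with a data-URI header. The helper name `unpack_sample` and the keys of the returned dictionary are hypothetical.

```python
import base64
from typing import Any, Dict


def unpack_sample(sample: Dict[str, Any]) -> Dict[str, Any]:
    """Collect the prompt, decoded audio, and reference from one record."""
    prompt_parts, audio_bytes, audio_format = [], None, None
    for message in sample["input"]:
        for part in message["content"]:
            if "text" in part:
                prompt_parts.append(part["text"])
            elif "audio" in part:
                payload = part["audio"]
                # Assumption: tolerate an optional "data:...;base64," prefix.
                if payload.startswith("data:") and "base64," in payload:
                    payload = payload.split("base64,", 1)[1]
                audio_bytes = base64.b64decode(payload)
                audio_format = part.get("format")
    metadata = sample.get("metadata", {})
    return {
        "prompt": "".join(prompt_parts),
        "audio_bytes": audio_bytes,
        "audio_format": audio_format,
        "reference": sample["target"],
        "audio_id": metadata.get("audio_id"),
        "duration_s": metadata.get("audio_duration"),
    }


# Toy record with the same shape as the example; the audio bytes are fake.
toy_sample = {
    "input": [{
        "id": "fd0309e6",
        "content": [
            {"text": "Please recognize the speech and only output the recognized content:"},
            {"audio": base64.b64encode(b"RIFF").decode("ascii"), "format": "wav"},
        ],
    }],
    "target": "Eleven o'clock had struck",
    "id": 0,
    "group_id": 0,
    "metadata": {"audio_id": "5639-40744", "audio_duration": 496.69},
}

fields = unpack_sample(toy_sample)
print(fields["audio_id"], fields["audio_format"], len(fields["audio_bytes"]))
```

The "reference" value is what a model's transcription would be scored against, and "duration_s" is useful for spotting unusually long clips such as the roughly 497-second sample shown above.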

Prompt Template#

Prompt Template:

Please recognize the speech and only output the recognized content:

Usage#

Using CLI#

evalscope eval \
    --model YOUR_MODEL \
    --api-url OPENAI_API_COMPAT_URL \
    --api-key EMPTY_TOKEN \
    --datasets librispeech \
    --limit 10  # Remove this line for formal evaluation

Using Python#

from evalscope import run_task
from evalscope.config import TaskConfig

task_cfg = TaskConfig(
    model='YOUR_MODEL',
    api_url='OPENAI_API_COMPAT_URL',
    api_key='EMPTY_TOKEN',
    datasets=['librispeech'],
    limit=10,  # Remove this line for formal evaluation
)

run_task(task_cfg=task_cfg)