LibriSpeech#
Overview#
LibriSpeech is a large-scale corpus of approximately 1,000 hours of read English speech derived from audiobooks. It is one of the most widely used benchmarks for evaluating automatic speech recognition (ASR) systems.
Task Description#
Task Type: Automatic Speech Recognition (ASR)
Input: Audio recordings of read English speech from audiobooks
Output: Transcribed text
Language: English
Key Features#
1,000 hours of high-quality read speech
Derived from LibriVox audiobooks (public domain)
Clean and “other” test sets for varying difficulty
Widely used baseline for ASR research
Standardized evaluation protocol
Evaluation Notes#
Default configuration uses test_clean split
Primary metric: Word Error Rate (WER)
Text normalization applied during evaluation
Prompt: “Please recognize the speech and only output the recognized content”
Metadata includes audio ID and duration information
Properties#
Property |
Value |
|---|---|
Benchmark Name |
|
Dataset ID |
|
Paper |
N/A |
Tags |
|
Metrics |
|
Default Shots |
0-shot |
Evaluation Split |
|
Data Statistics#
Metric |
Value |
|---|---|
Total Samples |
87 |
Prompt Length (Mean) |
67 chars |
Prompt Length (Min/Max) |
67 / 67 chars |
Audio Statistics:
Metric |
Value |
|---|---|
Total Audio Files |
87 |
Audio per Sample |
min: 1, max: 1, mean: 1 |
Formats |
wav |
Sample Example#
Subset: default
{
"input": [
{
"id": "fd0309e6",
"content": [
{
"text": "Please recognize the speech and only output the recognized content:"
},
{
"audio": "[BASE64_AUDIO: wav, ~22.7MB]",
"format": "wav"
}
]
}
],
"target": "Eleven o'clock had struck it was a fine clear night they were the only persons on the road and they sauntered leisurely along to avoid paying the price of fatigue for the recreation provided for the toledans in their valley or on the banks of ... [TRUNCATED] ... less surprised than they and the better to assure himself of so wonderful a fact he begged leocadia to give him some token which should make perfectly clear to him that which indeed he did not doubt since it was authenticated by his parents.",
"id": 0,
"group_id": 0,
"metadata": {
"audio_id": "5639-40744",
"audio_duration": 496.6899719238281
}
}
Note: Some content was truncated for display.
Prompt Template#
Prompt Template:
Please recognize the speech and only output the recognized content:
Usage#
Using CLI#
evalscope eval \
--model YOUR_MODEL \
--api-url OPENAI_API_COMPAT_URL \
--api-key EMPTY_TOKEN \
--datasets librispeech \
--limit 10 # Remove this line for formal evaluation
Using Python#
from evalscope import run_task
from evalscope.config import TaskConfig
task_cfg = TaskConfig(
model='YOUR_MODEL',
api_url='OPENAI_API_COMPAT_URL',
api_key='EMPTY_TOKEN',
datasets=['librispeech'],
limit=10, # Remove this line for formal evaluation
)
run_task(task_cfg=task_cfg)