TriviaQA#

Overview#

TriviaQA is a large-scale reading comprehension dataset containing over 650K question-answer-evidence triples. Questions are collected from trivia enthusiast websites and paired with Wikipedia articles as evidence documents.

Task Description#

  • Task Type: Reading Comprehension / Question Answering

  • Input: Question with Wikipedia context passage

  • Output: Answer extracted or generated from context

  • Domain: General knowledge trivia questions

Key Features#

  • 650K+ question-answer-evidence triples

  • Questions written by trivia enthusiasts (naturally challenging)

  • Multiple valid answer aliases for flexible evaluation

  • Wikipedia articles provide evidence passages

  • Tests both reading comprehension and knowledge retrieval

Evaluation Notes#

  • Default configuration uses 0-shot evaluation

  • Uses the Wikipedia reading comprehension subset (rc.wikipedia)

  • Answers should follow the format: “ANSWER: [ANSWER]”

  • Supports inclusion-based matching for answer comparison

  • Evaluates on validation split
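The answer format and inclusion-based matching noted above can be illustrated with a minimal sketch. This is not EvalScope's scoring code: the helper names `extract_answer` and `is_correct`, the case-insensitive comparison, and the fallback to the full response when no `ANSWER:` line is present are all assumptions made for illustration.

```python
import re
from typing import Iterable

# Hypothetical helpers for illustration; not part of EvalScope's API.
ANSWER_PATTERN = re.compile(r"ANSWER:\s*(.+)", re.IGNORECASE)


def extract_answer(response: str) -> str:
    """Return the text after the last 'ANSWER:' marker, or the whole response if none is found."""
    matches = ANSWER_PATTERN.findall(response)
    return (matches[-1] if matches else response).strip()


def is_correct(response: str, aliases: Iterable[str]) -> bool:
    """Inclusion-based match: correct if any non-empty alias appears in the extracted answer."""
    predicted = extract_answer(response).lower()
    return any(alias and alias.lower() in predicted for alias in aliases)


aliases = ["Sunset Blvd", "Sunset Boulevard", "sunset boulevard"]
response = "It opened in Los Angeles that month.\nANSWER: Sunset Boulevard"
print(is_correct(response, aliases))
```

Checking every alias is what makes the evaluation tolerant of spelling variants such as "Sunset Blvd" and "Sunset Boulevard".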

Properties#

| Property | Value |
|---|---|
| Benchmark Name | trivia_qa |
| Dataset ID | evalscope/trivia_qa |
| Paper | N/A |
| Tags | QA, ReadingComprehension |
| Metrics | acc |
| Default Shots | 0-shot |
| Evaluation Split | validation |

Data Statistics#

| Metric | Value |
|---|---|
| Total Samples | 7,993 |
| Prompt Length (Mean) | 54,126.56 chars |
| Prompt Length (Min/Max) | 339 / 691,325 chars |
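As a rough illustration of how character-length statistics like these could be computed over rendered prompts, here is a minimal sketch. The function name `prompt_length_stats` and the toy prompts are hypothetical, and this is not necessarily how the figures above were produced.

```python
from statistics import mean


def prompt_length_stats(prompts):
    """Return (mean, min, max) prompt length in characters."""
    lengths = [len(p) for p in prompts]
    return round(mean(lengths), 2), min(lengths), max(lengths)


# Toy prompts standing in for the rendered evaluation prompts.
prompts = ["a" * 339, "b" * 1000, "c" * 2000]
print(prompt_length_stats(prompts))
```

Because the reported mean is about 54K characters and the maximum is about 691K, some prompts may exceed a given model's context window, so it can be worth checking lengths before a full run.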

Sample Example#

Subset: rc.wikipedia

{
  "input": [
    {
      "id": "545e4eda",
      "content": "Read the content and answer the following question.\n\nContent: ['Andrew Lloyd Webber, Baron Lloyd-Webber   (born 22 March 1948) is an English composer and impresario of musical theatre. \\n\\nSeveral of his musicals have run for more than a deca ... [TRUNCATED] ... ening titles.']\n\nQuestion: Which Lloyd Webber musical premiered in the US on 10th December 1993?\n\nKeep your The last line of your response should be of the form \"ANSWER: [ANSWER]\" (without quotes) where [ANSWER] is the answer to the problem.\n"
    }
  ],
  "target": [
    "Sunset Blvd",
    "West Sunset Boulevard",
    "Sunset Boulevard",
    "Sunset Bulevard",
    "Sunset Blvd.",
    "sunset boulevard",
    "sunset bulevard",
    "west sunset boulevard",
    "sunset blvd"
  ],
  "id": 0,
  "group_id": 0,
  "metadata": {
    "question_id": "tc_33",
    "content": [
      "Andrew Lloyd Webber, Baron Lloyd-Webber   (born 22 March 1948) is an English composer and impresario of musical theatre. \n\nSeveral of his musicals have run for more than a decade both in the West End and on Broadway. He has composed 13 musica ... [TRUNCATED] ... same name, composed the song \"Fields of Sun\". The actual song was never used on the show, nor was it available on the CD soundtrack that was released at the time. He was however still credited for the unused song in the show's opening titles."
    ]
  }
}

Note: Some content was truncated for display.

Prompt Template#

Read the content and answer the following question.

Content: {content}

Question: {question}

Keep your The last line of your response should be of the form "ANSWER: [ANSWER]" (without quotes) where [ANSWER] is the answer to the problem.
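To show how the two placeholders are filled, here is a minimal sketch using Python's `str.format`. The template text is copied verbatim from above; the `build_prompt` helper is hypothetical, and rendering the passage list with its default string form is an assumption inferred from the sample example, where the content appears as `['...']`.

```python
# Template text copied verbatim from the documentation above.
PROMPT_TEMPLATE = (
    "Read the content and answer the following question.\n\n"
    "Content: {content}\n\n"
    "Question: {question}\n\n"
    'Keep your The last line of your response should be of the form "ANSWER: [ANSWER]" '
    "(without quotes) where [ANSWER] is the answer to the problem.\n"
)


def build_prompt(content, question):
    """Fill both placeholders; a list of passages is rendered via its default str() form."""
    return PROMPT_TEMPLATE.format(content=content, question=question)


prompt = build_prompt(
    ["Andrew Lloyd Webber is an English composer and impresario of musical theatre."],
    "Which Lloyd Webber musical premiered in the US on 10th December 1993?",
)
print(prompt)
```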

Usage#

Using CLI#

evalscope eval \
    --model YOUR_MODEL \
    --api-url OPENAI_API_COMPAT_URL \
    --api-key EMPTY_TOKEN \
    --datasets trivia_qa \
    --limit 10  # Remove this line for formal evaluation

Using Python#

from evalscope import run_task
from evalscope.config import TaskConfig

task_cfg = TaskConfig(
    model='YOUR_MODEL',
    api_url='OPENAI_API_COMPAT_URL',
    api_key='EMPTY_TOKEN',
    datasets=['trivia_qa'],
    limit=10,  # Remove this line for formal evaluation
)

run_task(task_cfg=task_cfg)