CommonsenseQA#

Overview#

CommonsenseQA is a benchmark that evaluates an AI model’s ability to answer questions requiring commonsense reasoning about the world. Answering each question depends on background knowledge that the question itself does not state.

Task Description#

  • Task Type: Multiple-Choice Commonsense Reasoning

  • Input: Question requiring commonsense knowledge with 5 choices

  • Output: Correct answer letter (A-E)

  • Focus: World knowledge and commonsense inference

Key Features#

  • Questions derived from the ConceptNet knowledge graph

  • Requires different types of commonsense knowledge

  • 5 answer choices per question

  • Tests reasoning about everyday concepts and relationships

  • Human-validated questions

Evaluation Notes#

  • Default configuration uses 0-shot evaluation

  • Uses simple multiple-choice prompting

  • Evaluates on validation split

  • Reports accuracy (acc): the fraction of questions answered with the correct letter
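The scoring described in these notes can be sketched roughly as follows. This is a minimal illustration, not EvalScope's actual parser: the helper names `extract_letter` and `accuracy` are hypothetical, and the regular expression assumes the model follows the 'ANSWER: [LETTER]' format requested by the prompt.

```python
import re
from typing import List, Optional

# Assumption: the reply contains 'ANSWER: X' with X in A-E, as the prompt requests.
ANSWER_PATTERN = re.compile(r"ANSWER:\s*([A-E])\b")


def extract_letter(response: str) -> Optional[str]:
    """Return the letter of the last 'ANSWER: X' occurrence, or None if absent."""
    matches = ANSWER_PATTERN.findall(response)
    return matches[-1] if matches else None


def accuracy(responses: List[str], targets: List[str]) -> float:
    """Fraction of responses whose extracted letter equals the target letter."""
    if not targets:
        return 0.0
    correct = sum(extract_letter(r) == t for r, t in zip(responses, targets))
    return correct / len(targets)


if __name__ == "__main__":
    replies = ["ANSWER: A", "I think it is a mall. ANSWER: D", "bank"]
    print(accuracy(replies, ["A", "B", "A"]))
```

A reply with no parseable letter counts as wrong here. That is one reasonable convention, but the real scorer may treat such replies differently.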

Properties#

Property          Value
Benchmark Name    commonsense_qa
Dataset ID        extraordinarylab/commonsense-qa
Paper             N/A
Tags              Commonsense, MCQ, Reasoning
Metrics           acc
Default Shots     0-shot
Evaluation Split  validation

Data Statistics#

Metric                   Value
Total Samples            1,221
Prompt Length (Mean)     326.11 chars
Prompt Length (Min/Max)  257 / 537 chars
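For reference, prompt-length statistics of this kind could be computed along the following lines. This is only a sketch: `prompt_length_stats` is a hypothetical helper, "chars" is assumed to mean Python string length (Unicode code points), and the prompts in the example are toy strings rather than benchmark data.

```python
from statistics import mean
from typing import Dict, List


def prompt_length_stats(prompts: List[str]) -> Dict[str, float]:
    """Return mean, min, and max character length over a list of prompts."""
    lengths = [len(p) for p in prompts]
    return {
        "mean": round(mean(lengths), 2),
        "min": min(lengths),
        "max": max(lengths),
    }


if __name__ == "__main__":
    # Toy inputs for illustration only.
    print(prompt_length_stats(["abcd", "ab", "abcdef"]))
```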

Sample Example#

Subset: default

{
  "input": [
    {
      "id": "c9b96cfc",
      "content": "Answer the following multiple choice question. The entire content of your response should be of the following format: 'ANSWER: [LETTER]' (without quotes) where [LETTER] is one of A,B,C,D,E.\n\nA revolving door is convenient for two direction travel, but it also serves as a security measure at a what?\n\nA) bank\nB) library\nC) department store\nD) mall\nE) new york"
    }
  ],
  "choices": [
    "bank",
    "library",
    "department store",
    "mall",
    "new york"
  ],
  "target": "A",
  "id": 0,
  "group_id": 0,
  "metadata": {}
}
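In the record above, `target` is a letter that indexes into `choices`. A small sketch of that mapping, assuming letters are assigned in order starting at A (the helper name `target_text` is hypothetical):

```python
def target_text(sample: dict) -> str:
    """Map the target letter (A, B, ...) to the corresponding choice string."""
    index = ord(sample["target"]) - ord("A")
    return sample["choices"][index]


if __name__ == "__main__":
    # Fields copied from the sample record shown above.
    sample = {
        "choices": ["bank", "library", "department store", "mall", "new york"],
        "target": "A",
    }
    print(target_text(sample))  # bank
```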

Prompt Template#

Answer the following multiple choice question. The entire content of your response should be of the following format: 'ANSWER: [LETTER]' (without quotes) where [LETTER] is one of {letters}.

{question}

{choices}
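To illustrate how the placeholders are filled, here is a minimal sketch that reproduces the prompt from the sample example. The function name `build_prompt` is hypothetical, and the `A) text` choice layout and comma-separated letter list are inferred from that sample rather than taken from EvalScope's source.

```python
from typing import List

PROMPT_TEMPLATE = (
    "Answer the following multiple choice question. The entire content of your "
    "response should be of the following format: 'ANSWER: [LETTER]' (without "
    "quotes) where [LETTER] is one of {letters}.\n\n{question}\n\n{choices}"
)


def build_prompt(question: str, choices: List[str]) -> str:
    """Fill the template with a question and lettered answer choices."""
    labels = [chr(ord("A") + i) for i in range(len(choices))]
    letters = ",".join(labels)  # e.g. "A,B,C,D,E"
    choice_block = "\n".join(f"{label}) {text}" for label, text in zip(labels, choices))
    return PROMPT_TEMPLATE.format(
        letters=letters, question=question, choices=choice_block
    )


if __name__ == "__main__":
    print(
        build_prompt(
            "A revolving door is convenient for two direction travel, "
            "but it also serves as a security measure at a what?",
            ["bank", "library", "department store", "mall", "new york"],
        )
    )
```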

Usage#

Using CLI#

evalscope eval \
    --model YOUR_MODEL \
    --api-url OPENAI_API_COMPAT_URL \
    --api-key EMPTY_TOKEN \
    --datasets commonsense_qa \
    --limit 10  # Quick check on 10 samples; for a full evaluation, remove this line and the trailing backslash above

Using Python#

from evalscope import run_task
from evalscope.config import TaskConfig

task_cfg = TaskConfig(
    model='YOUR_MODEL',
    api_url='OPENAI_API_COMPAT_URL',
    api_key='EMPTY_TOKEN',
    datasets=['commonsense_qa'],
    limit=10,  # Quick check on 10 samples; remove this line for a full evaluation
)

run_task(task_cfg=task_cfg)