HMMT26#

Overview#

HMMT26 is an evaluation benchmark built from the problems of the Harvard-MIT Mathematics Tournament (HMMT) February 2026 competition, a high school contest regarded as one of the most prestigious and difficult in the world.

Task Description#

  • Task Type: Competition Mathematics Problem Solving

  • Input: HMMT-level mathematical problem

  • Output: Answer with step-by-step reasoning

  • Difficulty: Advanced high school competition level

Key Features#

  • 33 problems from the HMMT February 2026 competition

  • Four primary domains: Algebra, Combinatorics, Geometry, Number Theory

  • Competition-level problems at elite high school difficulty that test advanced mathematical reasoning

Evaluation Notes#

  • Default configuration uses 0-shot evaluation

  • Answers should be formatted within \boxed{} for proper extraction

  • Scored by accuracy (acc); answers are compared numerically, with symbolic equivalence checking

  • Each problem is labeled with its domain(s) in the problem_type metadata field
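The \boxed{} convention above matters because the scorer has to pull the final answer out of free-form reasoning. The following is a minimal sketch of one way to do that, not EvalScope's actual extractor: the function name extract_last_boxed is hypothetical, and it assumes the final answer is the last \boxed{...} in the response. It tracks brace depth so that nested LaTeX such as \frac{1}{21} survives intact.

```python
from typing import Optional


def extract_last_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None  # no boxed answer present

    begin = start + len(marker)
    depth = 1  # we are inside the opening brace of \boxed{
    for i in range(begin, len(text)):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[begin:i]
    return None  # braces never balanced


response = "Vieta gives 42r = -2, so the answer is \\boxed{-\\frac{1}{21}}."
print(extract_last_boxed(response))  # -\frac{1}{21}
```

A response that never closes the box, or that omits \boxed{} entirely, yields None here, which a scorer would typically count as incorrect.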

Properties#

| Property | Value |
|---|---|
| Benchmark Name | hmmt26 |
| Dataset ID | evalscope/hmmt_feb_2026 |
| Paper | N/A |
| Tags | Math, Reasoning |
| Metrics | acc |
| Default Shots | 0-shot |
| Evaluation Split | train |

Data Statistics#

| Metric | Value |
|---|---|
| Total Samples | 33 |
| Prompt Length (Mean) | 385.3 chars |
| Prompt Length (Min/Max) | 202 / 605 chars |

Sample Example#

Subset: default

{
  "input": [
    {
      "id": "02be53e7",
      "content": "Problem:\nA line intersects the graph of \\( y = x^2 + \\frac{2}{x} \\) at three distinct points. Given that the \\( x \\)-coordinates of two of the points are 6 and 7, respectively, compute the \\( x \\)-coordinate of the third point.\n\nPlease reason step by step, and put your final answer within \\boxed{}.\n"
    }
  ],
  "target": "-\\frac{1}{21}",
  "id": 0,
  "group_id": 0,
  "metadata": {
    "problem_idx": 1,
    "problem_type": [
      "Algebra"
    ]
  }
}
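As a sanity check on the sample's target, the answer can be reproduced with exact rational arithmetic. This sketch is only an illustration of why the target is -1/21 and is not part of the benchmark: setting x^2 + 2/x equal to a line m*x + b and multiplying by x gives the cubic x^3 - m*x^2 - b*x + 2 = 0, whose three roots multiply to -2 by Vieta's formulas.

```python
from fractions import Fraction


def f(x: Fraction) -> Fraction:
    """The curve y = x^2 + 2/x from the sample problem."""
    return x * x + 2 / x


x1, x2 = Fraction(6), Fraction(7)

# Intersections with y = m*x + b satisfy x^3 - m*x^2 - b*x + 2 = 0,
# so the product of the three roots is -2.
x3 = Fraction(-2) / (x1 * x2)

# Cross-check: the third point must lie on the line through the first two.
m = (f(x2) - f(x1)) / (x2 - x1)
b = f(x1) - m * x1
assert f(x3) == m * x3 + b

print(x3)  # -1/21
```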

Prompt Template#

Problem:
{question}

Please reason step by step, and put your final answer within \boxed{{}}.
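The doubled braces in \boxed{{}} suggest the template is filled with Python's str.format, where {{ and }} are escapes for literal braces; the source does not state this, so treat it as an assumption. Under that assumption, a minimal sketch of how the template renders looks like this (PROMPT_TEMPLATE and the example question are illustrative names, not EvalScope identifiers):

```python
# Template text copied from the section above; only {question} is a placeholder.
PROMPT_TEMPLATE = (
    "Problem:\n"
    "{question}\n"
    "\n"
    "Please reason step by step, and put your final answer within \\boxed{{}}."
)

# str.format substitutes {question} and collapses {{}} to a literal {}.
prompt = PROMPT_TEMPLATE.format(question="Compute 1 + 1.")
print(prompt)
```

The rendered prompt ends with \boxed{}, which matches the content field shown in the sample example.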

Usage#

Using CLI#

evalscope eval \
    --model YOUR_MODEL \
    --api-url OPENAI_API_COMPAT_URL \
    --api-key EMPTY_TOKEN \
    --datasets hmmt26 \
    --limit 10  # Remove this line for formal evaluation

Using Python#

from evalscope import run_task
from evalscope.config import TaskConfig

task_cfg = TaskConfig(
    model='YOUR_MODEL',
    api_url='OPENAI_API_COMPAT_URL',
    api_key='EMPTY_TOKEN',
    datasets=['hmmt26'],
    limit=10,  # Remove this line for formal evaluation
)

run_task(task_cfg=task_cfg)