AGENT Benchmarks

Below is the list of supported AGENT benchmarks. Click on a benchmark name for details.

| Benchmark Name | Pretty Name | Task Categories |
| --- | --- | --- |
| bfcl_v3 | BFCL-v3 | Agent, FunctionCalling |
| bfcl_v4 | BFCL-v4 | Agent, FunctionCalling |
| general_fc | General-FunctionCalling | Agent, Custom, FunctionCalling |
| swe_bench_lite_agentic | SWE-bench_Lite_Agentic | Coding |
| swe_bench_verified_agentic | SWE-bench_Verified_Agentic | Coding |
| swe_bench_verified_mini_agentic | SWE-bench_Verified_Mini_Agentic | Coding |
| tau2_bench | τ²-bench | Agent, FunctionCalling, Reasoning |
| tau_bench | τ-bench | Agent, FunctionCalling, Reasoning |
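
To pick benchmarks by task category in a script, the listing above can be transcribed into a small lookup table. The sketch below is illustrative only: `AGENT_BENCHMARKS` and `benchmarks_with_category` are hypothetical names introduced here, not part of any library API, and the data is copied by hand from the listing.

```python
# Hypothetical lookup table transcribed from the benchmark listing above.
# Maps benchmark name -> (pretty name, task categories).
AGENT_BENCHMARKS = {
    "bfcl_v3": ("BFCL-v3", ["Agent", "FunctionCalling"]),
    "bfcl_v4": ("BFCL-v4", ["Agent", "FunctionCalling"]),
    "general_fc": ("General-FunctionCalling", ["Agent", "Custom", "FunctionCalling"]),
    "swe_bench_lite_agentic": ("SWE-bench_Lite_Agentic", ["Coding"]),
    "swe_bench_verified_agentic": ("SWE-bench_Verified_Agentic", ["Coding"]),
    "swe_bench_verified_mini_agentic": ("SWE-bench_Verified_Mini_Agentic", ["Coding"]),
    "tau2_bench": ("τ²-bench", ["Agent", "FunctionCalling", "Reasoning"]),
    "tau_bench": ("τ-bench", ["Agent", "FunctionCalling", "Reasoning"]),
}


def benchmarks_with_category(category: str) -> list[str]:
    """Return the sorted benchmark names whose task categories include `category`."""
    return sorted(
        name
        for name, (_pretty, categories) in AGENT_BENCHMARKS.items()
        if category in categories
    )


if __name__ == "__main__":
    # List every benchmark tagged with the "Coding" category.
    for name in benchmarks_with_category("Coding"):
        pretty_name, _ = AGENT_BENCHMARKS[name]
        print(f"{name}: {pretty_name}")
```

The benchmark names (first column) are the identifiers to use programmatically; the pretty names are display labels only.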