SWE-bench_Lite#
Overview#
SWE-bench Lite is a focused subset of SWE-bench containing 300 Issue-Pull Request pairs from 11 popular Python repositories. It provides a more accessible entry point for evaluating automated software engineering capabilities.
Task Description#
Task Type: Automated Software Engineering / Bug Fixing
Input: GitHub issue description with repository context
Output: Code patch (diff format) that resolves the issue
Size: 300 carefully selected test instances
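To make the expected output concrete, the sketch below builds a small patch in unified diff format with Python's standard `difflib` module. The file name `calc.py` and its contents are hypothetical, and the exact headers the evaluation harness expects (for example git-style `diff --git` lines) are not specified here, so treat this only as an illustration of what "diff format" means.

```python
import difflib


def make_patch(path: str, before: str, after: str) -> str:
    """Return a unified diff that turns `before` into `after` for one file."""
    diff = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)


# Hypothetical file contents, for illustration only.
before = "def add(a, b):\n    return a - b\n"
after = "def add(a, b):\n    return a + b\n"

patch = make_patch("calc.py", before, after)
print(patch)
```

The printed patch marks removed lines with `-` and added lines with `+`, which is the shape a model is expected to produce for each issue.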
Key Features#
300 test Issue-Pull Request pairs
11 popular Python repositories covered
Real-world bugs with verified solutions
Evaluation via unit test verification
Smaller and cheaper to run than the full SWE-bench while remaining challenging
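Unit test verification can be sketched as follows, assuming the SWE-bench convention that each instance lists `FAIL_TO_PASS` tests (which must start passing once the patch is applied) and `PASS_TO_PASS` tests (which must keep passing). The function name, the test names, and the `"PASSED"` status string are hypothetical; the real harness parses test logs inside Docker containers.

```python
def is_resolved(test_results, fail_to_pass, pass_to_pass):
    """Return True only if every required test passed after applying the patch.

    test_results maps a test name to a status string; a test missing from
    the mapping is treated as not passed.
    """
    required = list(fail_to_pass) + list(pass_to_pass)
    return all(test_results.get(name) == "PASSED" for name in required)


# Hypothetical test names and statuses, for illustration only.
results = {"test_add": "PASSED", "test_sub": "PASSED", "test_mul": "FAILED"}

fixed = is_resolved(results, ["test_add"], ["test_sub"])
regressed = is_resolved(results, ["test_add"], ["test_sub", "test_mul"])
print(fixed, regressed)
```

A patch that fixes the target test but breaks a previously passing test counts as unresolved under this rule.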
Evaluation Notes#
Requires `pip install swebench==4.1.0` before evaluation
Docker images are built/pulled automatically for each repository
See the usage documentation for detailed setup instructions
Popular benchmark variant for initial model comparison
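Before starting a run, it can help to confirm that the pinned `swebench` version is installed. The check below is a minimal sketch using the standard `importlib.metadata` module; the helper name `has_pinned_version` is hypothetical and is not part of evalscope or swebench.

```python
from importlib.metadata import PackageNotFoundError, version


def has_pinned_version(package: str, expected: str) -> bool:
    """Return True if `package` is installed at exactly version `expected`."""
    try:
        return version(package) == expected
    except PackageNotFoundError:
        # The package is not installed in the current environment.
        return False


if not has_pinned_version("swebench", "4.1.0"):
    print("swebench==4.1.0 not found; run: pip install swebench==4.1.0")
```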
Properties#
| Property | Value |
|---|---|
| Benchmark Name | |
| Dataset ID | |
| Paper | N/A |
| Tags | |
| Metrics | |
| Default Shots | 0-shot |
| Evaluation Split | |
Data Statistics#
Statistics not available.
Sample Example#
Sample example not available.
Prompt Template#
Prompt Template:
{question}
Extra Parameters#
| Parameter | Type | Default | Description |
|---|---|---|---|
| | | | Build Docker images locally for each sample. |
| | | | Attempt to pull existing remote Docker images before building. |
| | | | Oracle dataset ID used to fetch inference context. |
| | | `` | Optionally force the Docker images to be pulled/built for a specific architecture. Choices: ['', 'arm64', 'x86_64'] |
Usage#
Using CLI#
evalscope eval \
    --model YOUR_MODEL \
    --api-url OPENAI_API_COMPAT_URL \
    --api-key EMPTY_TOKEN \
    --datasets swe_bench_lite \
    --limit 10  # Remove this line for formal evaluation
Using Python#
from evalscope import run_task
from evalscope.config import TaskConfig

task_cfg = TaskConfig(
    model='YOUR_MODEL',
    api_url='OPENAI_API_COMPAT_URL',
    api_key='EMPTY_TOKEN',
    datasets=['swe_bench_lite'],
    dataset_args={
        'swe_bench_lite': {
            # extra_params: {}  # uses default extra parameters
        }
    },
    limit=10,  # Remove this line for formal evaluation
)

run_task(task_cfg=task_cfg)