Parameter#
Execute evalscope perf --help to get a full parameter description.
Basic Settings#
Parameter |
Type |
Description |
Default |
|---|---|---|---|
|
|
Name or path of the test model |
- |
|
|
API address, supporting |
- |
|
|
Name for wandb/swanlab database result and result database |
|
|
|
Service API type |
- |
|
|
Port for local inference service |
|
|
|
Attention implementation method |
|
|
|
API key |
|
|
|
Whether to output debug information |
|
Network Configuration#
Parameter |
Type |
Description |
Default |
|---|---|---|---|
|
|
Total timeout for each request (seconds) |
|
|
|
Network connection timeout (seconds) |
|
|
|
Network read timeout (seconds) |
|
|
|
Additional HTTP headers |
- |
|
|
Do not send connection test, start stress test directly |
|
Request Control#
Parameter |
Type |
Description |
Default |
|---|---|---|---|
|
|
Number of concurrent requests |
|
|
|
Total number of requests to be sent |
|
|
|
Request generation rate (requests/second) |
|
|
|
Log every N queries |
|
|
|
Whether to use SSE stream output |
|
|
|
Sleep time between each performance test (seconds) |
|
|
|
Enable open-loop mode: dispatch requests following a Poisson arrival schedule without semaphore backpressure. |
|
|
|
Number or ratio of warmup requests: |
|
Tip
Closed-loop (default) vs Open-loop (--open-loop) β parameter behaviour comparison:
Closed-loop (default) |
Open-loop ( |
|
|---|---|---|
|
Controls enqueue rate ( |
Controls dispatch rate; must be > 0; accepts multiple values (e.g. |
|
Total requests per run; must match |
Total requests per run; must match |
|
Max in-flight requests; each worker waits for a response before sending the next (backpressure) |
Ignored; concurrency is unbounded (INF); requests are fired on schedule without waiting for responses |
Use case |
Measure latency and throughput under controlled concurrency |
Simulate realistic traffic (arrivals independent of service time); sweep throughput-latency curve across multiple rates |
SLA Settings#
Parameter |
Type |
Description |
Default |
|---|---|---|---|
|
|
Whether to enable SLA auto-tuning mode |
|
|
|
Variable for auto-tuning |
|
|
|
SLA constraint conditions |
|
|
|
Upper bound of the tuned SLA variable search range |
|
|
|
Lower bound of the tuned SLA variable search range |
|
|
|
Fixed parallel workers used when |
|
|
|
Number of runs per concurrency level (average taken) |
|
|
|
Multiplier of total requests relative to the tuned variable (concurrency or rate), i.e. |
|
See also
For details on using the SLA auto-tuning feature, see the Auto-tuning Guide.
Prompt Settings#
Parameter |
Type |
Description |
Default |
|---|---|---|---|
|
|
Maximum input prompt length |
|
|
|
Minimum input prompt length |
|
|
|
Length of the prompt prefix |
|
|
|
Specify request prompt |
- |
|
|
Specify query template |
- |
|
|
Whether to apply chat template |
|
|
|
Image width for random VL dataset |
|
|
|
Image height for random VL dataset |
|
|
|
Image format for random VL dataset |
|
|
|
Number of images for random VL dataset |
|
|
|
Patch size for the image |
|
Dataset Configuration#
Parameter |
Type |
Description |
Default |
|---|---|---|---|
|
|
Dataset mode, see table below for details |
- |
|
|
Dataset file path |
- |
Dataset Mode Description#
Text / Chat
Mode |
Description |
Supports dataset-path |
|---|---|---|
|
Automatically downloads OpenQA from ModelScope |
β |
|
Automatically downloads LongAlpaca-12k from ModelScope |
β |
|
Each line in txt file is used as a separate prompt |
β (Required) |
|
Randomly generates prompts based on |
β |
|
Custom dataset parser |
β |
Multimodal
Mode |
Description |
Supports dataset-path |
|---|---|---|
|
Automatically downloads Flick8k from ModelScope |
β |
|
Automatically downloads Kontext-Bench from ModelScope |
β |
|
Randomly generates both image and text inputs |
β |
Embedding
Mode |
Description |
Supports dataset-path |
|---|---|---|
|
Load text data from file to evaluate Embedding model |
β (Required) |
|
Randomly generate queries based on |
β |
|
Batch send text data to evaluate Embedding model |
β (Required) |
|
Batch send randomly generated query data to evaluate Embedding model |
β |
Rerank
Mode |
Description |
Supports dataset-path |
|---|---|---|
|
Load Query-Document pairs from file to evaluate Rerank model |
β (Required) |
|
Randomly generate query data to evaluate Rerank model |
β |
Multi-turn Conversation
Must be used with --multi-turn. See the Multi-turn Benchmark Guide for details.
Mode |
Description |
Supports dataset-path |
|---|---|---|
|
Synthetic multi-turn conversations; each turn randomly generates a token sequence |
β |
|
Automatically downloads the Chinese ShareGPT dataset (~70k conversations) from ModelScope, preserving full multi-turn conversations |
β |
|
Automatically downloads the English ShareGPT dataset (~70k conversations) from ModelScope, preserving full multi-turn conversations |
β |
|
Uses a local JSONL file as a custom multi-turn dataset |
β (Required) |
Model Settings#
Parameter |
Type |
Description |
Default |
|---|---|---|---|
|
|
Tokenizer weights path |
|
|
|
frequency_penalty value |
- |
|
|
Whether to return logarithmic probabilities |
- |
|
|
Maximum number of tokens that can be generated |
- |
|
|
Minimum number of tokens to generate |
- |
|
|
Number of completion choices to generate |
- |
|
|
Random seed |
|
|
|
Tokens that stop the generation |
- |
|
|
IDs of tokens that stop the generation |
- |
|
|
Sampling temperature |
|
|
|
Top-p sampling |
- |
|
|
Top-k sampling |
- |
|
|
Additional parameters to be passed in the request body |
- |
|
|
Tokenize the prompt client-side into a token-ID list and send it directly via |
|
Data Storage#
Parameter |
Type |
Description |
Default |
|---|---|---|---|
|
|
Visualizer to use |
|
|
|
Whether to enable progress tracking, writing hierarchical stress-test progress to |
|
|
|
wandb API key for logging metrics to wandb |
- |
|
|
swanlab API key for logging metrics to swanlab |
- |
|
|
Output file path |
|
|
|
Exclude timestamp from output directory name |
|
Multi-turn Settings#
Parameter |
Type |
Description |
Default |
|---|---|---|---|
|
|
Enable multi-turn conversation benchmark mode; |
|
|
|
Minimum number of user turns per conversation; used by |
|
|
|
Maximum number of user turns per conversation; required for |
|
See also
For details on using the multi-turn benchmark feature, see the Multi-turn Benchmark Guide.
Other Parameters#
Parameter |
Type |
Description |
Default |
|---|---|---|---|
|
|
Number of rows buffered before writing results to SQLite database |
|
|
|
Maximum size of the request queue |
|
|
|
Maximum number of in-flight tasks |
|