Parameter
Run evalscope perf --help for a full description of every parameter.
Basic Settings
| Parameter | Type | Description | Default |
|---|---|---|---|
| | | Name or path of the test model | - |
| | | API address, supporting | - |
| | | Name used for the wandb/swanlab result and for the result database | |
| | | Service API type | - |
| | | Port for local inference service | |
| | | Attention implementation method | |
| | | API key | |
| | | Whether to output debug information | |
Network Configuration
| Parameter | Type | Description | Default |
|---|---|---|---|
| | | Total timeout for each request (seconds) | |
| | | Network connection timeout (seconds) | |
| | | Network read timeout (seconds) | |
| | | Additional HTTP headers | - |
| | | Skip the connection test and start the stress test directly | |
Request Control
| Parameter | Type | Description | Default |
|---|---|---|---|
| | | Number of concurrent requests | |
| | | Total number of requests to be sent | |
| | | Request generation rate (requests/second) | |
| | | Log every N queries | |
| | | Whether to use SSE stream output | |
| | | Sleep time between each performance test (seconds) | |
Tip
--rate and --parallel control two independent phases:
- Generation phase (controlled by --rate): Requests are generated and placed into a queue at the specified rate.
  - rate=-1: No rate limit; all requests are enqueued immediately.
  - rate=R: Inter-arrival intervals follow an exponential distribution with mean 1/R seconds (Poisson arrival model), resulting in an average of R requests enqueued per second.
- Sending phase (controlled by --parallel): At most parallel requests are in flight simultaneously (sent but not yet responded to); each worker fetches the next request from the queue only after receiving a response to the previous one.
The two parameters are independent: --rate determines how quickly requests enter the queue, while --parallel determines how many requests are actively being sent at any given time.
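The two phases can be illustrated with a small asyncio simulation. This is a minimal sketch of the behavior described in the tip, not evalscope's actual implementation: the names generate, worker, run, and simulate are hypothetical, and a fixed service_time sleep stands in for the real HTTP round trip.

```python
import asyncio
import random


async def generate(queue, total, rate, rng):
    """Generation phase: enqueue `total` requests at about `rate` per second."""
    for i in range(total):
        if rate > 0:
            # Exponential inter-arrival gap with mean 1/rate seconds
            # (Poisson arrivals). rate=-1 skips the wait entirely.
            await asyncio.sleep(rng.expovariate(rate))
        await queue.put(i)


async def worker(queue, results, state, service_time):
    """Sending phase: take the next request only after the previous one finished."""
    while True:
        req = await queue.get()
        state["in_flight"] += 1
        state["peak"] = max(state["peak"], state["in_flight"])
        await asyncio.sleep(service_time)  # stand-in for the request round trip
        state["in_flight"] -= 1
        results.append(req)
        queue.task_done()


async def run(total=20, rate=-1, parallel=4, service_time=0.01, seed=0):
    rng = random.Random(seed)
    queue = asyncio.Queue()
    results = []
    state = {"in_flight": 0, "peak": 0}
    # `parallel` workers bound the number of in-flight requests.
    workers = [
        asyncio.create_task(worker(queue, results, state, service_time))
        for _ in range(parallel)
    ]
    await generate(queue, total, rate, rng)
    await queue.join()  # wait until every enqueued request was handled
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results, state["peak"]


def simulate(**kwargs):
    """Synchronous entry point returning (completed requests, peak in-flight)."""
    return asyncio.run(run(**kwargs))
```

Calling simulate(total=20, rate=-1, parallel=4) enqueues everything at once, so the observed peak is bounded only by parallel; with a small positive rate the queue is often empty and the peak can stay below that bound.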
SLA Settings
| Parameter | Type | Description | Default |
|---|---|---|---|
| | | Whether to enable SLA auto-tuning mode | |
| | | Variable for auto-tuning | |
| | | SLA constraint conditions | |
| | | Upper bound of the tuned SLA variable search range | |
| | | Lower bound of the tuned SLA variable search range | |
| | | Fixed parallel workers used when | |
| | | Number of runs per concurrency level (average taken) | |
| | | Multiplier of total requests relative to the tuned variable (concurrency or rate), i.e. | |
See also
For details on using the SLA auto-tuning feature, see the Auto-tuning Guide.
Prompt Settings
| Parameter | Type | Description | Default |
|---|---|---|---|
| | | Maximum input prompt length | |
| | | Minimum input prompt length | |
| | | Length of the prompt prefix | |
| | | Specify request prompt | - |
| | | Specify query template | - |
| | | Whether to apply chat template | |
| | | Image width for random VL dataset | |
| | | Image height for random VL dataset | |
| | | Image format for random VL dataset | |
| | | Number of images for random VL dataset | |
| | | Patch size for the image | |
Dataset Configuration
| Parameter | Type | Description | Default |
|---|---|---|---|
| | | Dataset mode, see table below for details | - |
| | | Dataset file path | - |
Dataset Mode Description
| Mode | Description | Supports dataset-path |
|---|---|---|
| | Automatically downloads OpenQA from ModelScope | ✓ |
| | Automatically downloads LongAlpaca-12k from ModelScope | ✓ |
| | Each line in txt file is used as a separate prompt | ✓ (Required) |
| | Automatically downloads Flickr8k from ModelScope | ✗ |
| | Automatically downloads Kontext-Bench from ModelScope | ✗ |
| | Randomly generates prompts based on | ✗ |
| | Randomly generates both image and text inputs | ✗ |
| | Load text data from file to evaluate Embedding model | ✓ (Required) |
| | Randomly generate queries based on | ✗ |
| | Batch send text data to evaluate Embedding model | ✓ (Required) |
| | Batch send randomly generated query data based on | ✗ |
| | Load Query-Document pairs from file to evaluate Rerank model | ✓ (Required) |
| | Randomly generate query data based on | ✗ |
| | Custom dataset parser | ✓ |
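For the mode in which each line of a txt file is one prompt, the file layout can be checked before a run with a few lines of Python. This is only a sketch: load_line_prompts is a hypothetical helper, not part of evalscope, and it assumes that blank lines are ignored and that the file is UTF-8 encoded.

```python
from pathlib import Path


def load_line_prompts(path):
    """Return one prompt per non-empty line of a UTF-8 text file."""
    prompts = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line:  # assumption: blank lines are not treated as prompts
            prompts.append(line)
    return prompts
```

Printing len(load_line_prompts("prompts.txt")) before a stress test gives a quick count of the prompts the file would supply under these assumptions.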
Model Settings
| Parameter | Type | Description | Default |
|---|---|---|---|
| | | Tokenizer weights path | |
| | | frequency_penalty value | - |
| | | Whether to return logarithmic probabilities | - |
| | | Maximum number of tokens that can be generated | - |
| | | Minimum number of tokens to generate | - |
| | | Number of completion choices to generate | - |
| | | Random seed | |
| | | Tokens that stop the generation | - |
| | | IDs of tokens that stop the generation | - |
| | | Sampling temperature | |
| | | Top-p sampling | - |
| | | Top-k sampling | - |
| | | Additional parameters to be passed in the request body | - |
Data Storage
| Parameter | Type | Description | Default |
|---|---|---|---|
| | | Visualizer to use | |
| | | Whether to enable progress tracking, writing hierarchical stress-test progress to | |
| | | wandb API key for logging metrics to wandb | - |
| | | swanlab API key for logging metrics to swanlab | - |
| | | Output file path | |
Other Parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| | | Number of rows buffered before writing results to SQLite database | |
| | | Maximum size of the request queue | |
| | | Maximum number of in-flight tasks | |