Parameter Description#
Execute evalscope perf --help to get a full parameter description:
Basic Settings#
--model: Name of the test model.--url: Specify the API address.--name: Name for the wandb database result and result database, default is{model_name}_{current_time}, optional.--api: Specify the service API, currently supports [openai|dashscope|local|local_vllm].Select
openaito use the API supporting OpenAI, requiring the--urlparameter.Select
dashscopeto use the API supporting DashScope, requiring the--urlparameter.Select
localto use local files as models and perform inference using transformers.--modelshould be the model file path or model_id, which will be automatically downloaded from modelscope, e.g.,Qwen/Qwen2.5-0.5B-Instruct.Select
local_vllmto use local files as models and start the vllm inference service.--modelshould be the model file path or model_id, which will be automatically downloaded from modelscope, e.g.,Qwen/Qwen2.5-0.5B-Instruct.You can also use a custom API, refer to Custom API Guide.
--port: The port for the local inference service, defaulting to 8877. This is only applicable tolocalandlocal_vllm.--attn-implementation: Attention implementation method, default is None, optional [flash_attention_2|eager|sdpa], only effective whenapiislocal.--api-key: API key, optional.--debug: Output debug information.
Network Configuration#
--connect-timeout: Network connection timeout, default is 120 seconds.--read-timeout: Network read timeout, default is 120 seconds.--headers: Additional HTTP headers, formatted askey1=value1 key2=value2. This header will be used for each query.
Request Control#
--number: Number of requests sent, default is None, meaning requests are sent based on the dataset size.--parallelsets the number of workers for concurrent requests, with a default of 1.--ratespecifies the number of requests generated per second (not sent), with a default of -1, indicating that all requests will be generated at time 0 with no interval; otherwise, we use a Poisson process to generate request intervals.Tip
In the implementation of this tool, request generation and sending are separated: The
--rateparameter is used to control the number of requests generated per second, and these requests will be placed in a request queue. The--parallelparameter is used to control the number of workers sending requests; workers will take requests from the queue and send them, only sending the next request after receiving a response for the previous one. It is not recommended to set both parameters simultaneously; thus, in this tool, the--rateparameter is only effective when--parallelis set to 1.--log-every-n-query: Log every n queries, default is 10.--stream: Use SSE stream output, default is False.
Prompt Settings#
--max-prompt-length: Maximum input prompt length, default issys.maxsize. Prompts exceeding this length will be discarded.--min-prompt-length: Minimum input prompt length, default is 0. Prompts shorter than this will be discarded.--prompt: Specify request prompt, a string or local file, taking precedence overdataset. When using a local file, specify the file path with@/path/to/file, e.g.,@./prompt.txt.--query-template: Specify query template, aJSONstring or local file. When using a local file, specify the file path with@/path/to/file, e.g.,@./query_template.json.
Dataset Configuration#
--datasetspecifies the dataset [openqa|longalpaca|line_by_line|flickr8k]. You can also use a custom dataset parser in Python; refer to the Custom Dataset Guide.line_by_linetreats each line as a separate prompt and requires adataset_path.longalpacawill useitem['instruction']as the prompt. Ifdataset_pathis not specified, it will be automatically downloaded from modelscope.openqawill useitem['question']as the prompt. Ifdataset_pathis not specified, it will be automatically downloaded from modelscope.flickr8kwill construct image-text inputs, making it suitable for evaluating multimodal models; the dataset will be automatically downloaded from modelscope, and specifyingdataset_pathis not supported.
--dataset-path: Path to the dataset file, used in combination with the dataset.openqaandlongalpacado not need a specified dataset path and will be downloaded automatically;line_by_linerequires a local dataset file, which will be loaded line by line.
Model Settings#
--tokenizer-path: Optional, specify the path to tokenizer weights for calculating the number of input and output tokens, usually in the same directory as the model weights.--frequency-penalty: Frequency penalty value.--logprobs: Log probabilities.--max-tokens: Maximum number of tokens that can be generated.--min-tokens: Minimum number of tokens to be generated.--n-choices: Number of completion choices generated.--seed: Random seed, default is 42.--stop: Tokens that stop generation.--stop-token-ids: Set stop token IDs.--temperature: Sampling temperature.--top-p: Top_p sampling.
Data Storage#
--wandb-api-key: wandb API key, if set, metrics will be saved to wandb.--outputs-dirspecifies the output file path, with a default value of./outputs.