Quick Start
Environment Setup
# Option 1: install from PyPI with the extra perf dependencies
pip install 'evalscope[perf]' -U
# Option 2: install from source
git clone https://github.com/modelscope/evalscope.git
cd evalscope
pip install -e '.[perf]'
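Basic Usage
The model inference performance stress-testing tool can be launched in either of two ways, from the command line or from a Python script: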
# Launch from the command line
evalscope perf \
--url "http://127.0.0.1:8000/v1/chat/completions" \
--parallel 1 \
--model qwen2.5 \
--number 15 \
--api openai \
--dataset openqa \
--stream
# Launch from a Python script
from evalscope.perf.main import run_perf_benchmark

task_cfg = {
    "url": "http://127.0.0.1:8000/v1/chat/completions",
    "parallel": 1,
    "model": "qwen2.5",
    "number": 15,
    "api": "openai",
    "dataset": "openqa",
    "stream": True,
}
run_perf_benchmark(task_cfg)
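Parameters:
- url: the URL that requests are sent to
- parallel: the number of concurrent request tasks
- model: the name of the model under test
- number: the total number of requests to send
- api: the API service type to use (for example `openai`)
- dataset: the name of the dataset
- stream: whether to enable streaming output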
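The CLI flags and the `task_cfg` keys above share the same names. As a minimal sketch, assuming every key maps directly to a `--key` flag and that a `True` boolean becomes a bare switch (as `--stream` does), the hypothetical helper `to_cli_args` below converts the Python config into the equivalent command line. It is an illustration only and is not part of evalscope.

```python
from typing import Any


def to_cli_args(task_cfg: dict[str, Any]) -> list[str]:
    """Hypothetical helper: translate a perf task config into CLI arguments.

    Assumes each key becomes `--key` and that True booleans become bare
    switches, mirroring how `--stream` appears in the command above.
    """
    args = ["evalscope", "perf"]
    for key, value in task_cfg.items():
        flag = f"--{key}"
        if isinstance(value, bool):
            # Booleans are switches: True adds the bare flag, False omits it.
            if value:
                args.append(flag)
        else:
            args.extend([flag, str(value)])
    return args


task_cfg = {
    "url": "http://127.0.0.1:8000/v1/chat/completions",
    "parallel": 1,
    "model": "qwen2.5",
    "number": 15,
    "api": "openai",
    "dataset": "openqa",
    "stream": True,
}
print(" ".join(to_cli_args(task_cfg)))
```

Treating booleans as switches rather than `--stream True` keeps the generated command in the same shape as the CLI example.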
Output
Benchmarking summary:
+----------------------------------------------+------------------------------------------------+
| key | Value |
+==============================================+================================================+
| Time taken for tests (seconds) | 7.539 |
+----------------------------------------------+------------------------------------------------+
| Number of concurrency | 1 |
+----------------------------------------------+------------------------------------------------+
| Total requests | 15 |
+----------------------------------------------+------------------------------------------------+
| Succeed requests | 15 |
+----------------------------------------------+------------------------------------------------+
| Failed requests | 0 |
+----------------------------------------------+------------------------------------------------+
| Average QPS | 1.99 |
+----------------------------------------------+------------------------------------------------+
| Average latency | 0.492 |
+----------------------------------------------+------------------------------------------------+
| Average time to first token | 0.026 |
+----------------------------------------------+------------------------------------------------+
| Throughput(average output tokens per second) | 334.006 |
+----------------------------------------------+------------------------------------------------+
| Average time per output token | 0.00299 |
+----------------------------------------------+------------------------------------------------+
| Average package per request | 167.867 |
+----------------------------------------------+------------------------------------------------+
| Average package latency | 0.003 |
+----------------------------------------------+------------------------------------------------+
| Average input tokens per request | 40.133 |
+----------------------------------------------+------------------------------------------------+
| Average output tokens per request | 167.867 |
+----------------------------------------------+------------------------------------------------+
| Expected number of requests | 15 |
+----------------------------------------------+------------------------------------------------+
| Result DB path | ./outputs/qwen2.5_benchmark_20241107_201413.db |
+----------------------------------------------+------------------------------------------------+
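Several summary values can be cross-checked with simple arithmetic. The sketch below assumes, based only on the numbers in this sample run and not on evalscope's source code, that QPS is total requests divided by the test duration, that throughput is total output tokens divided by the duration, and that the time per output token is the reciprocal of that throughput.

```python
# Figures copied from the "Benchmarking summary" table above.
time_taken_s = 7.539
total_requests = 15
avg_output_tokens = 167.867

# Assumed relationships, inferred from this run's numbers only.
qps = total_requests / time_taken_s
total_output_tokens = avg_output_tokens * total_requests
throughput = total_output_tokens / time_taken_s
time_per_output_token = time_taken_s / total_output_tokens

print(f"QPS ~ {qps:.2f}")  # compare with Average QPS
print(f"throughput ~ {throughput:.1f} tokens/s")  # compare with Throughput
print(f"time per output token ~ {time_per_output_token:.5f} s")  # compare with Average time per output token
```

Small differences from the table come from the rounding of the displayed inputs.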
Percentile results:
+------------+---------------------+---------+
| Percentile | First Chunk Latency | Latency |
+------------+---------------------+---------+
| 10% | 0.0178 | 0.1577 |
| 25% | 0.0183 | 0.2358 |
| 50% | 0.0199 | 0.4311 |
| 66% | 0.0218 | 0.6317 |
| 75% | 0.0429 | 0.7121 |
| 80% | 0.0432 | 0.7957 |
| 90% | 0.0432 | 0.9153 |
| 95% | 0.0433 | 0.9897 |
| 98% | 0.0433 | 0.9897 |
| 99% | 0.0433 | 0.9897 |
+------------+---------------------+---------+
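With only 15 requests, several high percentiles land on the same sample, which is why the 95%, 98% and 99% rows are identical. A minimal sketch, assuming a hypothetical nearest-rank rule that takes the element at index floor(n * p / 100) of the sorted values; evalscope's exact percentile method may differ.

```python
import math


def percentile_table(values: list[float], percents: list[int]) -> dict[int, float]:
    """Return {p: value} using a nearest-rank style index (hypothetical rule)."""
    ordered = sorted(values)
    n = len(ordered)
    table = {}
    for p in percents:
        # Index floor(n * p / 100) into the sorted samples, clamped to the last one.
        idx = min(n - 1, math.floor(n * p / 100))
        table[p] = ordered[idx]
    return table


# Hypothetical latencies (seconds) for 15 requests, not the run shown above.
latencies = [0.1 * i for i in range(1, 16)]
for p, v in percentile_table(latencies, [10, 25, 50, 66, 75, 80, 90, 95, 98, 99]).items():
    print(f"{p}%: {v:.4f}")
```

Under this rule, p95 (index 14.25), p98 (14.7) and p99 (14.85) all floor to index 14, the largest of the 15 samples.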
Metric Descriptions
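| Metric | Description | Value |
|---|---|---|
| Total requests | Total number of requests | 15 |
| Succeeded requests | Number of successful requests | 15 |
| Failed requests | Number of failed requests | 0 |
| Average QPS | Average number of requests per second | 1.99 |
| Average latency | Average latency over all requests | 0.492 |
| Throughput(average output tokens per second) | Number of output tokens per second | 334.006 |
| Average time to first token | Average latency of the first token | 0.026 |
| Average input tokens per request | Average number of input tokens per request | 40.133 |
| Average output tokens per request | Average number of output tokens per request | 167.867 |
| Average time per output token | Average time to produce each output token | 0.00299 |
| Average package per request | Average number of packages (streamed response chunks) per request | 167.867 |
| Average package latency | Average latency of each package | 0.003 |
| Percentile of time to first token (p10, ..., p99) | Percentiles of the first-token latency | See Percentile results |
| Percentile of request latency (p10, ..., p99) | Percentiles of the request latency | See Percentile results |

Time-based values are in seconds.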