Quick Start
Environment Setup
# Option 1: install from PyPI together with the extra perf dependencies
pip install 'evalscope[perf]' -U
# Option 2: install from source
git clone https://github.com/modelscope/evalscope.git
cd evalscope
pip install -e '.[perf]'
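Basic Usage
The model inference performance stress-testing tool can be launched in either of two ways: from the command line or from a Python script.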
evalscope perf \
    --url "http://127.0.0.1:8000/v1/chat/completions" \
    --parallel 1 \
    --model qwen2.5 \
    --number 15 \
    --api openai \
    --dataset openqa \
    --stream
from evalscope.perf.main import run_perf_benchmark

task_cfg = {
    "url": "http://127.0.0.1:8000/v1/chat/completions",
    "parallel": 1,
    "model": "qwen2.5",
    "number": 15,
    "api": "openai",
    "dataset": "openqa",
    "stream": True,
}
run_perf_benchmark(task_cfg)
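Parameter descriptions:
url: the URL that requests are sent to
parallel: the number of parallel request tasks
model: the name of the model to use
number: the number of requests to send
api: the API service to use
dataset: the name of the dataset
stream: whether to enable streaming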
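The command-line flags and the dictionary keys shown above use the same names, so one form can be derived from the other. The following is a minimal sketch, not part of evalscope: `cfg_to_cli_args` is a hypothetical helper that assumes every key maps to a `--key value` flag and that a boolean `True` becomes a bare switch, as `--stream` does in the command above.

```python
import shlex


def cfg_to_cli_args(task_cfg):
    """Hypothetical helper: turn a task_cfg dict into `evalscope perf` arguments.

    Assumes every key maps to a `--key value` flag and that booleans are
    bare switches (emitted when True, omitted when False).
    """
    args = ["evalscope", "perf"]
    for key, value in task_cfg.items():
        if isinstance(value, bool):
            if value:
                args.append(f"--{key}")
        else:
            args.extend([f"--{key}", str(value)])
    return args


task_cfg = {
    "url": "http://127.0.0.1:8000/v1/chat/completions",
    "parallel": 1,
    "model": "qwen2.5",
    "number": 15,
    "api": "openai",
    "dataset": "openqa",
    "stream": True,
}

# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(cfg_to_cli_args(task_cfg)))
```

Keeping the configuration in a single dictionary and generating the command from it is one way to avoid the two forms drifting apart. Whether every perf option follows this naming pattern is not confirmed here.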
Output
Benchmarking summary:
+-----------------------------------+-----------------------------------------------------+
| Key | Value |
+===================================+=====================================================+
| Time taken for tests (s) | 10.739 |
+-----------------------------------+-----------------------------------------------------+
| Number of concurrency | 1 |
+-----------------------------------+-----------------------------------------------------+
| Total requests | 15 |
+-----------------------------------+-----------------------------------------------------+
| Succeed requests | 15 |
+-----------------------------------+-----------------------------------------------------+
| Failed requests | 0 |
+-----------------------------------+-----------------------------------------------------+
| Throughput(average tokens/s) | 324.059 |
+-----------------------------------+-----------------------------------------------------+
| Average QPS | 1.397 |
+-----------------------------------+-----------------------------------------------------+
| Average latency (s) | 0.696 |
+-----------------------------------+-----------------------------------------------------+
| Average time to first token (s) | 0.029 |
+-----------------------------------+-----------------------------------------------------+
| Average time per output token (s) | 0.00309 |
+-----------------------------------+-----------------------------------------------------+
| Average input tokens per request | 50.133 |
+-----------------------------------+-----------------------------------------------------+
| Average output tokens per request | 232.0 |
+-----------------------------------+-----------------------------------------------------+
| Average package latency (s) | 0.003 |
+-----------------------------------+-----------------------------------------------------+
| Average package per request | 232.0 |
+-----------------------------------+-----------------------------------------------------+
| Expected number of requests | 15 |
+-----------------------------------+-----------------------------------------------------+
| Result DB path | ./outputs/20241216_194204/qwen2.5/benchmark_data.db |
+-----------------------------------+-----------------------------------------------------+
Percentile results:
+------------+----------+----------+-------------+--------------+---------------+----------------------+
| Percentile | TTFT (s) | TPOT (s) | Latency (s) | Input tokens | Output tokens | Throughput(tokens/s) |
+------------+----------+----------+-------------+--------------+---------------+----------------------+
| 10% | 0.0202 | 0.0027 | 0.1846 | 41 | 50 | 270.8324 |
| 25% | 0.0209 | 0.0028 | 0.2861 | 44 | 83 | 290.0714 |
| 50% | 0.0233 | 0.0028 | 0.7293 | 49 | 250 | 335.644 |
| 66% | 0.0267 | 0.0029 | 0.9052 | 50 | 308 | 340.2603 |
| 75% | 0.0437 | 0.0029 | 0.9683 | 53 | 325 | 341.947 |
| 80% | 0.0438 | 0.003 | 1.0799 | 58 | 376 | 342.7985 |
| 90% | 0.0439 | 0.0032 | 1.2474 | 62 | 424 | 345.5268 |
| 95% | 0.0463 | 0.0033 | 1.3038 | 66 | 431 | 348.1648 |
| 98% | 0.0463 | 0.0035 | 1.3038 | 66 | 431 | 348.1648 |
| 99% | 0.0463 | 0.0037 | 1.3038 | 66 | 431 | 348.1648 |
+------------+----------+----------+-------------+--------------+---------------+----------------------+
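The summary values can be cross-checked against each other. The sketch below assumes, without confirmation from the tool itself, that Average QPS is the number of successful requests divided by the test duration, and that throughput is the total number of output tokens divided by the same duration. Under those assumptions the recomputed values agree with the reported ones to within the rounding of the inputs.

```python
# Values copied from the benchmarking summary above.
time_taken_s = 10.739
succeeded_requests = 15
avg_output_tokens = 232.0
reported_qps = 1.397
reported_throughput = 324.059

# Assumed definitions (not confirmed by the output itself).
qps = succeeded_requests / time_taken_s
throughput = succeeded_requests * avg_output_tokens / time_taken_s

print(f"QPS: recomputed {qps:.3f}, reported {reported_qps}")
print(f"Throughput: recomputed {throughput:.3f} tokens/s, reported {reported_throughput}")
```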
Metric Descriptions
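| Metric | Description |
|---|---|
| Time taken for tests (s) | Total time taken by the test, in seconds |
| Number of concurrency | Number of concurrent requests |
| Total requests | Total number of requests |
| Succeed requests | Number of successful requests |
| Failed requests | Number of failed requests |
| Throughput(average tokens/s) | Throughput: average number of tokens processed per second |
| Average QPS | Average number of requests per second (Queries Per Second) |
| Average latency (s) | Average latency, in seconds |
| Average time to first token (s) | Average time to first token, in seconds |
| Average time per output token (s) | Average time per output token, in seconds |
| Average input tokens per request | Average number of input tokens per request |
| Average output tokens per request | Average number of output tokens per request |
| Average package latency (s) | Average latency per package, in seconds |
| Average package per request | Average number of packages per request |
| Expected number of requests | Expected number of requests |
| Result DB path | Path to the result database |
| Percentile | The data is divided into 100 equal parts; the nth percentile is the value below which n% of the data points fall |
| TTFT (s) | Time to First Token: the time taken to generate the first token |
| TPOT (s) | Time Per Output Token: the time taken to generate each output token |
| Latency (s) | Latency: the time between sending a request and receiving the response |
| Input tokens | Number of input tokens |
| Output tokens | Number of output tokens |
| Throughput(tokens/s) | Throughput: number of tokens processed per second |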
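To make the percentile definition concrete, the following sketch computes a few percentiles with the nearest-rank method. The latency values are made up for illustration, and nearest-rank is an assumption: evalscope may use a different selection or interpolation rule.

```python
import math


def nearest_rank_percentile(values, p):
    """Return the smallest value such that at least p% of values are <= it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical per-request latencies in seconds (not taken from a real run).
latencies = [0.18, 0.25, 0.29, 0.41, 0.55, 0.73, 0.80, 0.91, 0.97, 1.08]

for p in (50, 90, 99):
    print(f"{p}%: {nearest_rank_percentile(latencies, p)}")
```

With a small sample, the highest percentiles all resolve to the largest observation. This matches the Latency column of the percentile results, where the 95%, 98%, and 99% rows of a 15-request run show the same value.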