Model Inference Stress Testing
A stress-testing tool for large language model inference services. It supports the OpenAI API format by default and can be customized for other dataset formats and API protocols.
- Quick Start
- Parameter
- Examples
- SLA Auto-Tuning
- Speed Benchmark Testing
- vLLM Bench vs Evalscope Perf Load Testing Comparison
  - TL;DR: Quick Comparison Recipe
  - Environment and Prerequisites
  - Unified Server Configuration
  - Parameter Alignment Guide (Key Mappings)
  - Consistency Validation: Minimum Example (1 Concurrent / 1 Request)
  - Full Load Test: 50 Concurrency / 1000 Requests
  - Metric Definitions and Naming Correspondence
  - Common Sources of Discrepancies and Troubleshooting Suggestions
- Custom Usage