Model Inference Stress Testing

A stress-testing tool for large language models. It can be customized to support various dataset formats and API protocols, and supports the OpenAI API format by default. A minimal usage sketch follows the contents list below.

  • Quick Start
    • Environment Preparation
    • Basic Usage
    • Visualizing Test Results
  • Parameter Description
    • Basic Settings
    • Network Configuration
    • Request Control
    • Prompt Settings
    • Dataset Configuration
    • Model Settings
    • Data Storage
  • Examples
    • Using Local Model Inference
    • Using prompt
    • Complex Requests
    • Using query-template
    • Using the Random Dataset
    • Using wandb to Record Test Results
    • Using swanlab to Record Test Results
    • Debugging Requests
  • Speed Benchmark Testing
    • Online API Inference
    • Local Transformer Inference
    • Local vLLM Inference
  • Custom Usage
    • Custom Result Analysis
    • Custom Request API
    • Custom Dataset
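
As a quick orientation, here is a minimal sketch of a typical run (illustrative, not a verbatim command from this page): the tool is driven through the evalscope perf subcommand against an OpenAI-compatible endpoint. The URL, model name, and request counts below are placeholders; see Parameter Description for the authoritative flag list.

    # A minimal sketch: stress-test an OpenAI-compatible endpoint.
    # --url       target inference endpoint (placeholder address)
    # --api       API protocol format; "openai" matches the default support noted above
    # --model     model name as exposed by the serving framework
    # --dataset   built-in dataset used to generate request prompts
    # --number    total number of requests to send
    # --parallel  number of concurrent requests
    # --stream    use streaming responses so per-token latency can be measured
    evalscope perf \
      --url "http://127.0.0.1:8000/v1/chat/completions" \
      --api openai \
      --model qwen2.5 \
      --dataset openqa \
      --number 20 \
      --parallel 5 \
      --stream

The reported metrics and how to inspect them are covered under Visualizing Test Results in the Quick Start subsection.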