Other Datasets

OpenCompass
VLMEvalKit Backend
- Image Understanding Dataset
- Video Understanding Dataset
MTEB
- CMTEB Evaluation Dataset
- MTEB Evaluation Dataset
CLIP-Benchmark

AIGC Benchmarks

© 2022-2024, Alibaba ModelScope. Built with Sphinx 8.2.3.