Supported Benchmarks

EvalScope supports a variety of datasets for evaluating different types of models, including language models, AIGC models, and others. The supported datasets are listed below, grouped by model type.

Tip

If the dataset you need is not on the list, you can submit an issue, and we will support it as soon as possible. Alternatively, you can follow the Benchmark Addition Guide to add the dataset yourself and submit a PR. Contributions are welcome.

You can also run evaluations with other tools that this framework supports, such as OpenCompass for language model evaluation or VLMEvalKit for multimodal model evaluation.