MTEB

C-MTEB Evaluation Datasets

| Name | Hub Link | Description | Type | Category | Test Samples |
|---|---|---|---|---|---|
| T2Retrieval | C-MTEB/T2Retrieval | T2Ranking: a large-scale Chinese passage ranking benchmark | Retrieval | s2p | 24,832 |
| MMarcoRetrieval | C-MTEB/MMarcoRetrieval | mMARCO, the multilingual version of the MS MARCO passage ranking dataset | Retrieval | s2p | 7,437 |
| DuRetrieval | C-MTEB/DuRetrieval | A large-scale Chinese passage retrieval benchmark built from a web search engine | Retrieval | s2p | 4,000 |
| CovidRetrieval | C-MTEB/CovidRetrieval | COVID-19 news articles | Retrieval | s2p | 949 |
| CmedqaRetrieval | C-MTEB/CmedqaRetrieval | Online medical consultation texts | Retrieval | s2p | 3,999 |
| EcomRetrieval | C-MTEB/EcomRetrieval | Passage retrieval dataset collected from Alibaba's e-commerce search engine systems | Retrieval | s2p | 1,000 |
| MedicalRetrieval | C-MTEB/MedicalRetrieval | Passage retrieval dataset collected from Alibaba's medical search engine systems | Retrieval | s2p | 1,000 |
| VideoRetrieval | C-MTEB/VideoRetrieval | Passage retrieval dataset collected from Alibaba's video search engine systems | Retrieval | s2p | 1,000 |
| T2Reranking | C-MTEB/T2Reranking | T2Ranking: a large-scale Chinese passage ranking benchmark | Re-ranking | s2p | 24,382 |
| MMarcoReranking | C-MTEB/MMarco-reranking | mMARCO, the multilingual version of the MS MARCO passage ranking dataset | Re-ranking | s2p | 7,437 |
| CMedQAv1 | C-MTEB/CMedQAv1-reranking | Chinese community medical Q&A | Re-ranking | s2p | 2,000 |
| CMedQAv2 | C-MTEB/CMedQAv2-reranking | Chinese community medical Q&A | Re-ranking | s2p | 4,000 |
| Ocnli | C-MTEB/OCNLI | Original Chinese natural language inference dataset | Pair Classification | s2s | 3,000 |
| Cmnli | C-MTEB/CMNLI | Chinese multi-genre natural language inference | Pair Classification | s2s | 139,000 |
| CLSClusteringS2S | C-MTEB/CLSClusteringS2S | Clustering of titles from the CLS dataset into its 13 main categories | Clustering | s2s | 10,000 |
| CLSClusteringP2P | C-MTEB/CLSClusteringP2P | Clustering of titles + abstracts from the CLS dataset into its 13 main categories | Clustering | p2p | 10,000 |
| ThuNewsClusteringS2S | C-MTEB/ThuNewsClusteringS2S | Clustering of titles from the THUCNews dataset | Clustering | s2s | 10,000 |
| ThuNewsClusteringP2P | C-MTEB/ThuNewsClusteringP2P | Clustering of titles + abstracts from the THUCNews dataset | Clustering | p2p | 10,000 |
| ATEC | C-MTEB/ATEC | ATEC NLP sentence pair similarity competition | STS | s2s | 20,000 |
| BQ | C-MTEB/BQ | Banking question semantic similarity | STS | s2s | 10,000 |
| LCQMC | C-MTEB/LCQMC | Large-scale Chinese Question Matching Corpus | STS | s2s | 12,500 |
| PAWSX | C-MTEB/PAWSX | Translated PAWS evaluation pairs | STS | s2s | 2,000 |
| STSB | C-MTEB/STSB | STS-B translated into Chinese | STS | s2s | 1,360 |
| AFQMC | C-MTEB/AFQMC | Ant Financial Question Matching Corpus | STS | s2s | 3,861 |
| QBQTC | C-MTEB/QBQTC | QQ Browser Query Title Corpus | STS | s2s | 5,000 |
| TNews | C-MTEB/TNews-classification | Short text classification of news | Classification | s2s | 10,000 |
| IFlyTek | C-MTEB/IFlyTek-classification | Long text classification of app descriptions | Classification | s2s | 2,600 |
| Waimai | C-MTEB/waimai-classification | Sentiment analysis of user reviews on food delivery platforms | Classification | s2s | 1,000 |
| OnlineShopping | C-MTEB/OnlineShopping-classification | Sentiment analysis of user reviews on online shopping websites | Classification | s2s | 1,000 |
| MultilingualSentiment | C-MTEB/MultilingualSentiment-classification | A collection of multilingual sentiment datasets labeled with three classes: positive, neutral, negative | Classification | s2s | 3,000 |
| JDReview | C-MTEB/JDReview-classification | iPhone reviews | Classification | s2s | 533 |

Category: s2s = sentence-to-sentence, s2p = sentence-to-paragraph, p2p = paragraph-to-paragraph.

For retrieval tasks, a sample of 100,000 candidates (including the ground truth) is drawn from the entire corpus to reduce inference costs.
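The candidate sampling described above can be sketched as follows. This is a minimal illustration, not the benchmark's own code: the helper name `sample_candidate_corpus`, the dict-based `corpus` and `qrels` layout, and treating any relevance score above 0 as ground truth are all assumptions made for the example.

```python
import random
from typing import Dict, Mapping


def sample_candidate_corpus(
    corpus: Mapping[str, str],
    qrels: Mapping[str, Mapping[str, int]],
    max_candidates: int = 100_000,
    seed: int = 42,
) -> Dict[str, str]:
    """Hypothetical helper: keep every ground-truth document from qrels and
    pad with randomly chosen other documents up to max_candidates."""
    # Assumption: a document counts as ground truth if any query rates it > 0.
    positives = {
        doc_id
        for judgements in qrels.values()
        for doc_id, rel in judgements.items()
        if rel > 0 and doc_id in corpus
    }
    # Small corpora need no sampling.
    if len(corpus) <= max_candidates:
        return dict(corpus)
    # Fill the remaining slots with a seeded random sample of non-relevant docs.
    negatives = [doc_id for doc_id in corpus if doc_id not in positives]
    n_fill = max(0, max_candidates - len(positives))
    rng = random.Random(seed)
    keep = positives.union(rng.sample(negatives, min(n_fill, len(negatives))))
    # Preserve the original corpus order in the returned sub-corpus.
    return {doc_id: text for doc_id, text in corpus.items() if doc_id in keep}


if __name__ == "__main__":
    corpus = {f"d{i}": f"passage {i}" for i in range(10)}
    qrels = {"q1": {"d3": 1}, "q2": {"d7": 1, "d8": 0}}
    sub = sample_candidate_corpus(corpus, qrels, max_candidates=4)
    print(len(sub), "d3" in sub, "d7" in sub)  # 4 True True
```

Fixing the seed keeps the candidate pool identical across runs, so scores from different models are computed against the same sub-corpus. If the number of ground-truth documents exceeds `max_candidates`, this sketch keeps all of them rather than dropping any.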

MTEB Evaluation Datasets


See also: MTEB Related Tasks