IQuiz#

Overview#

IQuiz is a Chinese benchmark for evaluating AI models on intelligence quotient (IQ) and emotional quotient (EQ) questions. It tests logical reasoning, pattern recognition, and social-emotional understanding through multiple-choice questions.

Task Description#

  • Task Type: Multiple-Choice Question Answering

  • Input: Question in Chinese with multiple choice options

  • Output: Selected answer with explanation (Chain-of-Thought)

  • Language: Chinese

Key Features#

  • Dual evaluation of IQ and EQ capabilities

  • Chinese-language cognitive assessment

  • Multiple difficulty levels

  • Requires explanation alongside answer selection

  • Tests logical reasoning and emotional understanding

Evaluation Notes#

  • Default configuration uses 0-shot evaluation

  • Primary metric: Accuracy

  • Subsets: IQ (logical reasoning) and EQ (emotional intelligence)

  • Uses Chinese Chain-of-Thought prompt template

  • Evaluates on test split

  • Metadata includes difficulty level information

Properties#

Property

Value

Benchmark Name

iquiz

Dataset ID

AI-ModelScope/IQuiz

Paper

N/A

Tags

Chinese, Knowledge, MCQ

Metrics

acc

Default Shots

0-shot

Evaluation Split

test

Data Statistics#

Metric

Value

Total Samples

120

Prompt Length (Mean)

248.31 chars

Prompt Length (Min/Max)

146 / 394 chars

Per-Subset Statistics:

Subset

Samples

Prompt Mean

Prompt Min

Prompt Max

IQ

40

194.5

146

323

EQ

80

275.21

219

394

Sample Example#

Subset: IQ

{
  "input": [
    {
      "id": "748f2700",
      "content": "回答下面的单项选择题,请选出其中的正确答案。你的回答的最后一行应该是这样的格式:\"答案:[LETTER]\"(不带引号),其中 [LETTER] 是 A,B,C,D 中的一个。请在回答前进行一步步思考。\n\n问题:天气预报说本周星期三会下雨,昨天果然下雨了,今天星期几?\n选项:\nA) 星期一\nB) 星期二\nC) 星期三\nD) 星期四\n"
    }
  ],
  "choices": [
    "星期一",
    "星期二",
    "星期三",
    "星期四"
  ],
  "target": "D",
  "id": 0,
  "group_id": 0,
  "metadata": {
    "level": 1
  }
}

Prompt Template#

Prompt Template:

回答下面的单项选择题,请选出其中的正确答案。你的回答的最后一行应该是这样的格式:"答案:[LETTER]"(不带引号),其中 [LETTER] 是 {letters} 中的一个。请在回答前进行一步步思考。

问题:{question}
选项:
{choices}

Usage#

Using CLI#

evalscope eval \
    --model YOUR_MODEL \
    --api-url OPENAI_API_COMPAT_URL \
    --api-key EMPTY_TOKEN \
    --datasets iquiz \
    --limit 10  # Remove this line for formal evaluation

Using Python#

from evalscope import run_task
from evalscope.config import TaskConfig

task_cfg = TaskConfig(
    model='YOUR_MODEL',
    api_url='OPENAI_API_COMPAT_URL',
    api_key='EMPTY_TOKEN',
    datasets=['iquiz'],
    dataset_args={
        'iquiz': {
            # subset_list: ['IQ', 'EQ']  # optional, evaluate specific subsets
        }
    },
    limit=10,  # Remove this line for formal evaluation
)

run_task(task_cfg=task_cfg)