MMLU – Wikipedia
en.wikipedia.org/wiki/MMLU
Measuring Massive Multitask Language Understanding (MMLU) is a benchmark for evaluating the capabilities of language models. It consists of about 16,000 multiple-choice questions spanning 57 academic subjects, including mathematics, philosophy, law, and medicine. It is one of the most widely used benchmarks for comparing large language models.
MMLU Dataset | Papers With Code
paperswithcode.com/dataset/mmlu
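
Since MMLU is distributed as plain multiple-choice records (a question, answer choices, the index of the correct choice, and a subject label), a minimal sketch of loading and inspecting it with the Hugging Face `datasets` library follows. The dataset id `cais/mmlu`, the `"all"` config that pools the 57 subjects, and the field names used below are assumptions based on the common Hub mirror, not something stated in the listing above.

```python
# Minimal sketch: load MMLU and inspect one record.
# Assumptions (not from the listing above): the Hub id "cais/mmlu",
# the "all" config pooling all 57 subjects, and the field names shown.
from datasets import load_dataset

# The "test" split is the portion normally used for reported scores;
# the dataset also ships "dev" and "validation" splits for few-shot prompts.
mmlu = load_dataset("cais/mmlu", "all", split="test")

example = mmlu[0]
print(example["subject"])   # subject label, e.g. "abstract_algebra"
print(example["question"])  # the question stem
print(example["choices"])   # list of four answer options
print(example["answer"])    # integer index (0-3) of the correct option
```

To evaluate a model, one would typically format each record as a prompt listing the four options labeled A-D and compare the model's chosen letter against the `answer` index.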