MMLU – Wikipedia
en.wikipedia.org/wiki/MMLU
Measuring Massive Multitask Language Understanding (MMLU) is a benchmark for evaluating the capabilities of language models. It consists of about 16,000 multiple-choice questions spanning 57 academic subjects, including mathematics, philosophy, law, and medicine. It is one of the most widely used benchmarks for comparing large language models.
MMLU Dataset | Papers With Code
paperswithcode.com/dataset/mmlu
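
Since MMLU is distributed as plain multiple-choice records (a question, answer choices, the index of the correct choice, and a subject label), a minimal sketch of loading and inspecting it with the Hugging Face `datasets` library follows. The dataset id `cais/mmlu`, the `"all"` config that pools the 57 subjects, and the field names used below are assumptions based on the common Hub mirror, not something stated in the listing above.

```python
# Minimal sketch: load MMLU and inspect one record.
# Assumptions (not from the listing above): the Hub id "cais/mmlu",
# the "all" config pooling all 57 subjects, and the field names shown.
from datasets import load_dataset

# The "test" split is the portion normally used for reported scores;
# the dataset also ships "dev" and "validation" splits for few-shot prompts.
mmlu = load_dataset("cais/mmlu", "all", split="test")

example = mmlu[0]
print(example["subject"])   # subject label, e.g. "abstract_algebra"
print(example["question"])  # the question stem
print(example["choices"])   # list of four answer options
print(example["answer"])    # integer index (0-3) of the correct option
```

To evaluate a model, one would typically format each record as a prompt listing the four options labeled A-D and compare the model's chosen letter against the `answer` index.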