Artificiële Intelligentie / Inteligencia Artificial

MMLU – Wikipedia (en.wikipedia.org/wiki/MMLU)

Measuring Massive Multitask Language Understanding (MMLU) is a benchmark for evaluating the capabilities of language models. It consists of about 16,000 multiple-choice questions spanning 57 academic subjects, including mathematics, philosophy, law, and medicine. It is one of the most commonly used benchmarks for comparing the capabilities


Tag: benchmarking

MLPerf: a reference tool for testing AI systems

Measuring Massive Multitask Language Understanding (MMLU)