MLPerf: a reference tool for benchmarking AI systems – Le Monde Informatique www.lemondeinformatique.fr/actualites/lire-mlperf-un-outil-de-reference-pour-tester-les-ia-95888.html
Tag: benchmarking
Measuring Massive Multitask Language Understanding (MMLU)
MMLU – Wikipedia en.wikipedia.org/wiki/MMLU
Measuring Massive Multitask Language Understanding (MMLU) is a benchmark for evaluating the capabilities of language models. It consists of about 16,000 multiple-choice questions spanning 57 academic subjects, including mathematics, philosophy, law, and medicine. It is one of the most commonly used benchmarks for comparing the capabilities of language models.
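Since MMLU items are four-way multiple-choice questions scored by plain accuracy, a small sketch may help clarify the format. Everything below is a hypothetical illustration under that assumption: the MCQItem fields, the sample questions, and the placeholder pick_answer "model" are stand-ins, not the official evaluation harness or dataset schema.

```python
# Minimal sketch of MMLU-style scoring: each item has a question, four
# answer choices mapped to letters A-D, and one gold letter; the metric
# is accuracy. All names and sample items here are hypothetical.

from dataclasses import dataclass

@dataclass
class MCQItem:
    subject: str        # one of MMLU's 57 subjects, e.g. "philosophy"
    question: str
    choices: list[str]  # exactly four options, shown as A, B, C, D
    answer: str         # gold letter: "A", "B", "C", or "D"

def pick_answer(item: MCQItem) -> str:
    """Placeholder model that always guesses 'A'. A real evaluator would
    prompt a language model with the question and choices, then parse a
    letter out of its response."""
    return "A"

def accuracy(items: list[MCQItem]) -> float:
    """Fraction of items where the predicted letter matches the gold letter."""
    correct = sum(1 for item in items if pick_answer(item) == item.answer)
    return correct / len(items)

if __name__ == "__main__":
    sample = [
        MCQItem("elementary_mathematics", "What is 7 * 8?",
                ["54", "56", "58", "64"], "B"),
        MCQItem("philosophy", "Who wrote the Critique of Pure Reason?",
                ["Hume", "Kant", "Hegel", "Locke"], "B"),
    ]
    # With four choices, random guessing scores 25% in expectation.
    print(f"Accuracy: {accuracy(sample):.2%}")
```

Because every item has exactly four options, 25% accuracy is the chance baseline; reported MMLU scores are typically averaged either over all questions or per subject.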