What are AI benchmarks?

Benchmarks are tools for measuring the effectiveness and/or reliability of systems, for instance ML and AI systems.
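To make the idea concrete, here is a minimal sketch of what a benchmark boils down to: a fixed set of tasks with known answers, plus a scoring rule. Everything in it (the `BENCHMARK` items, `toy_system`, `run_benchmark`) is hypothetical and made up for illustration; real AI benchmarks are far larger and score in more elaborate ways.

```python
from typing import Callable, List, Tuple

# Hypothetical benchmark: a fixed list of (question, expected answer) pairs.
BENCHMARK: List[Tuple[str, str]] = [
    ("2 + 2", "4"),
    ("10 / 2", "5"),
    ("3 * 3", "9"),
    ("7 - 8", "-1"),
]


def toy_system(question: str) -> str:
    """Hypothetical system under test: it only knows '+' and '*'."""
    left, op, right = question.split()
    a, b = int(left), int(right)
    if op == "+":
        return str(a + b)
    if op == "*":
        return str(a * b)
    return "unknown"


def run_benchmark(system: Callable[[str], str],
                  items: List[Tuple[str, str]]) -> float:
    """Return the fraction of items the system answers exactly right."""
    correct = sum(1 for question, expected in items
                  if system(question) == expected)
    return correct / len(items)


score = run_benchmark(toy_system, BENCHMARK)
print(f"accuracy: {score:.2f}")  # prints "accuracy: 0.50"
```

Because the items and the scoring rule stay fixed, two different systems can be run through `run_benchmark` and their scores compared directly.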

REFERENCES

AI Benchmarking | EBU Technology & Innovation
tech.ebu.ch/groups/ai-benchmarking
S&P AI Benchmarks by Kensho
benchmarks.kensho.com/
What Makes a Good AI Benchmark? | Stanford HAI
hai.stanford.edu/policy/what-makes-a-good-ai-benchmark
AI Benchmarks Explained… DeepSeek vs OpenAI – YouTube
www.youtube.com/watch?v=gzTGXvAW11E
Gaming as an AI Benchmark: A Quick JUMP – YouTube
www.youtube.com/watch?v=gTa0p9q-rbY

BENCHMARKS AND BENCHMARK SYSTEMS

ARC Prize – What is ARC-AGI?
arcprize.org/arc-agi
AI-Benchmark
ai-benchmark.com/alpha
Geekbench AI – Cross-Platform AI Benchmark
www.geekbench.com/ai/
LiveBench
livebench.ai/#/
Comparison of AI Models across Intelligence, Performance, Price | Artificial Analysis
artificialanalysis.ai/models
AI Benchmarking Dashboard | Epoch AI
epoch.ai/data/ai-benchmarking-dashboard

BENCHMARKING HARDWARE

Another type of benchmark is designed to measure how well a software application runs on a given piece of hardware, for instance a mobile device.
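As a rough illustration of this kind of measurement, the sketch below times a fixed workload several times on whatever device runs it and reports the median. The `workload` function is a hypothetical stand-in, not what any of the tools listed above actually run; they use real AI workloads and more careful methodology.

```python
import statistics
import time
from typing import Callable, List


def workload(n: int = 200) -> float:
    """Hypothetical stand-in for a real task (e.g. one model inference)."""
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += i * j
    return total


def measure_latency(fn: Callable[[], float],
                    warmup: int = 2,
                    runs: int = 5) -> List[float]:
    """Time fn() `runs` times after `warmup` untimed calls; return milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


samples = measure_latency(workload)
print(f"median latency: {statistics.median(samples):.2f} ms over {len(samples)} runs")
```

Running the same script on two devices gives numbers that can be compared, as long as the workload and the measurement procedure are identical on both.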

BUT

Since the AI world is in constant (r)evolution, benchmarking can’t keep up…

Mind Readings: How to Benchmark and Evaluate Generative AI Models, Part 2 of 4 – YouTube
www.youtube.com/watch?v=AlqPZxNHz_Y
We don't know what AI benchmarks measure. So we talked to the Spaniard who created one of the hardest ones (article in Spanish)
www.xataka.com/robotica-e-ia/no-sabemos-que-miden-exactamente-benchmarks-ia-asi-que-hemos-hablado-espanol-que-ha-creado-uno-dificiles