Benchmarks are tools for measuring the effectiveness and/or reliability of systems, for instance ML and AI systems.
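To make the idea concrete, here is a minimal sketch of a benchmark as a fixed set of test items plus a scoring rule. Everything in it is an assumption for illustration: the three items, the `toy_system` stand-in, and the exact-match `accuracy` metric are hypothetical and are not taken from any of the benchmarks linked in this post.

```python
from typing import Callable

# Hypothetical benchmark: a fixed list of (question, expected answer) pairs.
BENCHMARK = [
    ("2 + 2", "4"),
    ("capital of France", "Paris"),
    ("3 * 3", "9"),
]


def toy_system(question: str) -> str:
    """Stand-in for the system under test; a real ML/AI system would go here."""
    answers = {"2 + 2": "4", "capital of France": "Paris"}
    return answers.get(question, "unknown")


def accuracy(system: Callable[[str], str], items: list[tuple[str, str]]) -> float:
    """Fraction of benchmark items the system answers exactly right."""
    correct = sum(1 for question, expected in items if system(question) == expected)
    return correct / len(items)


score = accuracy(toy_system, BENCHMARK)
print(f"accuracy: {score:.2f}")
```

Real benchmarks differ mainly in what the items are and how answers are scored, but the shape (fixed inputs, a system under test, a number at the end) is usually the same.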
REFERENCES
- AI Benchmarking | EBU Technology & Innovation
tech.ebu.ch/groups/ai-benchmarking
- S&P AI Benchmarks by Kensho
benchmarks.kensho.com/
- What Makes a Good AI Benchmark? | Stanford HAI
hai.stanford.edu/policy/what-makes-a-good-ai-benchmark
- AI Benchmarks Explained… DeepSeek vs OpenAI – YouTube
www.youtube.com/watch?v=gzTGXvAW11E
- Gaming as an AI Benchmark: A Quick JUMP – YouTube
www.youtube.com/watch?v=gTa0p9q-rbY
BENCHMARKS AND BENCHMARK SYSTEMS
- ARC Prize – What is ARC-AGI?
arcprize.org/arc-agi
- AI-Benchmark
ai-benchmark.com/alpha
- Geekbench AI – Cross-Platform AI Benchmark
www.geekbench.com/ai/
- LiveBench
livebench.ai/#/
- Comparison of AI Models across Intelligence, Performance, Price | Artificial Analysis
artificialanalysis.ai/models
- AI Benchmarking Dashboard | Epoch AI
epoch.ai/data/ai-benchmarking-dashboard
BENCHMARKING HARDWARE
Another type of benchmark measures how well a software application runs on a given piece of hardware, for instance a mobile device.
- AI Benchmark – Apps on Google Play
play.google.com/store/apps/details?id=org.benchmark.demo
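As a rough illustration of what a hardware benchmark does, the sketch below times the same workload several times on whatever device runs it and reports the median. It is not how the AI Benchmark app works internally; the `workload` function, its size, and the number of repeats are assumptions chosen only to keep the example short.

```python
import statistics
import time


def workload(n: int = 200_000) -> int:
    """Hypothetical stand-in for the application step being measured."""
    return sum(i * i for i in range(n))


def time_runs(fn, repeats: int = 5) -> list[float]:
    """Run fn several times and return each run's wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


durations = time_runs(workload)
median_ms = statistics.median(durations) * 1000
print(f"median: {median_ms:.1f} ms over {len(durations)} runs")
```

Running the same script on two devices gives a crude comparison. The median is used rather than the mean so that a single slow run (for example, from background activity) distorts the result less.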
BUT
Since the AI world is in constant (r)evolution, benchmarking can’t keep up…
- Mind Readings: How to Benchmark and Evaluate Generative AI Models, Part 2 of 4 – YouTube
www.youtube.com/watch?v=AlqPZxNHz_Y