Benchmarks are measurement systems used to rank things. What interests us here are AI benchmarks. They are tricky, because the field evolves quickly and new developments sometimes cause earthquakes and landslides in the rankings.
Some references:
- About AI Benchmarks – AI-for-Education.org
  ai-for-education.org/about-ai-benchmarks/
- How to Build AI Benchmarks That Evolve | Label Studio
  labelstud.io/blog/how-to-build-ai-benchmarks-that-evolve-with-your-models/
- What Makes a Good AI Benchmark? | Stanford HAI
  hai.stanford.edu/policy/what-makes-a-good-ai-benchmark
- AI Benchmarking | EBU Technology & Innovation
  tech.ebu.ch/groups/ai-benchmarking
- The Race to Measure Machine Minds: Understanding AI Benchmarks
  www.sandgarden.com/learn/benchmarks
- Engineering impact analysis | DX
  getdx.com/ai-impact-analysis/
- AI Benchmarks: How to Measure
  gluo.be/ai-benchmarks/
- AI Benchmarks: How to measure real progress in artificial intelligence
  toloka.ai/blog/ai-benchmarks-how-to-measure-real-progress-in-artificial-intelligence/
- AI Benchmarks for Education – AI-for-Education.org
  ai-for-education.org/ai-benchmarks-for-education/
- Evidently AI – AI Evaluation & LLM Observability Platform
  www.evidentlyai.com/
- ARC Prize – What is ARC-AGI?
  arcprize.org/arc-agi
And now, the benchmarks themselves:
- Comparison of AI Models across Intelligence, Performance, Price | Artificial Analysis
  artificialanalysis.ai/models
- AI Model Benchmarks Jan 2026 | Compare GPT-5, Claude 4.5, Gemini 2.5, Grok 4 | LM Council
  lmcouncil.ai/benchmarks
- Data on AI Benchmarking | Epoch AI
  epoch.ai/benchmarks
- LiveBench
  livebench.ai/#/
- LLM Leaderboard 2025
  www.vellum.ai/llm-leaderboard?utm_source=google&utm_medium=organic