AI Benchmarks

Benchmarks are measurement systems used to rank things. What interests us here are AI benchmarks. It is a complicated subject, because the developments are significant and at times cause earthquakes and landslides.

Some references:

About AI Benchmarks – AI-for-Education.org
ai-for-education.org/about-ai-benchmarks/
How to Build AI Benchmarks That Evolve | Label Studio
labelstud.io/blog/how-to-build-ai-benchmarks-that-evolve-with-your-models/
What Makes a Good AI Benchmark? | Stanford HAI
hai.stanford.edu/policy/what-makes-a-good-ai-benchmark
AI Benchmarking | EBU Technology & Innovation
tech.ebu.ch/groups/ai-benchmarking
The Race to Measure Machine Minds: Understanding AI Benchmarks
www.sandgarden.com/learn/benchmarks
Engineering impact analysis | DX
getdx.com/ai-impact-analysis/
AI Benchmarks: How to Measure
gluo.be/ai-benchmarks/
AI Benchmarks: How to measure real progress in artificial intelligence
toloka.ai/blog/ai-benchmarks-how-to-measure-real-progress-in-artificial-intelligence/
AI Benchmarks for Education – AI-for-Education.org
ai-for-education.org/ai-benchmarks-for-education/
Evidently AI – AI Evaluation & LLM Observability Platform
www.evidentlyai.com/
ARC Prize – What is ARC-AGI?
arcprize.org/arc-agi
AI benchmark numbers are meaningless — here’s what to look for instead
www.makeuseof.com/ai-benchmark-numbers-are-meaningless-heres-what-to-look-for-instead/

And now, the benchmarks

Comparison of AI Models across Intelligence, Performance, Price | Artificial Analysis
artificialanalysis.ai/models
AI Model Benchmarks Jan 2026 | Compare GPT-5, Claude 4.5, Gemini 2.5, Grok 4 | LM Council
lmcouncil.ai/benchmarks
Data on AI Benchmarking | Epoch AI
epoch.ai/benchmarks
LiveBench
livebench.ai/#/
LLM Leaderboard 2025
www.vellum.ai/llm-leaderboard

Mar 17, 2026 @ 6:15 pm