Question 1

What is The Eval Index?

Accepted Answer

The Eval Index is a living, self-updating directory of 267 open-source LLM evaluation and benchmarking tools. It spans eight categories: Observability, Red Teaming & Safety, Agent Eval, RAG Eval, Eval Frameworks, Benchmarks, Coding Eval, and Reasoning. Each tool is scored by momentum and re-ranked every day from live GitHub signals (stars, push recency, and how fast the repository is rising), so the index surfaces what is gaining traction now rather than what was popular years ago. It is one of The Living Indexes, a fleet of self-updating maps of the AI-builder ecosystem, built and operated end to end by Kymata Labs' AI agents.

Question 2

How is momentum scored?

Accepted Answer

Momentum is a 0–100 score that blends three signals: log-scaled GitHub stars (55%), push recency (32%; full credit if pushed today, decaying to zero by about 180 days), and rising-newness (13%; a bonus for young repositories gaining stars fast). Because recency and growth together carry 45% of the weight, a tool that shipped this week can outrank a larger tool that has gone quiet. The ranking rewards momentum, not legacy.
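
As a rough illustration of how such a blend could be computed, here is a minimal Python sketch. Only the three weights and the roughly 180-day recency window come from the answer above. The star cap, the linear shape of the decay, the one-year cutoff for "young", and the stars-per-day threshold are all hypothetical values chosen for the example, and the index's actual formula may differ.

```python
import math

# Weights stated in the answer above.
W_STARS, W_RECENCY, W_RISING = 0.55, 0.32, 0.13

# Everything below is an illustrative assumption, not the index's real constants.
STAR_CAP = 100_000        # stars at which the star term saturates (assumed)
RECENCY_WINDOW = 180      # days until push recency reaches zero (linear decay assumed)
YOUNG_DAYS = 365          # age below which a repository counts as "young" (assumed)
FAST_STARS_PER_DAY = 50   # growth rate that earns the full rising bonus (assumed)


def momentum(stars: int, days_since_push: float, age_days: float) -> float:
    """Return a 0-100 momentum score under the assumptions listed above."""
    # Log-scaled stars, normalized to [0, 1].
    star_term = min(1.0, math.log10(stars + 1) / math.log10(STAR_CAP + 1))
    # Full credit for a push today, falling linearly to zero at the window edge.
    recency_term = max(0.0, 1.0 - days_since_push / RECENCY_WINDOW)
    # Bonus only for young repositories, based on average stars gained per day.
    if age_days < YOUNG_DAYS:
        rate = stars / max(age_days, 1.0)
        rising_term = min(1.0, rate / FAST_STARS_PER_DAY)
    else:
        rising_term = 0.0
    score = W_STARS * star_term + W_RECENCY * recency_term + W_RISING * rising_term
    return round(100 * score, 1)


# A month-old tool pushed today vs. a much larger tool untouched for 200 days.
new_tool = momentum(stars=2_000, days_since_push=0, age_days=30)
quiet_tool = momentum(stars=40_000, days_since_push=200, age_days=2_000)
print(new_tool > quiet_tool)
```

Under these assumed constants the newer, smaller tool scores higher, because the quiet tool earns nothing from the recency and rising terms and is left with its star term alone.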

Question 3

What categories does The Eval Index cover?

Accepted Answer

8 categories: Observability, Red Teaming & Safety, Agent Eval, RAG Eval, Eval Frameworks, Benchmarks, Coding Eval, and Reasoning. The index covers active, open-source LLM evaluation and benchmarking tooling, ranked by momentum rather than marketing.

Question 4

How often is The Eval Index updated?

Accepted Answer

Every day. A GitHub Action recomputes each tool's momentum from live GitHub signals and republishes the site automatically, with no human in the loop, so the index reflects the ecosystem as it is today, not last year.
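
To make "live GitHub signals" concrete, the sketch below shows one way the raw inputs could be derived from a repository record. It assumes the data arrives in the shape returned by GitHub's REST endpoint for a repository, which includes the fields `stargazers_count`, `pushed_at`, and `created_at`. The answer above does not say which API the index uses, and the function name `parse_signals` and the sample payload are hypothetical.

```python
from datetime import datetime, timezone


def parse_signals(repo: dict, now: datetime) -> dict:
    """Extract stars, days since last push, and repository age in days."""

    def parse(ts: str) -> datetime:
        # GitHub REST timestamps look like "2024-05-01T12:00:00Z" (UTC).
        return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

    pushed = parse(repo["pushed_at"])
    created = parse(repo["created_at"])
    return {
        "stars": repo["stargazers_count"],
        "days_since_push": max(0.0, (now - pushed).total_seconds() / 86400),
        "age_days": max(0.0, (now - created).total_seconds() / 86400),
    }


# Made-up payload for illustration; a real run would fetch this per repository.
sample = {
    "stargazers_count": 1200,
    "pushed_at": "2024-05-01T00:00:00Z",
    "created_at": "2024-01-01T00:00:00Z",
}
print(parse_signals(sample, now=datetime(2024, 5, 11, tzinfo=timezone.utc)))
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps a daily run reproducible and makes the function easy to test.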
