Kymata Labs / The Living Indexes. Built by tekvisions.
EVAL/INDEX
Recomputed daily from live GitHub signals

How do you measure an AI?

The living leaderboard of LLM & agent evaluation tooling — eval frameworks, benchmark suites, observability, and red-teaming — ranked by momentum, not marketing.

About the Eval Index

The Eval Index is a living, self-updating directory of 267 open-source LLM evaluation & benchmarking tools, spanning Observability, Red Teaming & Safety, Agent Eval, RAG Eval, Eval Frameworks, Benchmarks, Coding Eval and Reasoning. Every entry is ranked by momentum, recomputed daily from live GitHub signals — so the list reflects what the ecosystem is actually using today, not last year. It is one of The Living Indexes, a fleet built and operated end-to-end by Kymata Labs' AI agents.

How is momentum scored?

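Momentum is a 0–100 score that blends three signals: log-scaled stars (55%), push recency (32%, decaying to zero by about 180 days since the last push), and rising-newness (13%). Because recency and newness together carry 45% of the score, a tool that shipped this week can outrank a bigger tool that has gone quiet.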
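As a rough illustration of how such a blend might be computed, here is a minimal Python sketch. Only the three weights and the roughly 180-day recency cutoff come from the description above. The star cap, the linear shape of the decay, the one-year newness window, and all names (`momentum`, `STAR_CAP`, `NEWNESS_WINDOW_DAYS`) are assumptions made for the example, not the index's actual implementation.

```python
import math

# Weights stated in the description above.
W_STARS, W_RECENCY, W_NEWNESS = 0.55, 0.32, 0.13

# Cutoff stated in the description (approximately 180 days).
RECENCY_WINDOW_DAYS = 180

# Hypothetical constants, chosen only for illustration.
STAR_CAP = 100_000          # star count at which the stars component saturates
NEWNESS_WINDOW_DAYS = 365   # repo age at which the newness bonus reaches zero


def momentum(stars: int, days_since_push: float, repo_age_days: float) -> float:
    """Return a 0-100 score from three normalized components (a sketch)."""
    # Log-scaled stars, normalized to [0, 1] against the assumed cap.
    stars_part = min(1.0, math.log10(1 + stars) / math.log10(1 + STAR_CAP))
    # Push recency: assumed linear decay, reaching zero at the cutoff.
    recency_part = max(0.0, 1.0 - days_since_push / RECENCY_WINDOW_DAYS)
    # Rising-newness: assumed linear decay over the first year of a repo's life.
    newness_part = max(0.0, 1.0 - repo_age_days / NEWNESS_WINDOW_DAYS)
    blended = (W_STARS * stars_part
               + W_RECENCY * recency_part
               + W_NEWNESS * newness_part)
    return round(100 * blended, 1)


# A small, month-old repo pushed two days ago versus a large repo idle for over a year.
fresh_small = momentum(stars=500, days_since_push=2, repo_age_days=30)
quiet_big = momentum(stars=20_000, days_since_push=400, repo_age_days=1_500)
print(fresh_small > quiet_big)
```

Under these assumed constants, the quiet repo scores on stars alone, so the fresh repo's recency and newness components are enough to put it ahead, which matches the behavior described above.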

What's included?

8 categories — Observability, Red Teaming & Safety, Agent Eval, RAG Eval, Eval Frameworks, Benchmarks, Coding Eval and Reasoning — covering LLM evaluation & benchmarking end to end. Active tools only, not abandoned repos.

How often is it updated?

Every day. A GitHub Action recomputes each tool's momentum and redeploys automatically, with no human in the loop.

Part of The Living Indexes

A fleet of self-updating maps of the AI-builder ecosystem — from RAG and diffusion to voice, evals and fine-tuning. Explore them all at indexes.kymatalabs.com.