OpenGenerativeAI/llm-colosseum
by OpenGenerativeAI · Benchmarks · updated 1y ago
Benchmark LLMs by fighting in Street Fighter 3! The new way to evaluate the quality of an LLM.
Momentum: 39 · Stars: 1,483 · Forks: 180 · Rank: #164
benchmark · genai · llm · streetfighter · ai