OpenGenerativeAI/llm-colosseum

by OpenGenerativeAI · Benchmarks · updated 1y ago

Benchmark LLMs by fighting in Street Fighter 3! The new way to evaluate the quality of an LLM.

39 momentum · 1,483 stars · 180 forks · rank #164
Tags: benchmark · genai · llm · streetfighter · ai