lmarena/arena-hard-auto

by lmarena · Benchmarks · updated 11mo ago

Arena-Hard-Auto: An automatic LLM benchmark.

37 momentum · 1,032 stars · 154 forks · rank #173