The Eval Index
lmarena/arena-hard-auto
by lmarena · Benchmarks · updated 11mo ago
Arena-Hard-Auto: An automatic LLM benchmark.
Momentum: 37 · Stars: 1,032 · Forks: 154 · Rank: #173
More in Benchmarks
#16 open-compass/opencompass · Benchmarks · momentum 79
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mist
#19 Andyyyy64/whichllm · Benchmarks · momentum 79
Find the local LLM that actually runs and performs best on your hardware. Ranked by real,
#30 EvolvingLMMs-Lab/lmms-eval · Benchmarks · momentum 76
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
#31 open-compass/VLMEvalKit · Benchmarks · momentum 76
Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 8