lmarena/arena-hard-auto

16

open-compass/opencompass

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mist

Benchmarks

mom79

19

Andyyyy64/whichllm

Find the local LLM that actually runs and performs best on your hardware. Ranked by real,

Benchmarks

mom79

30

EvolvingLMMs-Lab/lmms-eval

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

Benchmarks

mom76

31

open-compass/VLMEvalKit

Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs, 8

Benchmarks

mom76
