THUDM/AgentBench

by THUDM · Agent Eval · updated 4mo ago

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)

Momentum: 53
Stars: 3,492
Forks: 262
Rank: #130

Topics: chatgpt, gpt-4, llm, llm-agent