THUDM/AgentBench

by THUDM · Agent Eval · updated 4mo ago

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)

53 momentum · 3,492 stars · 262 forks · rank #130
Tags: chatgpt · gpt-4 · llm · llm-agent