THUDM/AgentBench
by THUDM · Agent Eval · updated 4mo ago
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
53 momentum · 3,492 stars · 262 forks · rank #130
chatgpt · gpt-4 · llm · llm-agent