IAAR-Shanghai/GuessArena

by IAAR-Shanghai · Reasoning · updated 7mo ago

[ACL 2025] GuessArena: Guess Who I Am? A Self-Adaptive Framework for Evaluating LLMs in Domain-Specific Knowledge and Reasoning

Momentum: 13 · Stars: 10 · Forks: 1 · Rank: #265
Tags: benchmark, chatgpt, deepseek, domain-specific-eval, evaluation-framework, gamearena, guessarena, knowledge-evaluation, large-language-models, llm-eval, openai, qwen