IAAR-Shanghai/GuessArena
by IAAR-Shanghai · Reasoning · updated 7mo ago
[ACL 2025] GuessArena: Guess Who I Am? A Self-Adaptive Framework for Evaluating LLMs in Domain-Specific Knowledge and Reasoning
Momentum: 13 · Stars: 10 · Forks: 1 · Rank: #265
Tags: benchmark, chatgpt, deepseek, domain-specific-eval, evaluation-framework, gamearena, guessarena, knowledge-evaluation, large-language-models, llm-eval, openai, qwen