PaperGuru-AI/PaperGuru-Benchmark

by PaperGuru-AI · Agent Eval · updated 4d ago

Lifecycle-Aware Memory for long-horizon LLM agents — 66.05% on PaperBench, 94.66% on SurveyBench, 10 peer-reviewed acceptances at FSE/ICML/TOSEM/AEI/ICoGB

Momentum: 77 · Stars: 970 · Forks: 139 · Rank: #29