HumphreySun98/repoagentbench
by HumphreySun98 · Agent Eval · updated 1mo ago
SWE-bench for your codebase — mine your merged PRs into local, contamination-free coding-agent benchmarks. Adapters: claude-code, aider (Opus 4.7 / GPT-5.5 / Sonnet 4.6 / Gemini 3.1 Pro).
Momentum: 51 · Stars: 32 · Forks: 0 · Rank: #133
Tags: agent-evals · ai-agents · aider · benchmark · claude-opus-4-7 · coding-agents · developer-tools · gemini-3-1-pro · gpt-5-5 · llm-eval · swe-bench