suyoumo/ClawProBench
by suyoumo · Agent Eval · updated 5d ago
ClawProBench is a live-first benchmark harness for evaluating LLM agents in the OpenClaw runtime with deterministic grading and repeated-trial reliability.
Momentum: 66 · Stars: 704 · Forks: 51 · Rank: #74
Tags: agent · benchmark · evaluation · harness · leaderboard · llm · openclaw