harbor-framework/terminal-bench

by harbor-framework · Benchmarks · updated 4mo ago

A benchmark for LLMs on complicated tasks in the terminal

Momentum: 48
Stars: 2,353
Forks: 541
Rank: #136
