harbor-framework/terminal-bench

by harbor-framework · Benchmarks · updated 4mo ago

A benchmark for LLMs on complicated tasks in the terminal

48 momentum · 2,353 stars · 541 forks · #136 rank