stalkermustang/llm-bulls-and-cows-benchmark
by stalkermustang · Reasoning · updated 1y ago
A mini-framework for evaluating LLM performance on the Bulls and Cows number guessing game, supporting multiple LLM providers.
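For readers unfamiliar with the game, here is a minimal sketch of how a single guess is typically scored. It is not taken from the repository: the function name `score_guess` and the string-based representation of the secret are assumptions made for illustration. A "bull" is a digit in the correct position; a "cow" is a digit that appears in the secret but in a different position.

```python
from collections import Counter


def score_guess(secret: str, guess: str) -> tuple[int, int]:
    """Return (bulls, cows) for a guess against a secret.

    Hypothetical helper for illustration; the benchmark's own scoring
    code may be structured differently.
    """
    if len(secret) != len(guess):
        raise ValueError("secret and guess must have the same length")

    # Bulls: same digit in the same position.
    bulls = sum(s == g for s, g in zip(secret, guess))

    # Digits shared by both strings, ignoring position (multiset intersection).
    shared = sum((Counter(secret) & Counter(guess)).values())

    # Cows: shared digits that are not already counted as bulls.
    return bulls, shared - bulls


if __name__ == "__main__":
    print(score_guess("1234", "1243"))  # (2, 2)
```

An evaluation loop would presumably feed this kind of feedback back to the model after each guess until it scores all bulls or runs out of turns.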
Momentum: 29 · Stars: 234 · Forks: 1 · Rank: #209
Tags: benchmark, benchmarking, chatgpt, games, llm, openai, python, reasoning