stalkermustang/llm-bulls-and-cows-benchmark

by stalkermustang · Reasoning · updated 1y ago

A mini-framework for evaluating LLM performance on the Bulls and Cows number-guessing game, with support for multiple LLM providers.

Momentum: 29 · Stars: 234 · Forks: 1 · Rank: #209
Tags: benchmark · benchmarking · chatgpt · games · llm · openai · python · reasoning