ellydee/acceptance-bench

by ellydee · Eval Frameworks · updated 2mo ago

A robust LLM evaluation framework measuring acceptance vs refusal across difficulty levels. Features multi-prompt variation testing, temperature sweeping, and LLM-as-judge evaluation.

44 momentum · 73 stars · 1 fork · rank #149