UW-Madison-Lee-Lab/LLM-judge-reporting
by UW-Madison-Lee-Lab · Eval Frameworks · updated 6mo ago
A simple plug-in framework that corrects bias and computes confidence intervals when reporting LLM-as-a-judge evaluations, plus an adaptive algorithm that allocates calibration samples efficiently to reduce the uncertainty of the resulting estimates.
23 momentum · 78 stars · 4 forks · rank #238
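To make the description concrete, here is a minimal sketch of what a plug-in bias correction with a confidence interval can look like. It assumes the classical misclassification correction (the Rogan-Gladen estimator) with a delta-method normal interval; the listing does not confirm that this is the repository's exact method, and the function name `corrected_accuracy` and all of its parameters are hypothetical, not the repository's API. Here `p_hat` is the fraction of `n` test items the judge marks correct, while `q0_hat` and `q1_hat` are the judge's specificity and sensitivity estimated on `m0` and `m1` human-labelled calibration items.

```python
import math
from statistics import NormalDist


def corrected_accuracy(p_hat, n, q0_hat, m0, q1_hat, m1, alpha=0.05):
    """Bias-corrected accuracy and a (1 - alpha) normal confidence interval.

    Hypothetical helper (an assumption, not the repository's API).
    p_hat  : fraction of n test items the judge labels "correct"
    q0_hat : judge specificity, estimated on m0 truly-incorrect items
    q1_hat : judge sensitivity, estimated on m1 truly-correct items
    """
    denom = q0_hat + q1_hat - 1.0
    if denom <= 0:
        raise ValueError("judge must beat chance: q0_hat + q1_hat must exceed 1")

    # Invert p = theta * q1 + (1 - theta) * (1 - q0) for theta, then clip to [0, 1].
    theta = (p_hat + q0_hat - 1.0) / denom
    theta = min(1.0, max(0.0, theta))

    # Binomial variances of the three estimated proportions.
    var_p = p_hat * (1 - p_hat) / n
    var_q0 = q0_hat * (1 - q0_hat) / m0
    var_q1 = q1_hat * (1 - q1_hat) / m1

    # Delta method: gradients are 1/denom, (1 - theta)/denom, and -theta/denom.
    var_theta = (var_p + (1 - theta) ** 2 * var_q0 + theta ** 2 * var_q1) / denom ** 2

    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * math.sqrt(var_theta)
    return theta, (max(0.0, theta - half_width), min(1.0, theta + half_width))


if __name__ == "__main__":
    estimate, interval = corrected_accuracy(0.7, 1000, 0.9, 200, 0.8, 200)
    print(estimate, interval)
```

Under these assumptions, the calibration terms `var_q0` and `var_q1` shrink as `m0` and `m1` grow, which is why the way calibration samples are split between the two groups affects the width of the interval.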