UW-Madison-Lee-Lab/LLM-judge-reporting

by UW-Madison-Lee-Lab · Eval Frameworks · updated 6mo ago

A simple plug-in framework that corrects bias and computes confidence intervals when reporting LLM-as-a-judge evaluations, together with an adaptive algorithm that allocates calibration samples efficiently to reduce uncertainty in the estimates.
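
To make the idea of a bias-corrected report concrete, here is a minimal sketch of one standard plug-in correction (a Rogan-Gladen style estimator with a delta-method interval). It is not taken from the repository: the function name `corrected_estimate`, its parameters, and the normal-approximation interval are all assumptions made for illustration. The judge's sensitivity and specificity are estimated from a small human-labeled calibration set, then used to adjust the raw fraction of items the judge marked correct.

```python
import math


def corrected_estimate(judge_pos, n_test, tp, n_cal_pos, tn, n_cal_neg, z=1.96):
    """Hypothetical helper: bias-corrected accuracy with an approximate CI.

    judge_pos / n_test : items the judge marked correct, out of all test items
    tp / n_cal_pos     : calibration items truly correct that the judge marked correct
    tn / n_cal_neg     : calibration items truly incorrect that the judge marked incorrect
    z                  : normal quantile (1.96 gives roughly a 95% interval)
    """
    p_hat = judge_pos / n_test   # raw judge-reported accuracy (biased)
    q1 = tp / n_cal_pos          # estimated sensitivity of the judge
    q0 = tn / n_cal_neg          # estimated specificity of the judge

    denom = q0 + q1 - 1.0
    if denom <= 0:
        raise ValueError("judge is no better than chance; correction is undefined")

    # Plug-in correction: invert p = theta * q1 + (1 - theta) * (1 - q0).
    theta = (p_hat + q0 - 1.0) / denom

    # Delta-method variance, treating the three proportions as independent.
    var_p = p_hat * (1 - p_hat) / n_test
    var_q0 = q0 * (1 - q0) / n_cal_neg
    var_q1 = q1 * (1 - q1) / n_cal_pos
    var = (var_p + (1 - theta) ** 2 * var_q0 + theta ** 2 * var_q1) / denom ** 2

    half_width = z * math.sqrt(var)
    clip = lambda x: min(1.0, max(0.0, x))  # keep values inside [0, 1]
    return clip(theta), clip(theta - half_width), clip(theta + half_width)


# Example with made-up counts: 700 of 1000 test items judged correct,
# calibration shows 90/100 sensitivity and 80/100 specificity.
theta, lo, hi = corrected_estimate(700, 1000, 90, 100, 80, 100)
print(f"corrected accuracy: {theta:.3f}  (approx. 95% CI {lo:.3f} to {hi:.3f})")
```

The interval widens as the calibration set shrinks, which is why how calibration samples are split between truly correct and truly incorrect items matters for the final uncertainty.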

23 momentum · 78 stars · 4 forks · rank #238