
UW-Madison-Lee-Lab/LLM-judge-reporting

by UW-Madison-Lee-Lab · Eval Frameworks · updated 6mo ago

A simple plug-in framework that corrects bias and computes confidence intervals when reporting LLM-as-a-judge evaluations, together with an adaptive algorithm that allocates calibration samples efficiently to reduce the uncertainty of the resulting estimates.
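As a rough illustration of what a plug-in bias correction of this kind can look like, the sketch below inverts the judge's error rates to recover an accuracy estimate and attaches a delta-method Wald interval. It is not taken from the repository: the function name `corrected_estimate`, its parameters, and the choice of interval are assumptions made for this example, and the repository's own estimator and interval construction may differ.

```python
import math


def corrected_estimate(p_hat, n, q0_hat, m0, q1_hat, m1, z=1.96):
    """Bias-corrected accuracy with an approximate Wald confidence interval.

    Hypothetical helper for illustration only (not the repository's API).

    p_hat  : fraction of n test items the judge labelled "correct"
    q0_hat : judge specificity, estimated on m0 truly incorrect calibration items
    q1_hat : judge sensitivity, estimated on m1 truly correct calibration items
    z      : normal quantile (1.96 gives a nominal 95% interval)
    """
    denom = q0_hat + q1_hat - 1.0
    if denom <= 0:
        raise ValueError("judge must be better than chance: q0 + q1 must exceed 1")

    # Invert p = theta*q1 + (1 - theta)*(1 - q0) for theta, then clip to [0, 1].
    theta = (p_hat + q0_hat - 1.0) / denom
    theta = min(1.0, max(0.0, theta))

    # Delta-method variance, treating the three proportions as independent.
    var = (
        p_hat * (1 - p_hat) / n
        + (1 - theta) ** 2 * q0_hat * (1 - q0_hat) / m0
        + theta ** 2 * q1_hat * (1 - q1_hat) / m1
    ) / denom ** 2
    half = z * math.sqrt(var)
    return theta, max(0.0, theta - half), min(1.0, theta + half)


theta, lo, hi = corrected_estimate(0.7, 1000, 0.9, 200, 0.8, 200)
print(f"raw judge score 0.700, corrected {theta:.3f}, interval [{lo:.3f}, {hi:.3f}]")
```

With these made-up inputs, the raw judge score of 0.70 corrects to about 0.857, because a judge that misses 20% of correct answers understates true accuracy. The interval widens as the calibration sets shrink, which is the uncertainty an adaptive allocation of calibration samples would aim to reduce.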

Rank: #238
