
multinear/multinear

by multinear · Benchmarks · updated 9mo ago

Develop reliable AI apps

Momentum: 20 · Stars: 45 · Forks: 1 · Rank: #249
Tags: evaluation, llm, llm-eval, llm-evaluation, llm-evaluation-framework, llms, llms-benchmarking, reliability