QuesmaOrg/BinaryAudit

by QuesmaOrg · Agent Eval · updated 3mo ago

An open-source benchmark for evaluating AI agents' ability to find backdoors hidden in compiled binaries.

Momentum: 37 · Stars: 92 · Forks: 5 · Rank: #177

Topics: ai · benchmark · binary-analysis · cybersecurity · llm-eval · reverse-engineering