QuesmaOrg/BinaryAudit
by QuesmaOrg · Agent Eval · updated 3mo ago
An open-source benchmark for evaluating AI agents' ability to find backdoors hidden in compiled binaries.
Momentum: 37 · Stars: 92 · Forks: 5 · Rank: #177
Tags: ai · benchmark · binary-analysis · cybersecurity · llm-eval · reverse-engineering