
promptfoo/promptfoo

by promptfoo · Red Teaming & Safety · updated today

Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, DeepSeek, and more. Simple declarative configs with command line and CI/CD integration. Used by OpenAI and Anthropic.

Momentum: 86 · Stars: 22,162 · Forks: 1,972 · Rank: #3
Topics: ci · ci-cd · cicd · evaluation · evaluation-framework · llm · llm-eval · llm-evaluation · llm-evaluation-framework · llmops · pentesting · prompt-engineering