The Eval Index / Red Teaming & Safety / #3

promptfoo/promptfoo

by promptfoo · Red Teaming & Safety · updated today

Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, DeepSeek, and more. Simple declarative configs with command line and CI/CD integration. Used by OpenAI and Anthropic.

22,162 stars · 1,972 forks

ci · ci-cd · cicd · evaluation · evaluation-framework · llm · llm-eval · llm-evaluation · llm-evaluation-framework · llmops · pentesting · prompt-engineering
