Safety, Alignment & Ethics
Red Teaming
Deliberately trying to find flaws or harmful behaviours in an AI before deployment.
Definition
Red teaming in AI means deliberately attempting to elicit harmful, incorrect, or dangerous outputs from a model to find its weaknesses before deployment. Red teams use adversarial prompts, jailbreaking techniques, and edge cases to stress-test AI systems. This proactive approach to finding failures is standard practice at leading AI labs and is increasingly required by enterprise AI governance frameworks and emerging regulation.
Related Terms
Jailbreaking
Attempts to bypass an AI's safety guidelines through clever prompting.
Prompt Injection
A type of attack where malicious instructions hidden in content hijack an AI's behaviour.
AI Safety
The field focused on preventing AI from causing harm — intentional or unintentional.
Guardrails
Rules or filters built into an AI system to prevent harmful or inappropriate outputs.
Heard enough terminology — ready to talk outcomes?
We translate AI concepts into measurable business results. No upfront fees — you pay only when independently verified results are delivered.
Disclaimer
This definition is provided for educational and informational purposes only. It represents a general explanation of a technical concept and does not constitute professional, technical, or investment advice. Artificial intelligence is a rapidly evolving field; terminology, techniques, and capabilities change frequently. Coaley Peak Ltd makes no warranty as to the accuracy, completeness, or currency of the information provided. Nothing on this page should be relied upon as the sole basis for commercial, technical, legal, or investment decisions without independent professional advice.
Document reference: ISO_webpage_knowledge-base_glossary_v1
Last modified: 29 March 2026
Knowledge Base·Safety, Alignment & Ethics·Red Teaming