Ethics & SafetyLast updated: April 2026

Red teaming

Having people deliberately try to break or trick an AI system to find weaknesses before bad actors do.

In Plain English

Red teaming is a security practice where authorized people act as adversaries, trying everything they can think of to make an AI system fail, produce harmful content, or behave in unintended ways. It's named after war games where one side plays the "enemy." The goal is to discover vulnerabilities in a controlled setting so developers can fix them before the system is released to the public. It's like having a professional food taster test your restaurant's dishes for contamination before they reach customers.

💡Real-World Example

Before releasing a new AI chatbot, a company hires a team to spend weeks trying to trick it into giving dangerous advice, spitting out offensive language, or revealing sensitive data. When they find a problem—say, the chatbot can be manipulated into ignoring its safety rules—the developers patch it. This happens before the public ever sees the system.

What did you think of our explanation?

←Recommendation SystemPrevious View all terms RLHF (Reinforcement Learning from Human Feedback)Next→

Want to learn more about AI?

Explore our curated collection of AI news, tools, and guides — all explained in plain English.

Read Latest News Explore AI Tools