guardrails
Built-in safety rules that restrict an AI system's behavior to prevent harmful, illegal, or unintended outputs.
In Plain English
Guardrails are constraints built into an AI system to keep it from producing dangerous, unethical, or misaligned outputs. They're like the safety features on a car: they don't stop the car from working, but they prevent obvious harms. A guardrail might prevent an AI from writing code for malware, generating instructions for illegal activities, or impersonating a real person. Guardrails matter because AI systems are powerful tools that can be misused, and responsible developers build in checks to reduce that risk. In practice, they're implemented through a combination of model training (such as reinforcement learning from human feedback), explicit rule-based filters on inputs and outputs, and runtime monitoring.
💡Real-World Example
A coding agent is asked to write malicious software that could steal passwords. Guardrails built into the system detect this request and refuse to generate the code, instead explaining why the task violates safety policies. Another example: a chatbot's guardrails prevent it from claiming to be a licensed doctor and offering medical diagnoses, even if someone asks directly.
