Ethics & Safety
Last updated: April 2026
RLHF (Reinforcement Learning from Human Feedback)
A training technique that improves AI using human ratings of its responses.
In Plain English
RLHF is a training method where humans rate AI responses, and those ratings are used to improve the AI's behavior. Human trainers compare different AI outputs and choose which one is more helpful, accurate, or safe. The AI learns from thousands of these comparisons to produce the kinds of responses humans prefer. RLHF is a key step in making AI assistants helpful and safe.
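For readers curious what "learning from comparisons" looks like in practice, here is a minimal sketch of the reward-modeling step that typically sits at the core of RLHF: a small model is nudged to give higher scores to the response a human preferred. Everything here is hypothetical and simplified (the TinyRewardModel class and the random toy "embeddings" stand in for a real language model scoring full text); it illustrates the idea, not any particular system's implementation.

```python
# Hypothetical, simplified sketch of RLHF reward modeling from pairwise
# human comparisons. Toy embeddings stand in for real text responses.
import torch
import torch.nn as nn

class TinyRewardModel(nn.Module):
    """Maps a response embedding to a single scalar 'preference' score."""
    def __init__(self, dim: int = 8):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.score(x).squeeze(-1)

model = TinyRewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

# One batch of human comparisons: trainers preferred response A over B.
chosen = torch.randn(4, 8)    # embeddings of the preferred responses
rejected = torch.randn(4, 8)  # embeddings of the rejected responses

# Pairwise (Bradley-Terry style) loss: push the chosen response's score
# above the rejected one's. Repeated over thousands of comparisons, the
# model learns to score responses the way humans rate them; that score
# is then used to fine-tune the AI assistant itself.
loss = -nn.functional.logsigmoid(model(chosen) - model(rejected)).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In a full RLHF pipeline, this learned scorer is only the middle step: the AI assistant is then further trained (for example with reinforcement learning) to produce responses that the scorer rates highly.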
💡 Real-World Example
Most leading AI chatbots were trained with RLHF — human trainers rated thousands of responses to teach the AI which answers were most helpful and safe.