Ethics & Safety
Last updated: April 2026
RLHF (Reinforcement Learning from Human Feedback)
A training technique that improves AI using human ratings of its responses.
In Plain English
RLHF is a training method where humans rate AI responses, and those ratings are used to improve the AI's behavior. Human trainers compare different AI outputs and choose which one is more helpful, accurate, or safe. The AI learns from thousands of these comparisons to produce the kinds of responses humans prefer. RLHF is a key step in making AI assistants helpful and safe.
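For readers curious what "learning from comparisons" looks like in practice, here is a minimal sketch of the reward-modeling step that typically sits at the core of RLHF: a small model is nudged to give higher scores to the response a human preferred. Everything here is hypothetical and simplified (the TinyRewardModel class and the random toy "embeddings" stand in for a real language model scoring full text); it illustrates the idea, not any particular system's implementation.

```python
# Hypothetical, simplified sketch of RLHF reward modeling from pairwise
# human comparisons. Toy embeddings stand in for real text responses.
import torch
import torch.nn as nn

class TinyRewardModel(nn.Module):
    """Maps a response embedding to a single scalar 'preference' score."""
    def __init__(self, dim: int = 8):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.score(x).squeeze(-1)

model = TinyRewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

# One batch of human comparisons: trainers preferred response A over B.
chosen = torch.randn(4, 8)    # embeddings of the preferred responses
rejected = torch.randn(4, 8)  # embeddings of the rejected responses

# Pairwise (Bradley-Terry style) loss: push the chosen response's score
# above the rejected one's. Repeated over thousands of comparisons, the
# model learns to score responses the way humans rate them; that score
# is then used to fine-tune the AI assistant itself.
loss = -nn.functional.logsigmoid(model(chosen) - model(rejected)).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In a full RLHF pipeline, this learned scorer is only the middle step: the AI assistant is then further trained (for example with reinforcement learning) to produce responses that the scorer rates highly.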
💡 Real-World Example
Most leading AI chatbots were trained with RLHF — human trainers rated thousands of responses to teach the AI which answers were most helpful and safe.