Training & Fine-tuning

Reward Model

A model trained to score AI outputs, used to guide RLHF (reinforcement learning from human feedback) training.

Definition

During RLHF training, it would be impractical to have humans rate every response the AI generates. Instead, a separate model, the reward model, is trained on human preference data (typically human comparisons or ratings of candidate responses) to predict how humans would rate a given response. It then supplies the feedback signal during RLHF training without constant human involvement. Because the model being trained is optimised to score well against the reward model, any gaps or biases in the reward model carry through to the final model's behaviour.
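
One common way to train a reward model on preference data is a pairwise (Bradley-Terry style) loss, which pushes the score of the response humans preferred above the score of the one they rejected. The sketch below illustrates that idea only. The linear scorer, the two hand-made features (labelled "helpfulness" and "rudeness"), the toy data, and every function name are assumptions made for illustration; real reward models are usually neural networks that score text directly.

```python
import math

# Hypothetical toy data: each pair is (features of the preferred response,
# features of the rejected response). Features are [helpfulness, rudeness].
PAIRS = [
    ([1.0, 0.0], [0.0, 1.0]),
    ([0.8, 0.1], [0.3, 0.9]),
    ([0.9, 0.2], [0.2, 0.2]),
]

def reward(weights, features):
    """Scalar score for one response (assumed here to be a dot product)."""
    return sum(w * x for w, x in zip(weights, features))

def pairwise_loss(weights, chosen, rejected):
    """-log sigmoid(r_chosen - r_rejected), written as log(1 + exp(-margin))."""
    margin = reward(weights, chosen) - reward(weights, rejected)
    return math.log1p(math.exp(-margin))

def train(pairs, dim, lr=0.1, epochs=200):
    """Plain stochastic gradient descent on the pairwise loss."""
    weights = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # d(loss)/d(margin) = -(1 - sigmoid(margin)) = -1 / (1 + exp(margin))
            grad_scale = -1.0 / (1.0 + math.exp(margin))
            for i in range(dim):
                weights[i] -= lr * grad_scale * (chosen[i] - rejected[i])
    return weights

weights = train(PAIRS, dim=2)

# After training, each preferred response should outscore its rejected pair.
for chosen, rejected in PAIRS:
    print(reward(weights, chosen) > reward(weights, rejected))
```

Once trained, `reward(weights, features)` can score a new response with no human in the loop, which is the role the reward model plays during RLHF.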

Disclaimer

This definition is provided for educational and informational purposes only. It represents a general explanation of a technical concept and does not constitute professional, technical, or investment advice. Artificial intelligence is a rapidly evolving field; terminology, techniques, and capabilities change frequently. Coaley Peak Ltd makes no warranty as to the accuracy, completeness, or currency of the information provided. Nothing on this page should be relied upon as the sole basis for commercial, technical, legal, or investment decisions without independent professional advice.

Document reference: ISO_webpage_knowledge-base_glossary_v1

Last modified: 29 March 2026
