AI Research & Quality · Hybrid · Full-time · Disability Confident Committed

RLHF Manager

Lead Reinforcement Learning from Human Feedback (RLHF) programmes across Coaley Peak's internal AI engine development and external client AI projects, ensuring our models are safe, aligned, and genuinely useful.

Location

Remote (UK) / Cheltenham, UK

Salary

£55,000 – £75,000 per annum (DOE)

Positions

1 position

Reference

CP-RLF-2025-001

Published

22 March 2025

Closing

15 December 2026

About the role

Reinforcement Learning from Human Feedback is one of the most important mechanisms for making AI systems behave well in the real world. As RLHF Manager at Coaley Peak, you will design and manage the feedback pipelines, annotation programmes, and evaluation frameworks that keep our models (and our clients' models) aligned with human values, business objectives, and UK regulatory expectations.

This is a dual-facing role. Internally, you will own the RLHF and alignment processes for our proprietary AI engines (Owlpen, Anvil, Flint, Warden), working directly with our data scientists and engineers to define reward models, manage human evaluator programmes, and measure output quality over time. Externally, you will support client AI projects where alignment and quality assurance are a requirement, particularly in regulated sectors such as financial services, healthcare, and government.

We are looking for someone with a rigorous mind, a genuine interest in AI safety and alignment, and the project management capability to run structured annotation and evaluation programmes at scale. This role sits within our AI Research & Quality team and reports to the Head of AI.

The kind of person we are looking for

At Coaley Peak, the technical work is only half the job. We are looking for people who are genuinely reliable: people who do what they say they will, when they say they will, without needing to be chased. People who are friendly and easy to work with, both with colleagues and with clients. People who can sit in a boardroom and explain a complex AI model in plain English, and then go back to their laptop and write clean, well-documented code. People who are hungry, hard-working, and take real pride in their output, not because someone is watching, but because that is simply how they operate.

You are meticulous without being slow. You understand that getting AI alignment right requires both technical rigour and human judgement, and you are equally comfortable designing an evaluation rubric, briefing a team of human annotators, and presenting findings to a client's technical steering group. You are intellectually honest, you flag problems before they become failures, and you treat every degradation in model behaviour as a signal worth investigating rather than a number to smooth over. You care about AI being safe and useful, not just impressive.

What you will do

→Design, implement, and manage RLHF pipelines for Coaley Peak's internal AI engines, including reward model specification, data collection, human evaluation, and iterative fine-tuning
→Recruit, brief, and quality-control human annotator and evaluator programmes (internal and contracted)
→Define and maintain evaluation frameworks and benchmarks for model output quality, safety, and alignment
→Support external client AI projects requiring RLHF, alignment review, or model evaluation services: scoping requirements, managing delivery, and reporting outcomes
→Work closely with data scientists to translate RLHF insights into model training decisions
→Monitor deployed models for alignment drift, output degradation, and safety-critical behaviours
→Produce clear documentation of RLHF methodologies, evaluation results, and quality standards
→Stay current with academic and industry developments in RLHF, Constitutional AI, RLAIF, and related alignment techniques
→Contribute to Coaley Peak's AI governance and responsible AI frameworks

What we are looking for

Essential

→Strong understanding of RLHF, preference learning, and alignment concepts, with the ability to explain and operationalise them in a business context
→Experience managing structured data annotation, human evaluation, or labelling programmes
→Proficiency in Python; familiarity with ML frameworks and LLM APIs (OpenAI, Anthropic, HuggingFace, or similar)
→Excellent project management skills: able to run parallel programmes with multiple contributors and tight quality standards
→Clear, precise written and verbal communication, with the ability to document methodology and present findings to both technical and non-technical audiences
→Right to work in the United Kingdom

Desirable (not essential)

→Direct experience with RLHF pipelines in production (reward modelling, PPO, DPO, or similar)
→Familiarity with Constitutional AI, RLAIF, or other scalable oversight approaches
→Experience in AI safety research or AI governance
→Background in cognitive science, linguistics, psychology, or philosophy (relevant to preference elicitation and evaluation design)
→Experience working in regulated sectors where AI output quality has legal or compliance implications

What we offer

→Salary of £55,000 – £75,000 per annum, dependent on experience, reviewed annually
→28 days annual leave plus bank holidays
→Flexible hybrid working, remote-first with optional Cheltenham HQ access
→Private healthcare and employee assistance programme (EAP)
→£2,000 annual professional development budget plus access to research resources
→The opportunity to shape alignment practice at a fast-growing AI consultancy
→Regular team offsites and an annual international travel programme
→High-trust, outcome-focused culture
→Accredited Living Wage Employer

Vetting & pre-employment checks

Enhanced DBS

All Coaley Peak roles require, as a minimum, an Enhanced DBS check. By applying, you consent to these checks being carried out if an offer is made. This role involves access to sensitive model outputs and may involve work on client projects in regulated sectors. A standard background and reference check is required in addition to the Enhanced DBS check.

How to apply

Send your CV and a covering note to careers@coaleypeak.co.uk, quoting reference CP-RLF-2025-001. We are particularly interested in your experience with annotation programme design, evaluation methodology, or alignment work, so please describe something concrete you have built or managed. We review on a rolling basis and acknowledge all applications within five working days.

Applications are accepted electronically via our online form below, or by email to careers@coaleypeak.co.uk. If you require this role information or the application form in an alternative format (including large print, audio, or plain text) please email us before applying and we will provide it promptly.

Apply for this role →

Disability Confident Committed, inclusive hiring

We are committed to inclusive and accessible recruitment. As a Disability Confident Committed employer, we will offer an interview to any disabled applicant who meets the essential criteria for this role as set out above. Please indicate in your application if you wish to be considered under this commitment.

We anticipate and provide reasonable adjustments throughout every stage of the recruitment process, including the application form, any assessments, and the interview itself. If you need any adjustments (alternative formats, additional preparation time, a different interview setting, British Sign Language interpretation, or anything else), please let us know as early as possible by emailing careers@coaleypeak.co.uk.

This job advert and all supporting documents are available in alternative accessible formats on request, including large print and electronic formats. Email us at careers@coaleypeak.co.uk with your preferred format.

All requests are handled in confidence and will not affect how your application is assessed. We are committed to supporting any existing employee who acquires a disability or long-term health condition to remain in work and continue contributing their skills and experience.

Career progression

Typical entry points

ML Engineer

Research Scientist

AI Quality Analyst

Data Scientist

RLHF Manager

Where this can lead

Head of AI Quality

Director of AI Research

VP of Machine Learning

Chief AI Officer

Disclaimer: Career progression paths shown are indicative and based on typical industry trajectories. They are not a guarantee of promotion or role availability at Coaley Peak or any other organisation. Progression depends on individual performance, business requirements, and market conditions.

Role at a glance

Team: AI Research & Quality

Location: Remote (UK) / Cheltenham, UK

Work type: Hybrid

Contract: Full-time

Salary: £55,000 – £75,000 per annum (DOE)

Vetting: Enhanced DBS

Ref: CP-RLF-2025-001

Our platform

You'll work with Owlpen, our proprietary cost intelligence platform, deployed across live client operations.

AI · RLHF · Alignment · Research · Quality · LLMs

Ref: CP-RLF-2025-001 · iso_process_recruitment_job_listing v1

Legal notices

Equal opportunities. Coaley Peak Ltd is an equal opportunities employer committed to a diverse and inclusive workplace. We do not discriminate on the basis of age, disability, sex, gender reassignment, sexual orientation, pregnancy or maternity, race, religion or belief, or marriage and civil partnership, in line with the Equality Act 2010.

Disability Confident Committed. We are a DWP Disability Confident Committed employer. We guarantee an interview to disabled applicants who meet the essential criteria for this role. We actively promote vacancies through Jobcentre Plus and local disabled people's organisations, review our recruitment processes regularly, and anticipate reasonable adjustments at every stage. We also commit to supporting any employee who acquires a disability or long-term health condition to remain in work. All reasonable adjustments are provided without charge to the candidate.

Right to work. All offers of employment are conditional on evidence of the right to work in the UK in accordance with the Immigration, Asylum and Nationality Act 2006.

Data protection. Candidate data is processed under UK GDPR Article 6(1)(b) and the Data Protection Act 2018, retained for up to 12 months, and held by Coaley Peak Ltd (ICO registered Data Controller). See our Privacy Policy.

Feedback & complaints. Concerns about our recruitment process should be directed to recruitment.control@coaleypeak.co.uk. All complaints are handled under ISO 9001:2015. Candidates may also contact the EHRC.

Pre-employment checks. Offers are subject to satisfactory references and identity checks. Role-specific DBS checks will be disclosed at application stage, in accordance with the Rehabilitation of Offenders Act 1974.

Working time & pay. This role complies with the Working Time Regulations 1998. Coaley Peak Ltd is a Living Wage accredited employer. Salary ranges are reviewed annually.

Coaley Peak Ltd · Co. 11783676 · VAT GB374552088 · The Limes, Bayshill Road, Cheltenham GL50 3AW, UK · Registered in England & Wales

Document reference: ISO_webpage_careers-listing_v1

Last modified: 24 March 2026
