RLHF Manager
Lead Reinforcement Learning from Human Feedback (RLHF) programmes across Coaley Peak's internal AI engine development and external client AI projects — ensuring our models are safe, aligned, and genuinely useful.
Location
Remote (UK) / Cheltenham, UK
Salary
£55,000 – £75,000 per annum (DOE)
Positions
1 position
Reference
CP-RLF-2025-001
Published
22 March 2025
Closing
Rolling
About the role
Reinforcement Learning from Human Feedback is one of the most important mechanisms for making AI systems behave well in the real world. As RLHF Manager at Coaley Peak, you will design and manage the feedback pipelines, annotation programmes, and evaluation frameworks that keep our models — and our clients' models — aligned with human values, business objectives, and UK regulatory expectations.
This is a dual-facing role. Internally, you will own the RLHF and alignment processes for our proprietary AI engines (Owlpen, Anvil, Flint, Warden), working directly with our data scientists and engineers to define reward models, manage human evaluator programmes, and measure output quality over time. Externally, you will support client AI projects where alignment and quality assurance are a requirement — particularly in regulated sectors such as financial services, healthcare, and government.
We are looking for someone with a rigorous mind, a genuine interest in AI safety and alignment, and the project management capability to run structured annotation and evaluation programmes at scale. This role sits within our AI Research & Quality team and reports to the Head of AI.
The kind of person we are looking for
At Coaley Peak, the technical work is only half the job. We are looking for people who are genuinely reliable — who do what they say they will, when they said they would, without needing to be chased. People who are friendly and easy to work with, both with colleagues and with clients. People who can sit in a boardroom and explain a complex AI model in plain English, and then go back to their laptop and write clean, well-documented code. People who are hungry, hard-working, and take real pride in their output — not because someone is watching, but because that is simply how they operate.
You are meticulous without being slow. You understand that getting AI alignment right requires both technical rigour and human judgement, and you are equally comfortable designing an evaluation rubric, briefing a team of human annotators, and presenting findings to a client's technical steering group. You are intellectually honest — you flag problems before they become failures, and you treat every degradation in model behaviour as a signal worth investigating rather than a number to smooth over. You care about AI being safe and useful, not just impressive.
What you will do
- Design, implement, and manage RLHF pipelines for Coaley Peak's internal AI engines, including reward model specification, data collection, human evaluation, and iterative fine-tuning
- Recruit, brief, and quality-control human annotator and evaluator programmes (internal and contracted)
- Define and maintain evaluation frameworks and benchmarks for model output quality, safety, and alignment
- Support external client AI projects requiring RLHF, alignment review, or model evaluation services: scoping requirements, managing delivery, and reporting outcomes
- Work closely with data scientists to translate RLHF insights into model training decisions
- Monitor deployed models for alignment drift, output degradation, and safety-critical behaviours
- Produce clear documentation of RLHF methodologies, evaluation results, and quality standards
- Stay current with academic and industry developments in RLHF, Constitutional AI, RLAIF, and related alignment techniques
- Contribute to Coaley Peak's AI governance and responsible AI frameworks
What we are looking for
Essential
- Strong understanding of RLHF, preference learning, and alignment concepts, with the ability to explain and operationalise them in a business context
- Experience managing structured data annotation, human evaluation, or labelling programmes
- Proficiency in Python; familiarity with ML frameworks and LLM APIs (OpenAI, Anthropic, HuggingFace, or similar)
- Excellent project management skills: able to run parallel programmes with multiple contributors and tight quality standards
- Clear, precise written and verbal communication, with the ability to document methodology and present findings to both technical and non-technical audiences
- Right to work in the United Kingdom
Desirable (not essential)
- Direct experience with RLHF pipelines in production (reward modelling, PPO, DPO, or similar)
- Familiarity with Constitutional AI, RLAIF, or other scalable oversight approaches
- Experience in AI safety research or AI governance
- Background in cognitive science, linguistics, psychology, or philosophy (relevant to preference elicitation and evaluation design)
- Experience working in regulated sectors where AI output quality has legal or compliance implications
What we offer
- Salary of £55,000 – £75,000 per annum, dependent on experience, reviewed annually
- 28 days annual leave plus bank holidays
- Flexible hybrid working: remote-first with optional Cheltenham HQ access
- Private healthcare and employee assistance programme (EAP)
- £2,000 annual professional development budget plus access to research resources
- The opportunity to shape alignment practice at a fast-growing AI consultancy
- Regular team offsites and an annual international travel programme
- High-trust, outcome-focused culture
- Accredited Living Wage Employer
Vetting & pre-employment checks
Enhanced DBS
All Coaley Peak roles require an Enhanced DBS check as a minimum. This role involves access to sensitive model outputs and may involve work on client projects in regulated sectors, so a standard background and reference check is required in addition to the Enhanced DBS check. By applying, you consent to these checks being carried out if an offer is made.
How to apply
Send your CV and a covering note to careers@coaleypeak.co.uk, quoting reference CP-RLF-2025-001. We are particularly interested in your experience with annotation programme design, evaluation methodology, or alignment work — describe something concrete you have built or managed. We review on a rolling basis and acknowledge all applications within five working days.
Accessibility & reasonable adjustments
We want every candidate to have the best possible experience of our hiring process. If you need any adjustments — alternative formats, additional preparation time, a different interview setting, or anything else — please let us know as early as possible by emailing careers@coaleypeak.co.uk. All requests are handled in confidence and will not affect how your application is assessed.
Ready to apply?
Complete the application form — we review every submission personally. Quote reference CP-RLF-2025-001.
Our platform
You'll work with Owlpen — our proprietary cost intelligence platform, deployed across live client operations.
Legal notices
Equal opportunities. Coaley Peak Ltd is an equal opportunities employer committed to a diverse and inclusive workplace. We do not discriminate on the basis of age, disability, sex, gender reassignment, sexual orientation, pregnancy or maternity, race, religion or belief, or marriage and civil partnership, in line with the Equality Act 2010.
Right to work. All offers of employment are conditional on evidence of the right to work in the UK in accordance with the Immigration, Asylum and Nationality Act 2006.
Data protection. Candidate data is processed under UK GDPR Article 6(1)(b) and the Data Protection Act 2018, retained for up to 12 months, and held by Coaley Peak Ltd (ICO registered Data Controller). See our Privacy Policy.
Feedback & complaints. Concerns about our recruitment process should be directed to recruitment.control@coaleypeak.co.uk. All complaints are handled under ISO 9001:2015. Candidates may also contact the EHRC.
Pre-employment checks. Offers are subject to satisfactory references and identity checks. Role-specific DBS checks will be disclosed at application stage, in accordance with the Rehabilitation of Offenders Act 1974.
Working time & pay. This role complies with the Working Time Regulations 1998. Coaley Peak Ltd is a Living Wage accredited employer. Salary ranges are reviewed annually.
Coaley Peak Ltd · Co. 11783676 · VAT GB374552088 · The Limes, Bayshill Road, Cheltenham GL50 3AW, UK · Registered in England & Wales