Anthropic AI Safety Fellow

Job Expired
  • Posted Date: Dec 23, 2025
  • Closing Date: Jan 12, 2026
  • Full Time
  • London, UK
  • Applications have closed

Anthropic

Building reliable, interpretable, and steerable AI systems for the benefit of humanity

JOB DESCRIPTION

The Anthropic AI Safety Fellowship is a four-month, full-time research program designed to accelerate the growth of technical talent in frontier AI safety. Fellows work on empirical research projects aligned with Anthropic’s safety priorities, with the goal of producing public research outputs such as papers, evaluations, or open-source tools.

Fellows primarily use external infrastructure, including open-source models and public APIs, and receive close mentorship from Anthropic researchers. The program emphasizes hands-on experimentation, rapid iteration, and contribution to the broader AI safety research community.

Participants join one of multiple annual cohorts, with upcoming start dates in May and July 2026. Most fellows work from shared research spaces in London or Berkeley, though remote participation is supported for qualified candidates located in the US, UK, or Canada.

Key areas of research include scalable oversight, adversarial robustness, mechanistic interpretability, AI control, AI welfare, and empirical investigations into alignment failures. Fellows are expected to design, implement, and evaluate research ideas independently while collaborating closely with mentors and peers.

This fellowship is intended for candidates interested in transitioning into full-time AI safety research. While full-time roles are not guaranteed, strong performance may lead to future opportunities at Anthropic or within the broader AI safety ecosystem.
