Anthropic AI Safety Fellow

Job Expired
  • Posted Date: Dec 23, 2025
  • Closing Date: Jan 12, 2026
  • Full Time
  • London, UK
  • Applications have closed

Anthropic

Building reliable, interpretable, and steerable AI systems for the benefit of humanity

JOB DESCRIPTION

The Anthropic AI Safety Fellowship is a four-month, full-time research program designed to accelerate the growth of technical talent in frontier AI safety. Fellows work on empirical research projects aligned with Anthropic’s safety priorities, with the goal of producing public research outputs such as papers, evaluations, or open-source tools.

Fellows primarily use external infrastructure, including open-source models and public APIs, and receive close mentorship from Anthropic researchers. The program emphasizes hands-on experimentation, rapid iteration, and contribution to the broader AI safety research community.

Participants join one of multiple annual cohorts, with upcoming start dates in May and July 2026. Most fellows work from shared research spaces in London or Berkeley, though remote participation is supported for qualified candidates located in the US, UK, or Canada.

Key areas of research include scalable oversight, adversarial robustness, mechanistic interpretability, AI control, AI welfare, and empirical investigations into alignment failures. Fellows are expected to design, implement, and evaluate research ideas independently while collaborating closely with mentors and peers.

This fellowship is intended for candidates interested in transitioning into full-time AI safety research. While full-time roles are not guaranteed, strong performance may lead to future opportunities at Anthropic or within the broader AI safety ecosystem.
