AI SAFETY RESOURCES

Apply to the AGI Safety Fundamentals Programme!

AI AS AN EXISTENTIAL THREAT

Technological revolutions have occurred before in human history. Most recently, during the Industrial Revolution, much of our physical labour was outsourced to machines. As a result, people’s lives look nothing like they did before industrialisation, in terms of daily work, opportunity and outcomes.

As the field of Artificial Intelligence (AI) grows, there are plausible pathways for us to outsource our mental labour in a similar way. We might expect the world to look significantly different again if we achieve capable Artificial General Intelligence (AGI). If and when we do, it is vitally important that such agents have humanity’s best interests at heart. Current trends suggest we may achieve human-level intelligence this century, which makes working on the alignment problem a pressing matter for today’s researchers. At CERI, we focus on two main components of this challenge.

ALIGNMENT - SPECIFYING HUMANITY’S GOALS

It is plausible that we can create artificial agents that are general and more intelligent than ourselves. These agents may have the ability to take actions in the world, either granted to them for economic purposes or obtained instrumentally through some oversight by their designers.

As of today, we have no mechanism that guarantees control of a superintelligent agent. Simply by virtue of wanting to achieve its goal, such an agent may recognise that we have the ability or intent to stop it, and act to prevent us from doing so. This is one example of an instrumental goal: self-preservation. Another is self-improvement; almost any goal becomes easier to achieve by acquiring more power, resources and intelligence.

Given a self-preserving, self-improving AI, any misalignment between humans’ goals and the machine’s goals could be catastrophic and unstoppable. This could happen very quickly the first time we achieve general superintelligence, due to intelligence-explosion dynamics. Alternatively, we may slowly cede control of the world and end up locked into a suboptimal system that we cannot nudge.

GOVERNANCE - MANAGING TRANSFORMATIVE TECHNOLOGIES

It is important to concurrently ask how we can make the world more robust to transformative periods, through effective policy and governance. For example, what economic or international conditions might influence how AGI is developed? Will race-to-the-bottom dynamics stop state and private developers from taking the time to solve the alignment problem in the first place?

If we do solve the alignment problem and create contained agents that genuinely do what we want, there are yet more governance considerations. Who owns AGI: the military, a private company, or the government that produced it? Could this lead to arms-race dynamics between states? Will such systems be misused (e.g. for conquest), or will they make us more prone to accidents (e.g. with lethal autonomous weapons, or LAWs)?

We might attempt policy and governance interventions that reduce these risks, such as enforcing bans (e.g. on LAWs), determining how the benefits of AGI should be distributed, and investigating race dynamics so that we know how to cool them when the time comes. Other work looks at, for example, monitoring access to compute, to track how accessible AGI will be and who is currently most capable of producing it.

OUR RESOURCES & OPPORTUNITIES

You can apply to work on a project relevant to understanding the effects of AI on existential risk at our Summer Research Fellowship.

We help Effective Altruism Cambridge run the AGI Safety Fundamentals programme, which is a great way to learn about AI safety with other people. The programme is currently in its third iteration.

Opportunities in Cambridge relevant to AI alignment are regularly posted on our mailing list.

OTHER RESOURCES

QUICK INTRODUCTIONS

MORE COMPREHENSIVE INTRODUCTIONS

NEWSLETTERS & MAILING LISTS

FORUMS / RESEARCH HUBS

BOOKS

  • The Alignment Problem (Brian Christian)

  • Human Compatible (Stuart Russell)

  • Superintelligence (Nick Bostrom)

MORE