AI SAFETY RESOURCES

Apply to the AGI Safety Fundamentals Programme!

AI AS AN EXISTENTIAL THREAT

Technological revolutions have occurred before in human history. Most recently, during the Industrial Revolution, much of our physical labour was outsourced to machines. As a result, people’s lives look nothing like they did before industrialisation, in terms of daily work, opportunity and outcomes.

As the field of Artificial Intelligence (AI) grows, there are plausible pathways for us to outsource our mental labour in a similar way. We might expect the world to look significantly different again if we achieve capable Artificial General Intelligence (AGI). If and when we do, it is vitally important that such agents have humanity’s best interests at heart. Current trends suggest we may achieve human-level intelligence this century, which makes working on the alignment problem a pressing matter for today’s researchers. At CERI, we focus on two main components of this challenge.

ALIGNMENT - SPECIFYING HUMANITY’S GOALS

It is plausible that we can create artificial agents that are general and more intelligent than ourselves. These agents may have the ability to take actions in the world, either granted to them for economic purposes or obtained instrumentally through some oversight by their designers.

As of today, we have no mechanism that guarantees control of a superintelligent agent. Simply by virtue of wanting to achieve its goal, such an agent may recognise that we have the ability or intent to stop it, and act to prevent us from doing so. This is one example of an instrumental goal: self-preservation. Another is self-improvement; almost any goal becomes easier to achieve by acquiring more power, resources and intelligence.

Given a self-preserving, self-improving AI, any misalignment between humans’ goals and the machine’s goals could be catastrophic and unstoppable. This could happen very quickly the first time we achieve general superintelligence, due to intelligence-explosion dynamics. Alternatively, we may slowly cede control of the world and end up locked into a suboptimal system that we cannot nudge.

GOVERNANCE - MANAGING TRANSFORMATIVE TECHNOLOGIES

It is important to concurrently ask how we can make the world more robust to transformative periods, through effective policy and governance. For example, what economic or international conditions might influence how AGI is developed? Will race-to-the-bottom dynamics stop state and private developers from taking the time to solve the alignment problem in the first place?

If we do solve the alignment problem and create contained agents that genuinely do what we want, there are yet more governance considerations. Who owns AGI: the military, a private company, or the government that produced it? Could this lead to arms-race dynamics between states? Will such systems be misused (e.g. for conquest), or will they make us more prone to accidents (e.g. with lethal autonomous weapons, or LAWs)?

We might attempt policy and governance interventions that reduce these risks, such as enforcing bans (e.g. on LAWs), determining how the benefits of AGI should be distributed, and investigating race dynamics so that we know how to cool them when the time comes. Other work looks at, for example, monitoring access to compute, to track how accessible AGI will be and who is currently most capable of producing it.

OUR RESOURCES & OPPORTUNITIES

You can apply to work on a project relevant to understanding the effects of AI on existential risk at our Summer Research Fellowship.

We help Effective Altruism Cambridge run the AGI Safety Fundamentals programme, which is a great way to learn about AI safety with other people. The programme is currently in its third iteration.

Opportunities in Cambridge relevant to AI alignment are regularly posted on our mailing list.

OTHER RESOURCES

QUICK INTRODUCTIONS

MORE COMPREHENSIVE INTRODUCTIONS

NEWSLETTERS & MAILING LISTS

FORUMS / RESEARCH HUBS

BOOKS

  • The Alignment Problem (Brian Christian)

  • Human Compatible (Stuart Russell)

  • Superintelligence (Nick Bostrom)

MORE