AGI Safety: Safety and Control Considerations for Artificial General Intelligence

CS 294-149: Safety and Control for Artificial General Intelligence (Fall 2018)

Basic info

Lectures: TuThu, 4:00p-5:30p, Soda 310. First lecture: Thursday, Aug 23.

Instructor Andrew Critch

Office Hours: Tue 3-4 pm, 678 Soda and Thu 5:30-6:30 pm, 651 Soda
Instructor Stuart Russell

Office Hours: by appointment
GSI Adam Gleave

Office Hours: Tue 5:30-6:30 pm, 651 Soda
GSI Michael Dennis

Office Hours: Mon 2-3 pm, 734 Sutardja Dai Hall

Prerequisites: Students should already be familiar with artificial intelligence at the level of an A or A+ in CS188 (Introduction to Artificial Intelligence) and/or CS189 (Introduction to Machine Learning), or an equivalent course at their home institution.

Enrollment: We'll take attendance at the beginning of semester and enroll based on qualifications, with graduate students taking precedence. For undergraduates, we'll generally require an A or A+ in CS188 and/or CS189, preferably both, as well as a strong math background. Students with exceptional technical accomplishments may also be considered. If there are too many interested students, we may ask for a short essay; the goal is to have a highly interactive class with a lot of discussion, and we cannot do that if there are too many students.

Subject matter

Turing, Wiener, Minsky, and others have noted that making good use of highly intelligent machines requires ensuring that the objectives of such machines are well aligned with those of humans. As we diversify and amplify the cognitive abilities of machine intelligences, a long-term control problem arises for society: by what mathematical and engineering principles can we maintain sufficient control, indefinitely, over entities substantially more intelligent, and in that sense more powerful, than humans? Is there any formal solution one could offer, before the deployment of powerful machine intelligences, to guarantee the safety of such systems for humanity?

This course will examine a variety of specific problems that could play into such a solution, many of which will have present-day industrial analogues. Special care will be taken to notice problems that might not have present-day analogues, to prepare students for anticipating future gaps in the theory and practice of AGI safety and control. With an eye toward the eventual social and economic demand for formal guarantees of safety for AGI-level systems, this exploratory, interdisciplinary, seminar-style class will cover papers from artificial intelligence, machine learning, cognitive science, political science, and other discplines concerned with modelling and control of complex systems.

To keep discussions grounded, students will be encouraged throughout the course to ask questions that bring technical and mathematical rigor to bear. Each student will also carry out a coding project near the beginning of the course, and an individual research project or an in-depth literature survey at the end of the course.

Class format

A typical 80-minute class period will comprise:

30 minutes of lecture and/or discussion,
30-minutes of student presentations,
20 minutes of flex time for transitions and/or announcements.

Learning objectives

By the end of the course, students will be expected to make progress on the following objectives:

General research skills.
1. critiquing a scientific paper's experimental design and analysis;
2. critiquing the applicability of a reserach paper to a particular application;
3. communicating scientific content to a peer audience;
4. surveying literature related to a particular topic.
Domain-specific skills.
1. Proving theorems illustrative of problems or solutions in AI safety and control;
2. Training a simple deep RL agent to safely (or unsafely) navigate a grid-world
3. Connecting AI safety and control problems to relevant literature from other disciplines.
Domain-specific reasoning. By the end of the course students will be expected to develop well-thought-out opinions on the following questions:
1. Present-day problems:
  1. What safety and control problems exist in present-day AI technologies?
  2. What techniques already exist, or might soon exist, for addressing these problems?
  3. How do these problems translate into societal-scale concerns as AI capabilities advance and approach AGI?
2. Advanced capabilities problems:
  1. What safety and control problems will arise for the first time in AI systems with capabilities significantly more advanced than present-day systems?
  2. What capabilities, exactly, would precipitate those problems?
  3. How can we prepare to address these problems, given that they lack anologues in present-day applications?
3. AGI-specific problems:
  1. What safety and control problems do you expect to arise for the first time in AI systems with intelligence greatly exceeding humans in a broad set of domains?
  2. How can we prepare to address these AGI-specific problems before they occur?

Grading

15%: Coding project. At the beginning of the course, students will be asked to reproduce existing work on multi-agent reinforcement learning and run some additional experiments, per these Coding Project Guidelines. The grade will be based on the clarity of code written, and its modularity and usefulness for future experiments.
10%: Quizzes. Students will be expected to be familiar with the material for each lecture ahead of time, and will be tested for familiarity by a quiz on the assigned reading to be completed before the start of each lecture. Quizzes are graded with a 0, 1, or on occasion a 2 for exceptionally insightful answers. This process is intended to raise the quality of discourse about each paper presented, and to ensure better shared retention of the material. Around 2 hours of reading per week should be enough for a typical student. Quizzes are available here (@berkeley.edu login required) and will be released noon the day before the class.
15%: Paper Presentations. As a student, you will have at least one chance to present a paper. Signup here for a slot (@berkeley.edu login required). You will be graded based on your demonstrated level of insight into the material, including your ability to answer questions from instructors and other students, how well you relate the paper to other papers and lecture material, and how clearly you communicate.
15%: Oral participation. Students are expected to attend class, and ask questions during presentations from time to time.
5%: Written participation. Students will take turns making LaTeX lecture notes. Signup here for a slot (@berkeley.edu login required). The notes should be emailed to Michael Dennis within a week of each lecture, and will be shared online after vetting.
10%: Position Paper. Half way through the course, students will write a position paper, presenting the best arguments and optionally taking a side on a debate relevant to AI safety. For suggested topics and further guidance please see this document.
30%: Final Project. During the latter half of the course, students will complete a final research or literature review project, per these Final Project Guidelines.
Full engagement requirement: Despite the percentages above, you will not be awarded a passing grade in this class if you don't engage with each method of evaluation above: present papers and engage with others' presentations, attend classes and quizzes regularly, submit a proposal and a final report for your project, and present your final project.

Important dates

- Coding project is due; see Coding Project Guidelines.
- A one-page final project proposal is due; see Final Project Guidelines.
- A two to three-page position paper is due; see Position Paper Guidelines.
- Final project presentations take place.
- Final project written reports are due.

Coding Project Guidelines

Each student is expected to complete a coding project individually or in a group of two. Project proposals are due on September 18 with final report due on . Please see here for additional guidelines, and search for other group members via this Google Sheet.

Final Project Guidelines

Your final project proposal should be 1 page in length, plus a references page, and should outline a project of one of the following types:

Type 1: Research Paper.
- Choose an important problem
- Explain why the problem is important
- Describe the current state of the art(s) available for solving the problem, and what is missing
- State your key insight, and how you will demonstrate its validity (e.g. mathematical proof, empirical testing, or both)

Type 2: Literature Survey.
- Choose an important problem or problem area pertaining to the course.
- Describe how you will find papers relevant to the problem -- where you will search, what search terms you will use, and what papers you began your search with.
- Decribe your inclusion criteria for the survey -- how are you going to decide whether to include a paper or not?
- Describe some quantitative or qualitiative dimensions along which you will compare the papers. Be clear which dimensions are your subjective qualitative judgement, but don't shy away from including them.

The final project proposal will be due on , presentations occur on , and final written reports will be due on .

Lecture schedule

A tentative schedule is included below, and is subject to change throughout the course. Papers that have been locked into the schedule are marked with a (*).

Basic Info	Subject Matter	Class Format
Learning Objectives	Grading	Important Dates
Coding Project Guidelines	Final Project Guidelines	Lecture Schedule

CS 294-149: Safety and Control for Artificial General Intelligence (Fall 2018)

Instructor Andrew Critch

Instructor Stuart Russell

GSI Adam Gleave

GSI Michael Dennis