University of California at Berkeley
Dept of Electrical Engineering & Computer Sciences

CS 294-40
Learning for robotics and control

(3 units)

Announcements

11/20/2008 Project write-ups will be due on December 11.
11/12/2008 PS3: mdp.zip was missing the file ps3_vi_GS.m; I have reposted a new mdp.zip which includes it.
11/04/2008 Reminder: project milestone #2 is due on Sunday November 9. It is a pdf which contains (a) 2-3 pages on accomplishments so far (description of current status/approach/attempts; could also be a survey of readings if there are no initial algorithmic, theoretical or experimental results yet), (b) 0.5-1 page on plans.
11/04/2008 Problem set #3 has been posted. It is due on Tuesday November 18 before lecture.
10/06/2008 Project milestone #1 is a pdf which contains (a) 1-2 pages on accomplishments so far (description of current status/approach/attempts; could also be a survey of readings if there are no initial algorithmic, theoretical or experimental results yet), (b) 0.5-1 page on plans.
10/06/2008 Project grading: milestone #1 5%, milestone #2 5%, presentation 15%, write-up/final results 25%.
9/30/2008 Reminder: Project milestone #1 is due on October 12th!
9/30/2008 Problem set #2 has been posted. It is due on Tuesday October 21 before lecture.
9/30/2008 The lecture of Thursday October 2 is moved to an earlier time: it will run from 10:00am to 11:30am in the usual classroom.
9/18/2008 Problem set submission procedure: write the submission date and time on the first page, and put the problem set into the receptacle at my door (Soda 721/719).
9/18/2008 Office hours of 9/23 are moved to 9/25, same time, same place.
9/3/2008 Grading policy: 50% final project, 40% problem sets, 10% scribing.
9/3/2008 Problem set 1 went out. It is due on Tuesday September 23 before lecture.

Problem sets

PS 1: pdf.
PS 2: pdf, helicopter code, quadruped code.
PS 3: pdf, mdp, kf, kf_data.

Lectures

Tuesdays and Thursdays 11:00-12:30, 405 Soda

People

Instructor
Pieter Abbeel
Office: Soda 721
Office hours: Tuesdays 12:30 - 2:30, and by appointment
Email:

Course description

This is an advanced course in learning for robotics and control. The goal of this course is to help the audience with their research in learning for robotics and control or related topics. A tentative list of topics includes:

Markov decision processes: value iteration, policy iteration, linear programming, Q learning, TD, value function approximation, inverse reinforcement learning
Control: linear quadratic regulator, differential dynamic programming, receding horizon / model predictive control
Estimation: (extended) Kalman filters, particle filters, SLAM
Robotics: basic principles of various robots, sensors, microcontrollers
Exploration/Exploitation: bandits, no-regret, e^3

Prerequisites

Familiarity with mathematical proofs, machine learning, artificial intelligence, optimization, probability, algorithms, linear algebra; ability to implement algorithmic ideas in code (C/C++ and matlab).

Graduate students only. Consent of the instructor is required for undergraduate students: please talk to me after the first lecture and hand me a summary of relevant classes/experience so I can decide whether to make an exception.

Grading

Open-ended final project: 50%
Problem sets: 40%
Scribing lectures: 10%
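
As a rough illustration of how the weights posted in the announcements combine, here is a minimal Python sketch. It assumes that every component is scored on a 0-1 scale, that the components are combined linearly, and that the project percentages (5%, 5%, 15%, 25%) are fractions of the overall course grade, which is consistent with them summing to the 50% project weight. The function and dictionary names are hypothetical and are not part of any course software.

```python
# Weights as posted in the announcements, expressed as fractions of the
# overall course grade. All names here are illustrative assumptions.
PROJECT_WEIGHTS = {
    "milestone_1": 0.05,
    "milestone_2": 0.05,
    "presentation": 0.15,
    "write_up": 0.25,
}
PROBLEM_SET_WEIGHT = 0.40
SCRIBING_WEIGHT = 0.10


def course_grade(project_scores, problem_sets, scribing):
    """Combine component scores (each in [0, 1]) into a course score in [0, 1].

    Assumes a plain weighted sum; the actual grading procedure may differ.
    """
    project = sum(PROJECT_WEIGHTS[name] * project_scores[name]
                  for name in PROJECT_WEIGHTS)
    return project + PROBLEM_SET_WEIGHT * problem_sets + SCRIBING_WEIGHT * scribing


if __name__ == "__main__":
    scores = {"milestone_1": 1.0, "milestone_2": 1.0,
              "presentation": 0.8, "write_up": 0.9}
    print(round(course_grade(scores, problem_sets=0.85, scribing=1.0), 3))
```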

Final project

The final projects are open-ended projects. Students are encouraged to make appointments to discuss possible projects and to start this discussion right away. Students are also encouraged to closely interact with us on their projects throughout the semester.

Collaboration: The final project can be done individually, or in a group of two students.
Timeline:
- Friday September 12, 5pm: by this time, we have met and agreed on a project. Please send me email to make an appointment to discuss your project before this time.
- Sunday October 12, midnight: progress report #1 due by email. Please use pdf format.
- Sunday November 9, midnight: progress report #2 due by email.
- Week of Dec 1-7: project presentations. (Details tba.)

Homework policy

Collaboration: We strongly encourage students to form study groups. Students may discuss and work on homework problems in groups. However, each student must write down the solutions independently, and without referring to written notes from the joint session. In other words, each student must understand the solution well enough to reconstruct it by him/herself. In addition, each student should write on the problem set the names of the people with whom s/he collaborated.
Late homeworks: Recognizing that students may face unusual circumstances and require some flexibility in the course of the semester, each student will have a total of seven free late (calendar) days to use as s/he sees fit. Once these late days are exhausted, no further late days will be allowed. In addition, no homework will be accepted more than four days after its due date, and late days cannot be used for the final project. Late days are counted at the granularity of days: e.g., 3 hours late is one late day.
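
The late-day accounting above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a late day is taken to be any started 24-hour period after the deadline (the policy's "calendar days" could also be read as calendar-date boundaries), and a submission that needs more late days than the student has left is treated as not accepted. The function names are hypothetical.

```python
import math
from datetime import datetime, timedelta

TOTAL_FREE_LATE_DAYS = 7       # per student, from the policy
MAX_LATE_DAYS_PER_HOMEWORK = 4  # no homework accepted more than four days late


def late_days_used(due, submitted):
    """Late days charged for one submission; any fraction of a day rounds up."""
    seconds_late = (submitted - due).total_seconds()
    if seconds_late <= 0:
        return 0
    return math.ceil(seconds_late / 86400)


def check_submission(due, submitted, days_remaining):
    """Return (accepted, days_remaining_after) under the assumptions above."""
    used = late_days_used(due, submitted)
    if used > MAX_LATE_DAYS_PER_HOMEWORK or used > days_remaining:
        return False, days_remaining
    return True, days_remaining - used


if __name__ == "__main__":
    due = datetime(2008, 9, 23, 11, 0)  # Problem set 1: before the Tuesday lecture
    # Three hours late counts as one full late day.
    print(check_submission(due, due + timedelta(hours=3), TOTAL_FREE_LATE_DAYS))
```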

Syllabus (subject to substantial change!) and scribed notes

Latex style file, latex macros file.

Lecture	Topic	Notes	Scriber
Th Aug 28	Class outline. MDPs.	pdf tex	Pieter Abbeel
Tu Sep 2	Dynamic programming, value iteration, contractions.	pdf tex png	Anand Kulkarni
Th Sep 4	Contractions, asynchronous value iteration	pdf tex	Yan Zhang
Tu Sep 9	Policy iteration, function approximation	pdf tex figures	Fernando Garcia Bermudez
Th Sep 11	Function approximation	pdf tex	Nimbus Goehausen
Tu Sep 16	LQR	pdf tex	Ankur Mehta
Th Sep 18	DDP	pdf tex	Brandon Basso
Tu Sep 23	Quadruped locomotion	zip	J. Zico Kolter
Th Sep 25	POMDP	pdf tex figures	Martin Moler Sorensen
Tu Sep 30	POMDP	pdf tex png	David Nachum
Th Oct 2	Bandits	pdf tex png	David Nachum
Tu Oct 7	Separation Principle, Dynamics Modeling	pdf tex	Pål From
Th Oct 9	Dynamics Modeling, Kalman Filtering	pdf tex	Andrew Wan
Tu Oct 14	Kalman Filtering	pdf tex	Jared Wood
Th Oct 16	Policy Gradient	pdf tex zip	Fernando Garcia Bermudez
Tu Oct 21	Policy Gradient	pdf tex zip	Jan Biermayer
Th Oct 23	TD, Sarsa, Q-learning	pdf tex	Yan Zhang
Tu Oct 28	TD, Sarsa, Q-learning, TD-Gammon	pdf tex figure	Anand Kulkarni
Th Oct 30	Reward Shaping	pdf tex	Pål From
Tu Nov 4	Exploration/Exploitation	pdf tex	Brandon Basso
Th Nov 6	No lecture
Tu Nov 11	Academic and Administrative Holiday
Th Nov 13	LP approach	pdf tex	Nimbus Goehausen
Tu Nov 18	Inverse reinforcement learning	pdf tex	Ankur Mehta
Th Nov 20	MPC, SLAM, Linearly solvable MDPs	pdf tex	Andrew Wan
Tu Nov 25	Learning to walk	pdf tex figures	Jared Wood
Th Nov 27	Happy Thanksgiving!
Tu Dec 2	Project presentations		Martin Moler Sorensen: logistics czar
Th Dec 4	Project presentations	Jan, Jared, David, Brandon	Jan Biermayer: logistics czar

Further topics to be covered: POMDPs, certainty equivalence, Kalman filters, SLAM, linear systems, stability, Lyapunov theory, exploration, bandits, case studies of robotic systems, apprenticeship learning, linear programming approach, ...
