9.520/6.860: Statistical Learning Theory and Applications, Fall 2020 – Home

Course description

Understanding intelligence and how to replicate it in machines is arguably one of the greatest problems in science. Learning, its principles and computational implementations, is at the very core of intelligence. During the last two decades, for the first time, artificial intelligence systems have been developed that begin to solve complex tasks that were until recently the exclusive domain of biological organisms, such as computer vision, speech recognition and natural language understanding: cameras recognize faces, smartphones understand voice commands, smart speakers/assistants answer questions and cars can see and avoid obstacles. The machine learning algorithms at the root of these success stories are trained with examples rather than programmed to solve a task. However, the theory of learning is still incomplete, mainly because of the puzzles presented by the success of deep learning. An eventual theory of learning that explains why and how deep networks work, and what their limitations are, may thus enable the development of much more powerful learning approaches and even inform our understanding of human intelligence.

In this spirit, the course covers foundations and recent advances in statistical machine learning theory, with the dual goal of a) providing students with the theoretical knowledge and the intuitions needed to use effective machine learning solutions and b) preparing more advanced students to contribute to progress in the field. This year the emphasis is on b).

The course is organized around the core idea of supervised learning as an inverse problem, with stability as the key property required for good generalization performance of an algorithm.

The content is roughly divided into three parts. The first part is about classical regularization (regularized least squares, kernel machines, SVM, logistic regression, square and exponential loss), uniform convergence, Rademacher complexities, margin, stochastic gradient methods, overparametrization, implicit regularization and stability of minimum norm solutions. The second part is about deep networks: approximation theory (which functions can be represented more efficiently by deep networks than by shallow networks), optimization theory (why stochastic gradient descent can easily find global minima) and estimation error (how generalization in deep networks can be explained in terms of the stability implied by the complexity control implicit in gradient descent). The third part is about the connections between learning theory and the brain, which was the original inspiration for modern networks and may provide ideas for future developments and breakthroughs in the theory and the algorithms of learning. Throughout the course we will have occasional talks by leading researchers on advanced research topics.
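
As a small illustration of two topics named in the first part (regularized least squares and the minimum norm solutions associated with implicit regularization), here is a minimal sketch. It is not taken from the course material: the toy data, the function names and the step size are all assumptions chosen for illustration. With a single example in two dimensions, infinitely many weight vectors fit the data exactly; gradient descent started from zero picks the one with the smallest norm, which is also the limit of the regularized solution as the regularization parameter goes to zero.

```python
# Hypothetical toy problem: one example in two dimensions, so the
# least-squares problem is overparametrized (many exact interpolants).
x = [1.0, 2.0]
y = 5.0


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def ridge_solution(x, y, lam):
    """Closed-form minimizer of (w.x - y)^2 + lam * ||w||^2 for one example.

    Setting the gradient to zero gives w = x * y / (||x||^2 + lam).
    With lam = 0 this is the minimum norm interpolant.
    """
    scale = y / (dot(x, x) + lam)
    return [scale * xi for xi in x]


def gradient_descent(x, y, step=0.05, iters=500):
    """Plain gradient descent on (w.x - y)^2, started from w = 0."""
    w = [0.0] * len(x)
    for _ in range(iters):
        residual = dot(w, x) - y
        # Gradient of the squared loss with respect to w is 2 * residual * x.
        w = [wi - step * 2.0 * residual * xi for wi, xi in zip(w, x)]
    return w


min_norm = ridge_solution(x, y, lam=0.0)  # minimum norm interpolant
w_gd = gradient_descent(x, y)             # unregularized gradient descent
w_ridge = ridge_solution(x, y, lam=1.0)   # explicitly regularized solution

print("minimum norm solution:", min_norm)
print("gradient descent     :", w_gd)
print("ridge (lam = 1)      :", w_ridge)
```

In this sketch the gradient descent iterate never leaves the span of `x` because it starts at zero, which is why it converges to the minimum norm interpolant without any explicit penalty. The explicitly regularized solution with `lam=1.0` has a smaller norm but no longer fits the example exactly.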

This year the course will be quite different from recent versions. Apart from the first part on regularization, this year's course is designed to foster discussions, conjectures and exploratory projects on ongoing research rather than to teach well-established machine learning concepts. Unlike previous versions, it is not appropriate for students who just want an introduction to the field of machine learning: it is an advanced graduate course for mature students with a good background in the field.

Prerequisites

We will make extensive use of basic notions of calculus, linear algebra and probability. The essentials are covered in class and in the math camp material. We will introduce a few concepts in functional/convex analysis and optimization. Note that this is an advanced graduate course, and some exposure to introductory machine learning concepts or courses is expected. Students are also expected to have basic familiarity with MATLAB/Octave.

Grading

Grading is based on lecture attendance and participation (33%), three problem sets (33%) and a final project (34%).

Grading policies, pset and project tentative dates: (slides)

Problem Sets

Problem Set 1, out: Thu., Sep. 10, due: Sun., Sep. 20
Problem Set 2, out: Mon., Oct. 05, due: Fri., Oct. 16
Problem Set 3, out: Mon., Oct. 26, due: Sun., Nov. 08

Submission instructions: Follow the instructions included with the problem set. Use the LaTeX template for the report. Submit your report online through Canvas by the due date/time.

Scribing Lecture Notes

Students will prepare a detailed set of lecture notes in groups of 3 (to be assigned at random). Students may use the lecture recordings, slides, and any other resources they deem appropriate to prepare the lecture notes. Each group will scribe at most one lecture. Use the LaTeX template for preparing the notes. Submit your notes online through Canvas within 2 weeks of the assigned lecture.

Projects

Guidelines and key dates. Online form for project proposal (complete by Fri. Nov. 06).

Final grading of the projects will take place on Wed., December 9th. There will be a 15-minute session in which students will present their work.

Projects archive

List of Wikipedia entries created or edited as part of projects during previous course offerings.

Navigating Student Resources at MIT

This document has more information about navigating student resources at MIT.
