9.520/6.860: Statistical Learning Theory and Applications, Fall 2019

Course description

The course covers foundations and recent advances of machine learning from the point of view of statistical learning and regularization theory.

Understanding intelligence and how to replicate it in machines is arguably one of the greatest problems in science. Learning, its principles and its computational implementations, is at the very core of intelligence. During the last decade, for the first time, we have been able to develop artificial intelligence systems that begin to solve complex tasks that were until recently the exclusive domain of biological organisms, such as computer vision, speech recognition and natural language understanding: cameras recognize faces, smartphones understand voice commands, smart speakers and assistants answer questions, and cars can see and avoid obstacles. The machine learning algorithms at the root of these success stories are trained with examples rather than programmed to solve a task.

The content is roughly divided into three parts. In the first part, key algorithmic ideas are introduced, with an emphasis on the interplay between modeling and optimization aspects. Algorithms that will be discussed include classical regularization (regularized least squares, SVM, logistic regression, square and exponential loss), stochastic gradient methods, implicit regularization and minimum norm solutions.

In the second part, key ideas in statistical learning theory will be developed to analyze the properties of the algorithms previously introduced. Classical concepts such as generalization, uniform convergence and Rademacher complexities will be covered, together with topics such as surrogate loss functions for classification, margin-based bounds, stability, and privacy.

The third part of the course focuses on deep learning networks. It will introduce theoretical frameworks addressing three key puzzles in deep learning: approximation theory (which functions can be represented more efficiently by deep networks than by shallow ones), optimization theory (why stochastic gradient descent can easily find global minima), and machine learning (how generalization in deep networks used for classification can be explained in terms of the complexity control implicit in gradient descent). It will also discuss connections with the architecture of the brain, which was the original inspiration for the layered local connectivity of modern networks and may provide ideas for future developments and revolutions in networks for learning.

The goal of the course is to provide students with the theoretical knowledge and the basic intuitions needed to use and develop effective machine learning solutions to challenging problems.

Prerequisites

We will make extensive use of basic notions of calculus, linear algebra and probability. The essentials are covered in class and in the math camp material. We will introduce a few concepts in functional/convex analysis and optimization. Note that this is an advanced graduate course and some exposure to introductory machine learning concepts or courses is expected. Students are also expected to have basic familiarity with MATLAB/Octave.

Grading

Requirements for grading are attending lectures/participation (10%), four problem sets (60%) and a final project (30%).

Grading policies and tentative pset and project dates: (slides)

Problem Sets

Problem Set 1, out: Sep. 19, due: Wed., Sep. 25 (Class 07).
Problem Set 2, out: Oct. 03, due: Wed., Oct. 09 (Class 10).
Problem Set 3, out: Oct. 31, due: Wed., Nov. 06 (Class 18).
Problem Set 4, out: Nov. 18, due: Wed., Nov. 27 (Class 21).

Submission instructions: Follow the instructions included with the problem set. Use the LaTeX template for the report (there is a maximum page limit). Submit your report online through stellar.mit by the due date and time, and hand in a printout at the first class after the due date.

Projects

Guidelines and key dates. Online form for the project proposal (complete by Nov. 01).

Reports should be at most 5 pages, written as extended abstracts using the NIPS style files.

Projects archive

List of Wikipedia entries created or edited as part of projects during previous course offerings.
