Deep Learning Theory: Approximation
9.520/6.860, Class 23
Instructor: Tomaso Poggio
Description
A mathematical theory of deep networks, and of why they work as well as they do, is now emerging. I will review some recent theoretical results on the approximation power of deep networks, including conditions under which they can be exponentially better than shallow networks. A class of deep convolutional networks represents an important special case of these conditions, though weight sharing is not the main reason for their exponential advantage. I will also discuss another puzzle around deep networks: what guarantees that they generalize and do not overfit, despite the number of weights being larger than the number of training examples and despite the absence of explicit regularization in the optimization?
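As a brief illustration of the kind of approximation result discussed in class (a sketch following the review cited under Class Reference Material; see that paper for the precise statements and assumptions), consider a compositional target function with a binary-tree structure, for example with n = 8 variables,

    f(x_1,\dots,x_8) = h_3\bigl(h_{21}(h_{11}(x_1,x_2),\, h_{12}(x_3,x_4)),\; h_{22}(h_{13}(x_5,x_6),\, h_{14}(x_7,x_8))\bigr),

where each constituent function depends on only two variables and has smoothness m. To guarantee accuracy \varepsilon, a generic shallow (one-hidden-layer) network needs on the order of

    N_{\mathrm{shallow}} = O\!\left(\varepsilon^{-n/m}\right)

units, while a deep network whose architecture matches the tree needs only

    N_{\mathrm{deep}} = O\!\left((n-1)\,\varepsilon^{-2/m}\right)

units: the dimension n is removed from the exponent and replaced by the constant dimensionality (here 2) of the constituent functions, which is the sense in which depth can avoid the curse of dimensionality for compositional functions.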
Class Reference Material
Slides: PDF.
T. Poggio, H. Mhaskar, L. Rosasco, B. Miranda, and Q. Liao, Why and When Can Deep-but Not Shallow-networks Avoid the Curse of Dimensionality: A Review, International Journal of Automation and Computing, 2017. DOI: 10.1007/s11633-017-1054-2
Further Reading
- H. Mhaskar and T. Poggio, Deep versus shallow networks: an approximation theory perspective, Center for Brains, Minds and Machines (CBMM) Memo No. 54, 2016.
- H. Mhaskar, Q. Liao, and T. Poggio, Learning Functions: When Is Deep Better Than Shallow, 2016.
- A. Pinkus, Approximation theory of the MLP model in neural networks, Acta Numerica, vol. 8, pp. 143-195, 1999.
- D. L. Donoho, High-dimensional data analysis: the curses and blessings of dimensionality, 2000.