**CBCL SEMINARS:**

## CBCL-CSAIL Brains, Minds & Machines Seminar Series

The "Brains, Minds & Machines Seminar Series 2013-2014" is being organized by the Laboratory for Computational and Statistical Learning (LCSL) (a joint lab between MIT and the Italian Institute of Technology), and coordinated with the Center for Brains, Minds and Machines (CBMM). The purpose of the seminar series is to bring together students and faculty from CBMM and CSAIL who aim to understand the problem of intelligence in terms of its realization in the mind and the brain. One important focus of the series is on the problem of learning which is emerging as the gateway to understanding and reproducing intelligence, both biological and artificial.

*This seminar series was formerly known as "Brains & Machines Seminar Series."*

## Brains, Minds & Machines Seminar Series

#### Upcoming events:

## Making Collective Intelligence Work: Learning, Liquidity, and Manipulation in Markets

Speaker: Dr. Sanmay Das, Washington Univ. in St. Louis

Date: Thursday, April 17, 2014

Time: 3:00pm

Location: Star Conference Room (32-D463), Stata Bldg, MIT

Host: Tomaso Poggio, CBMM, BCS, MIBR, CSAIL, LCSL

Abstract: Collective intelligence systems, from prediction markets to Wikipedia, have the capacity to provide useful information by aggregating the wisdom of the crowd. Yet the mechanisms that govern how individuals interact in these forums can substantially affect the quality of information produced. I will discuss this issue in the context of two specific problems in prediction markets: ensuring sufficient liquidity and mitigating manipulation. The accuracy of the information reflected in market prices depends on the market's liquidity. In a liquid market, arriving traders have someone to trade with at a "reasonable" price, so they are willing to participate and contribute their information. Liquidity provision can be framed as a reinforcement learning problem for a market-making agent, complicated by the censored nature of observations. I will describe an algorithm for solving this problem using moment-matching approximations in the belief space, and discuss theoretical results and empirical evaluation of the algorithm in experiments with trading agents and human subjects, showing that it offers several potential benefits over standard cost-function based approaches. In markets where participants influence the outcome of the events on which they are trading, concerns over manipulation naturally arise. I will present a game-theoretic model of manipulation, which gives insight into the question of how informative market prices are in the presence of manipulation opportunities, and also into how markets can affect the incentives of agents in the outside world. In addition, I will describe our experience with a field experiment related to manipulation, the Instructor Rating Markets. Time permitting, I will also briefly discuss work in my group on related issues in other types of collective intelligence systems, for example, information growth, user engagement, and manipulation in social media like Wikipedia and Reddit.

#### Past events:

## What is the information content of an algorithm?

Speaker: Joachim M. Buhmann

Speaker Affiliation: Computer Science Department, Machine Learning Laboratory, ETH, Zurich

Date: Nov 7, 2013

Time: 3:00 p.m.

Host: Tomaso Poggio, Lorenzo Rosasco

Location: Star Seminar Room, Bldg 32 (Stata), MIT

Abstract: Algorithms are exposed to randomness in the input or noise during the computation. How well can they preserve the information in the data w.r.t. the output space? Algorithms, especially in Machine Learning, are required to generalize over input fluctuations or randomization during execution. This talk elaborates a new framework to measure the "informativeness" of algorithmic procedures and their "stability" against noise. An algorithm is considered to be a noisy channel which is characterized by a generalization capacity (GC). The generalization capacity objectively ranks different algorithms for the same data processing task based on the bit rate of their respective capacities. The problem of grouping data is used to demonstrate this validation principle for clustering algorithms, e.g. k-means, pairwise clustering, normalized cut, adaptive ratio cut and dominant set clustering. Our new validation approach selects the most informative clustering algorithm, which filters out the maximal number of stable, task-related bits relative to the underlying hypothesis class. The concept also enables us to measure how many bits are extracted by sorting algorithms when the input and thereby the pairwise comparisons are subject to fluctuations.

## Understanding the building blocks of neural computation: Insights from connectomics and theory

Speaker: Dmitri "Mitya" Chklovskii

Speaker Affiliation: Janelia Farm, HHMI

Date: October 10, 2013

Time: 11:30am

Place: Singleton Auditorium, MIT Bldg 46-3002

Abstract: Animal behaviour arises from computations in neuronal circuits, but our understanding of these computations has been frustrated by the lack of detailed synaptic connection maps, or connectomes. For example, despite intensive investigations over half a century, the neuronal implementation of local motion detection in the insect visual system remains elusive. We developed a semi-automated pipeline using electron microscopy to reconstruct a connectome, containing 379 neurons and 8,637 chemical synaptic contacts, within the Drosophila optic medulla. By matching reconstructed neurons to examples from light microscopy, we assigned neurons to cell types and assembled a connectome of the repeating module of the medulla. Within this module, we identified cell types constituting a motion detection circuit, and showed that the connections onto individual motion-sensitive neurons in this circuit were consistent with their direction selectivity. Our identification of cell types involved in motion detection allowed targeting of extremely demanding electrophysiological recordings by other labs. Preliminary results from such recordings are consistent with a correlation-based motion detector. This demonstrates that connectomes can provide key insights into neuronal computations.

## Causal Inference and Anticausal Learning**

Speaker: Bernhard Schölkopf

Speaker Affiliation: Max Planck Institute for Intelligent Systems

Date: Friday, June 7, 2013

Time: 3:30pm

Place: Singleton Auditorium, MIT Bldg 46-3002

Abstract: Causal inference is an intriguing field examining causal structures by testing their statistical footprints. The talk introduces the main ideas of causal inference from the point of view of machine learning, and discusses implications of underlying causal structures for popular machine learning scenarios such as covariate shift and semi-supervised learning. It argues that causal knowledge may facilitate some approaches for a given problem, and rule out others.

**This talk is co-sponsored with the Cambridge Machine Learning Colloquium and Seminar Series.

## Neural Circuits for Fly Visual Course Control

Speaker: Alexander Borst

Speaker Affiliation: Dept of Systems and Computational Neurobiology, Max-Planck-Institute of Neurobiology

Date: Mon, April 29 2013

Time: 4:00PM

Place: MIT Bldg 46-3002 Singleton Auditorium

Abstract: Visual navigation has been studied extensively in flies, both in tethered and in freely flying animals. As neural control elements, the tangential cells of the lobula plate seem to play a key role: they are sensitive to visual motion, have large receptive fields, and, with their spatial distribution of preferred directions, match the optic flow as elicited during certain types of flight maneuvers. However, several key questions have long remained unanswered:

1. What is the neural circuit presynaptic to the tangential cells responsible for extracting the local direction of motion?

2. Do the lobula plate tangential cells indeed control turning responses of the fly?

3. Is there a separate visual course control system allowing the fly to detect and track individual objects?

I will present recent progress towards answering these questions made by combining whole-cell patch recording and behavioral studies with silencing and optogenetic stimulation of genetically targeted candidate neurons in Drosophila.

For more on Prof. Borst's research, please visit his webpage: www.neuro.mpg.de/24306/borst

## From Vision to Memory to Concept Learning

Speaker: Cheston Tan

Speaker Affiliation: A*Star

Host: Tomaso Poggio

Host Affiliation: CBCL, MIBR, BCS

Date: 3-13-2013

Time: 4:00 PM - 5:00 PM

Location: McGovern 5th Floor Reading Room, MIT 46-5165

As a caricature, for vision researchers, the map of the brain seems to end at inferotemporal cortex, because it is the last "purely visual" area. Conversely, for memory researchers, the map of the brain seems to begin at perirhinal/entorhinal cortex, because anything before that is a "sensory" area, not a "mnemonic" one. Is there any value in trying to understand the entire processing pipeline, from the retina to the hippocampus?

This will be an informal talk that tries to answer the question by making sense of the vast and disparate literatures on the neural machinery of visual processing and long-term memory. I will first review the anatomy and neurophysiology of the ventral visual pathway and the medial temporal lobe (brain structures linked to object recognition and long-term memory, respectively). In particular, from human electrophysiology, the notion of "concept cells" found in the medial temporal lobe (MTL) raises the question of whether the function of the MTL is "just memory", or if it is intricately involved in the learning of abstract concepts.

I speculate that the latter is true, and will explore the possibility that the MTL implements a learning scheme known as Compressed Sensing, and will relate this idea to MTL phenomena such as remapping, pattern separation, replay, consolidation and neurogenesis.

## Face Recognition: Can Computer Vision Draw Lessons from Biological Vision?

Speaker: Cheston Tan

Speaker Affiliation: CBCL, Brain and Cognitive Sciences Dept., MIT

Date: April 23, 2012

Time: 1:00pm

Location: MIBR Seminar Room, 46-5193 (5th floor, MIT Bldg 46)

Abstract: This talk is organized into two parts. In the first part, I will describe a computational model of biological face processing. This model is a variant of the HMAX model of object recognition, modified to use large, coarse templates. I will show that with this simple modification, the model is able to account for key characteristics of face recognition in human vision. In the more speculative second part, I will discuss the relationship between these results and computer vision algorithms for face recognition. More generally, I will discuss the prospects for biologically-inspired algorithms in attacking the problem of face recognition by computers.

## Automatically Learning the Structure of Spoken Language Without Supervision

Speaker: Aren Jansen

Speaker Affiliation: Human Language Technology Center of Excellence and Center for Language and Speech Processing, Johns Hopkins University

Date: May 7, 2012

Time: 3:00PM

Location: Patil/Kiva Seminar Rm, Stata Bldg., MIT 32-G449

Abstract: The dominant paradigm in the speech recognition community for the past four decades has been to train automatic systems with as much transcribed data as we can get our greedy hands on. This strategy has led to the development of highly accurate systems that have finally found a place in our daily lives in the form of popular applications such as Apple iPhone's Siri. An unfortunate consequence of this trajectory, however, is that state-of-the-art recognition performance can only be achieved on languages and domains for which vast transcribed training resources either exist or can be easily obtained. Meanwhile, with public internet resources like YouTube and podcasts, untranscribed speech audio is abundant and contains a wealth of hidden information regarding the acoustic-phonetic, lexical, grammatical, and semantic structure of the language being spoken. The trick is uncovering this structure automatically, an endeavor that will require new machine learning techniques, algorithms scalable to massive problem sizes, and a lot of patience. I will provide an overview of my efforts in these directions and describe some useful language- and domain-independent technologies that have been produced along the way.

Bio: Aren Jansen is a Research Scientist at the Human Language Technology Center of Excellence and an Assistant Research Professor in the Center for Language and Speech Processing, both at Johns Hopkins University. Aren received a B.A. in Physics from Cornell University in 2001. He received the M.S. degree in Physics as well as the M.S. and Ph.D. in Computer Science from the University of Chicago in 2003, 2005, and 2008, respectively. His research explores various aspects of the speech recognition problem, with a focus on whole word acoustic modeling, sparse representations and models, and unsupervised/semi-supervised learning of words and speech sounds. Lately, he has been focused on developing zero resource speech technologies that require no transcribed speech for training and are thus agnostic to the language of application. http://old-site.clsp.jhu.edu/~ajansen/

## Learning a Compact Image Code for Efficient Recognition of Novel Classes

Speaker: Lorenzo Torresani

Speaker Affiliation: Visual Learning Group, Dartmouth College

Date: April 4, 2012

Time: 12:30pm

Location: Star Seminar Room, Stata, MIT 32-D463

Abstract: In this talk I will discuss methods enabling efficient object-class recognition in large image collections. We are specifically interested in scenarios where the classes to be recognized are not known in advance. The motivating application is "object-class search by example" where a user provides at query time a small set of training images defining an arbitrary novel category and the system must retrieve images belonging to this class from a large database. This application scenario poses challenging requirements on the system design: the object classifier must be learned efficiently at query time from few examples; recognition must have low computational cost with respect to the database size; finally, compact image descriptors must be used to allow storage of large collections in memory.

We propose to address these requirements by learning a compact image code optimized to yield good categorization accuracy with linear (i.e., efficient) classifiers: even when the representation is compressed to less than 300 bytes per image, linear classifiers trained on our descriptor yield accuracy matching the state-of-the-art but at orders of magnitude lower computational cost.

Bio: Lorenzo Torresani is an Assistant Professor in the Computer Science Department at Dartmouth College. He received a Laurea Degree in Computer Science with summa cum laude honors from the University of Milan (Italy) in 1996, and an M.S. and a Ph.D. in Computer Science from Stanford University in 2001 and 2005, respectively. In the past, he has worked at several industrial research labs including Microsoft Research Cambridge, Like.com, and Digital Persona. His research interests are in computer vision and machine learning. In 2001, Torresani and his coauthors received the Best Student Paper Award at the IEEE Conference On Computer Vision and Pattern Recognition (CVPR). He is the recipient of a National Science Foundation CAREER Award.

## The Simulation Engine of the Brain

Speaker: Demis Hassabis

Speaker Affiliation: Wellcome Trust Research Fellow, Gatsby Computational Neuroscience Unit, UCL

Date: Wednesday, March 28, 2012

Time: 4:00pm

Location: McGovern Seminar Room, MIT 46-3189

Abstract: In daily life, people frequently imagine future events, such as how a dinner date might unfold. Such ‘simulations’ are typically mentally played out in a rich spatial context, and often involve the presence of people and their concomitant thoughts and behaviors. It has been proposed that a common ‘core’ brain network supports the simulation of past, future, or hypothetical experiences. I will describe a series of fMRI and patient studies exploring the neural mechanisms that underpin this simulation system, and also cover the key sub-processes involved such as scene construction and personality modeling. Finally, I will address some of the most topical theoretical issues including the adaptive advantage of such a simulation system and its intriguing connections to the latest artificial intelligence research for efficient planning in artificial agents.

## Slow Learning and Invariance

Speaker: Andreas Maurer

Date: Thursday, November 17 2011

Time: 3:00pm

Location: Star Sem. Rm D463, Stata

Abstract: The meaning of images observed in a natural sequence appears to evolve slowly, with comparatively few sudden changes. This principle of semantic continuity or slowness has been thought to play an important role in the formation of the mammalian visual cortex, and several machine learning algorithms have been proposed to exploit it for the learning of feature maps invariant under certain geometric transformations, such as translation, rotation or rescaling. The talk focuses on the relationship between the nature of the observed process and the class of semantic categories or invariances which can be learned from its observation using the slowness principle. Some simulations pertaining to the learning of scale invariance will also be presented.

## From Understanding Vision to New Vaccines: the Unifying Power of Mathematics

Speaker: Stephen Smale

Date: Monday, November 21, 2011

Time: 2:00pm

Location: Patil/Kiva Seminar Rm, Stata Bldg., MIT 32-G449

Stephen Smale was awarded the Fields Medal in 1966, the Veblen Prize for Geometry by the American Mathematical Society in 1966 "for his contributions to various aspects of differential topology", and the National Medal of Science in 1996 for "four decades of pioneering work on basic research questions which have led to major advances in pure and applied mathematics."

Prof. Smale was a member of the mathematics faculty, University of California, Berkeley through 1995; was appointed Professor at the Toyota Technological Institute at Chicago in 2002, and is currently a Distinguished University Professor at the City University of Hong Kong.

We are still in the process of planning talks for Fall 2011. Please check back for updates.

## Perception, Action and the Information Knot that Ties Them

Speaker: Stefano Soatto

Speaker Affiliation: UCLA

Date: Sept. 30, 2011

Time: 3:00pm

Location: MIBR Seminar Room 46-3189

Abstract: I will describe a notion of Information for the purpose of decision and control tasks that is rooted in ideas of J. J. Gibson and is specific to classes of tasks and nuisance factors affecting the data formation process. When such nuisances involve scaling and occlusion phenomena, as in most imaging modalities, the "Information Gap" between the maximal invariants and the minimal sufficient statistics can only be closed by exercising control on the sensing process. Thus, sensing, control and information are inextricably tied. This has consequences in understanding the so-called "signal-to-symbol barrier" problem, as well as in the analysis and design of active sensing systems. I will show applications in vision-based control, navigation, 3-D reconstruction and rendering, as well as detection, localization, recognition and categorization of objects and scenes in live video.

Bio: Stefano Soatto is the founder and director of the UCLA Vision Lab (vision.ucla.edu). He received his Ph.D. in Control and Dynamical Systems from the California Institute of Technology in 1996; he joined UCLA in 2000 after being Assistant and then Associate Professor of Electrical and Biomedical Engineering at Washington University, Research Associate in Applied Sciences at Harvard University, and Assistant Professor in Mathematics and Computer Science at the University of Udine, Italy. He received his D.Ing. degree (highest honors) from the University of Padova, Italy in 1992. Dr. Soatto is the recipient of the David Marr Prize (with Y. Ma, J. Kosecka and S. Sastry) for work on Euclidean reconstruction and reprojection up to subgroups. He also received the Siemens Prize with the Outstanding Paper Award from the IEEE Computer Society for his work on optimal structure from motion (with R. Brockett). He received the National Science Foundation Career Award and the Okawa Foundation Grant. He is a Member of the Editorial Board of the International Journal of Computer Vision (IJCV), the International Journal of Mathematical Imaging and Vision (JMIV) and Foundations and Trends in Computer Graphics and Vision.

## Geometry/topology and statistical inference

Speaker: Sayan Mukherjee

Speaker Affiliation: Duke University

Date: October 4, 2011

Time: 2:30pm

Location: Patil/Kiva Seminar Rm, Stata Bldg., MIT 32-G449

In this talk I will illustrate two examples where geometric/topological ideas and statistical inference complement each other. In the first example, computational geometry is a central tool used to address a classic problem in statistics, inference of conditional dependence. In the second example, a classic object in topology and geometry, a Whitney stratified space, is stated as a mixture model and an algorithm for inference of mixture elements is provided as well as finite sample bounds for the algorithm.

The first part of the talk develops a parameterization of hyper-graphs based on the geometry of points in d dimensions; the geometric tool here is the abstract simplicial complex. Informative prior distributions on hyper-graphs are induced through this parameterization by priors on point configurations via spatial processes. The approach combines tools from computational geometry and topology with spatial processes and offers greater control on the distribution of graph features than Erdős-Rényi random graphs.

In the second part of the talk, I describe the problem of stratification learning. Strata correspond to unions and intersections of arbitrary manifolds of possibly different dimension. We consider a mixture distribution on the strata and formulate the following learning problem: given n points sampled i.i.d. from the mixture model, determine which points belong to the same stratum. I will state a bound on the minimum number of sample points required to infer with high probability which points belong to the same stratum. I will show results of this clustering procedure on real data. The clustering procedure uses tools from computational topology, specifically persistent homology.

No knowledge of geometry and topology is assumed in the talk.

## Fine-scale organization and key dimensions of visual object representation in macaque inferotemporal cortex

Speaker: Chou P. Hung, Ph.D., Assistant Professor

Speaker Affiliation: Institute of Neuroscience, National Yang-Ming Univ. Taipei, Taiwan

Date: Monday, August 15, 2011

Time: 3:15pm

Location: McGovern Seminar Room, MIT 46-3189

Abstract: A major challenge in understanding visual object recognition is to decode the precise and non-linear mapping of object features, computations, and circuits in inferior temporal (IT) cortex (and its human counterpart the lateral occipital complex and fusiform gyrus), the last stage of object form processing. The factors that predict response and map variability are unclear, and this variability makes it difficult to confirm and extrapolate feature-specific rules across stimuli, cells, animals, and investigators. To relate the fine organization to feature computations and circuitry, we have developed a novel approach based on multi-electrode array recordings of spiking activity. By applying pattern classifiers and covariation analysis, we compared the object content across different maps (random, columnar, PCA, ICA) and used functional circuitry analysis and imaging (OI, human fMRI) to confirm the robustness of these key dimension maps and their reliability across subjects and species. This robustness should enable iterative refinement and build-out of the map via repeated recordings targeted to the same coalitions across animals.

## Learning from constraints

Speaker: Marco Gori

Speaker Affiliation: University of Siena

Host: Lorenzo Rosasco

Host Affiliation: Istituto Italiano di Tecnologia; CBCL, MIT

Date: Wednesday, September 14, 2011

Time: 3:00 PM

Location: Star Seminar Room, Stata Bldg, MIT 32-D463

In this talk, I propose a functional framework to understand the emergence of intelligence in agents exposed to examples and knowledge granules. The theory is based on the abstract notion of constraint, which provides a representation of knowledge granules gained from the interaction with the environment. I give a picture of the “agent body” in terms of representation theorems by extending the classic framework of kernel machines in such a way as to incorporate logic formalisms, like first-order logic. This is made possible by the unification of continuous and discrete computational mechanisms in the same functional framework, so that any stimulus, like supervised examples and logic predicates, is translated into a constraint. The learning, which is based on constrained variational calculus, is guided either by a parsimonious match of the constraints or by unsupervised mechanisms expressed in terms of the minimization of the entropy.

I show some experiments with different kinds of symbolic and sub-symbolic constraints, and then I give insights on the adoption of the proposed framework in computer vision. It is shown that in most interesting tasks the learning from constraints naturally leads to “deep architectures”, that emerge when following the developmental principle of focusing attention on “easy constraints”, at each stage. Interestingly, this suggests that stage-based learning, as discussed in developmental psychology, might not be primarily the outcome of biology, but it could be instead the consequence of optimization principles and complexity issues that hold regardless of the “body.”

## Efficient and Principled Learning Algorithms for Real World Problems

Speaker: Francesco Orabona, Universita degli Studi di Milano

Host: Lorenzo Rosasco

Host Affiliation: Istituto Italiano di Tecnologia; CBCL, MIT

Date: Wednesday, May 11, 2011

Time: 4:00 PM

Location: Star Seminar Room, Stata Bldg, 32-D463

Abstract: Most of the research in machine learning has been directed to the problem of binary classification, given a training set and a test set acquired in a very controlled way. Although this is a fundamental problem, it does not fit important real-world tasks well. In this talk I will show some of the results I have presented in the literature in recent years, which aim to solve interesting real-world problems with new theoretically motivated algorithms. I will present results in the IID setting and in the adversarial one. In particular, for the first setting, I will present a new algorithm for transfer learning that automatically selects the relevant sources of prior information and uses them to bootstrap the performance in a new task with few labeled samples. For the second one, I will introduce a general framework for online learning with potential functions, and instantiations of this framework to the problem of multi kernel learning. As a last algorithm, I will talk about how to do active learning in the adversarial setting, presenting an algorithm able to work with minimal hypothesis, with a sub-linear regret bound and with a bound on the rate of queries controlled by the user.

## Learning Mixtures of Gaussians in High Dimension

Speaker: Mikhail Belkin, Ohio State University

Time: 2:00 PM

Location: Patil/Kiva Seminar Rm, Stata Bldg., Rm G449

Abstract: The study of Gaussian mixture distributions goes back to the late 19th century, when they were introduced by Pearson. Gaussian mixtures have since become one of the most popular tools for modeling and data analysis, extensively used in speech recognition, vision and other fields, due in part to their simple mathematical formulation. Yet their properties are still not well understood. Widely used algorithms, such as Expectation Maximization (EM), often fail even on simple artificially generated data, and their theoretical properties are often unclear. In my talk I will discuss some theoretical aspects of the problem of learning Gaussian mixtures. In particular, I will discuss our recent result, which, in a certain sense, completes work on an active recent topic in theoretical computer science by establishing quite general conditions for polynomial learnability of Gaussian mixtures (as well as a number of other distributions) in high dimension by using techniques from semi-algebraic geometry.

The talk is based on joint work with Kaushik Sinha.

## From biology to robots: the iCub project

Speaker: Giorgio Metta

Speaker Affiliation: Italian Institute of Technology (IIT). Robotics, Brain and Cognitive Sciences Department.

Host: Lorenzo Rosasco

Host Affiliation: Istituto Italiano di Tecnologia; CBCL, MIT

Date: Wed., April 13, 2011

Time: 2:00 PM

Location: Patil/Kiva Seminar Rm, Stata Bldg., Rm G449

Abstract: Simulating and getting inspiration from biology is certainly not a new endeavor in robotics (Atkeson et al., 2000; Sandini, 1997; Metta et al., 1999). However, the use of humanoid robots as tools to study human cognitive skills is a relatively new area of research which fully acknowledges the importance of embodiment and the interaction with the environment for the emergence of motor skills, perception, sensorimotor coordination, and cognition (Lungarella, Metta, Pfeifer, & Sandini, 2003). The guiding philosophy - and main motivation - is that cognition cannot be hand-coded but has to be the result of a developmental process through which the system becomes progressively more skilled and acquires the ability to understand events, contexts, and actions, initially dealing with immediate situations and increasingly acquiring a predictive capability (Vernon, Metta, & Sandini, 2007).

To pursue this research, a humanoid robot (iCub) has been developed as a result of the collaborative project RobotCub (http://www.icub.org) supported by the European Commission through the "Cognitive Systems and Robotics" Unit E5 of IST. The iCub has been designed with the goal of studying human cognition and therefore embeds a sophisticated set of sensors providing vision, touch, proprioception, audition as well as a large number of actuators (53) providing dexterous motor abilities. The project is "open", in the sense of open-source (under GPL), to build a critical mass of research groups contributing with their ideas and algorithms to advance knowledge on human cognition (Nosengo, 2009).

The aim of the talk is to present approach and motivation, illustrate the technology, and show results (so far!).

References:

Atkeson, C. G., Hale, J. G., Pollick, F., Riley, M., Kotosaka, S., Schaal, S., et al. (2000). Using Humanoid Robots to Study Human Behavior. IEEE Intelligent Systems, 46-56.

Sandini, G. (1997, April). Artificial Systems and Neuroscience. Paper presented at the Otto and Martha Fischbeck Seminar on Active Vision, Berlin, Germany.

Sandini, G., G. Metta, and J. Konczak. Human Sensorimotor Development and Artificial Systems. in International Symposium on Artificial Intelligence, Robotics and Intellectual Human Activity Support(AIR&IHAS '97). 1997. RIKEN - Japan.

D. Vernon, G. Metta, and G. Sandini. "A Survey of Artificial Cognitive Systems: Implications for the Autonomous Development of Mental Capabilities in Computational Agents," IEEE Transactions on Evolutionary Computation, vol. 11, no. 2, pp. 151-180, 2007

N. Nosengo. "Robotics: The bot that plays ball" Nature Vol 460, 1076-1078 (2009) | doi:10.1038/4601076a

## Deconvolution of mixing time series on a graph.

Speaker: Edo Airoldi

Speaker Affiliation: Harvard University

Date: March 28, 2011

Time: 2:30 PM

Location: Stata Bldg, Seminar Rm D463 (Star)

Abstract: In many applications we are interested in making inference on latent time series from indirect measurements, which are often low-dimensional projections resulting from mixing or aggregation. Positron emission tomography, super-resolution, and network traffic monitoring are some examples. Inference in such settings requires solving a sequence of ill-posed inverse problems, y(t) = A x(t), where the projection mechanism provides information on A. We consider problems in which A specifies mixing on a graph of time series that are bursty and sparse. We develop a multilevel state-space model for mixing time series and an efficient approach to inference. A simple model is used to calibrate regularization parameters that lead to efficient inference in the multilevel state-space model. We apply this method to the problem of estimating point-to-point traffic flows on a network from aggregate measurements. Our solution outperforms existing methods for this problem, and our two-stage approach suggests an efficient inference strategy for multilevel models of multivariate time series. We then focus on the combinatorial problem of precisely characterizing the space where the posterior distribution on x(t) given y(t) and A has positive density. The solution space has a clear geometrical structure: it is an (integral) convex polytope. We will develop three polytope samplers to make inference in ill-posed linear inverse problems, by leveraging the Hermite normal form decomposition of the matrix A. Our approach takes advantage of the geometry of the problem in two stages: (i) identify all the vertices of the solution polytope, and (ii) build a sampling distribution on the polytope, as a generalization of the Dirichlet distribution over the simplex.

## Anisotropic Voronoi Diagrams and Delaunay Triangulations

Speaker: Guillermo D. Canas

Speaker Affiliation: School of Engineering and Applied Sciences, Harvard University

Host: Lorenzo Rosasco

Host Affiliation: Istituto Italiano di Tecnologia; CBCL, MIT

Date: March 2, 2011

Time: 4:30 PM

Location: Star Sem Rm, Stata D463

Abstract: Voronoi diagrams and Delaunay triangulations are used in areas as diverse as data compression, FEM simulation, computer graphics, wireless networks, and function approximation. Due to their wide application, great interest exists in anisotropic extensions (defined over Euclidean space endowed with a continuously-varying metric tensor). However, little is known on how to generalize them in a way that they remain efficient to compute and have theoretical guarantees of well-behaved-ness.

We present sufficient conditions under which the two best-known existing approximate anisotropic Voronoi diagrams are well behaved, in any number of dimensions. These conditions arise naturally in the context of optimization and approximation, and algorithms already exist that output sets of sites that satisfy them. We also show that, for a particular choice of anisotropic Voronoi diagram, in two dimensions, the well-behaved-ness of the primal anisotropic Voronoi diagram is enough to guarantee that its dual is well-behaved, resulting in an embedded triangulation that is a single-cover of the convex hull of the sites, and has other properties that naturally parallel those of ordinary Delaunay triangulations.

The end-to-end result is a simple and natural condition guaranteeing that the anisotropic triangulation of a satisfying set of vertices can be constructed by dualizing their anisotropic Voronoi diagram.

[Joint work with Steven J. Gortler.]

## Sparse and Smooth: An optimal convex relaxation for high-dimensional kernel regression

Speaker: Martin Wainwright, UC Berkeley

Date: Wed., 2/23/2011

Time: 4:30PM

Location: Star Sem Rm, Stata D463

Host: Lorenzo Rosasco, Istituto Italiano di Tecnologia; CBCL, MIT

Contact: Kathleen Sullivan, 617-253-0551, kdsulliv@mit.edu

Abstract: The problem of non-parametric regression is well-known to suffer from a severe curse of dimensionality, in that the required sample size grows exponentially with the dimension $d$. Consequently, the success of any learning procedure in high dimensions depends on some kind of low-dimensional structure. This talk focuses on non-parametric estimation within the family of sparse additive models, which consist of sums of univariate functions over $s$ unknown co-ordinates.

We derive a simple and intuitive convex relaxation for estimating sparse additive models described by reproducing kernel Hilbert spaces. The method involves solving a second-order cone program (SOCP), and so scales well to large-scale problems. Working within a high-dimensional framework that allows both the dimension $d$ and sparsity $s$ to scale, we derive convergence rates that consist of two terms: a *subset selection term* that captures the difficulty of finding the unknown $s$-sized subset, and an *estimation error* that captures the difficulty of estimation over kernel classes. Using information-theoretic methods, we derive matching lower bounds on the minimax risk, showing that the SOCP-based method is optimal.

Based on joint work with Garvesh Raskutti and Bin Yu, UC Berkeley

Click here for arXiv paper.

## Linear and piecewise linear data analysis

Speaker: Arthur Szlam, Courant Institute of Mathematical Sciences, NYU

Date: 2-10-2011

Time: 4:30 PM

Location: Patil/ Kiva Sem Rm 32-G449

Host: Lorenzo Rosasco, Istituto Italiano di Tecnologia; CBCL, MIT

The seminar is co-hosted by the Brains and Machines Seminar Series & Imaging and Computing Seminar Series.

Abstract: Many data sets arising from signal processing or machine learning problems can be approximately modeled as a union of $K$ low dimensional linear sets. In this talk I will start by discussing the case $K=1$, which remains a surprisingly active area of research, despite more than a hundred years of history and a good understanding of the mathematics of the problem for many notions of "approximately" and "low". For larger values of $K$, although heuristic methods have proved successful in applications, many basic mathematical and computational questions remain open. I will talk about some regimes where we have made progress, and then give some fun examples in less easy regimes where the math remains murky.

The Brains & Machines Seminar Series 2011 is being organized by the IIT@MIT lab (a joint lab between MIT and the Italian Institute of Technology).

## A bottom-up saliency map in the primary visual cortex for attentional guidance --- theory and experimental test

Speaker: Prof. Zhaoping Li, Dept. of Computer Science, University College London

Date: November 12, 2010

Time: 4:00PM

Location: McGovern Seminar Room 46-3189

Abstract: I will introduce the physiological and behavioral data that motivated this theory, the conceptual and modeling framework of this theory, and very briefly, a model of the V1 circuit implementing the saliency computation. Furthermore, I will derive the non-trivial and surprising predictions from this theory, and show the behavioral experiments that tested and confirmed the predictions. We will then discuss the implications of this theory for the network of attentional mechanisms in the brain, and relate them to issues such as object perception and visual awareness. More details can be seen at http://www.cs.ucl.ac.uk/staff/Zhaoping.Li/V1Saliency.html

Bio: Prof. Li obtained her B.S. in Physics in 1984 from Fudan University, Shanghai, and her Ph.D. in Physics in 1989 from the California Institute of Technology. Her research experience ranges from high energy physics to neurophysiology and marine biology, with most of her experience in understanding brain function in vision, olfaction, and nonlinear neural dynamics. In the late 1990s and early 2000s, she proposed a theory (which is being extensively tested) that the primary visual cortex in the primate brain creates a saliency map to automatically attract visual attention to salient visual locations.

The following talk was co-sponsored by the Brains & Machines Seminar Series 2010 and the Imaging and Computing Seminar (MIT Math):

## Intrinsic dimensionality estimation and multiscale geometry of data sets

Speaker: Mauro Maggioni, Duke University, Department of Mathematics

Date: Wed., May 5, 2010

Time: 4:00PM to 5:00PM

Location: 32-D463, Stata, Star Sem. Rm.

Abstract: The analysis of large data sets, modeled as point clouds in high dimensional spaces, is needed in a wide variety of applications such as recommendation systems, search engines, molecular dynamics, machine learning, statistical modeling, just to name a few. Oftentimes it is claimed or assumed that many data sets, while lying in high dimensional spaces, have indeed a low-dimensional structure. It may come perhaps as a surprise that only very few, and rather sample-inefficient, algorithms exist to estimate the intrinsic dimensionality of these point clouds. We present a recent multiscale algorithm for estimating the intrinsic dimensionality of data sets, under the assumption that they are sampled from a rather tame low-dimensional object, such as a manifold, and perturbed by high dimensional noise. Under natural assumptions, this algorithm can be proven to estimate the correct dimensionality with a number of points which is merely linear in the intrinsic dimension. Experiments on synthetic and real data will be discussed. Furthermore, this algorithm opens the way to novel algorithms for exploring, visualizing, compressing and manipulating certain classes of high-dimensional point clouds.

## Active Learning, Distilled Sensing, or how to close the loop between data analysis and acquisition

Speaker: Rui Castro, Columbia University, Department of Electrical Engineering

Date: Monday, April 12, 2010

Time: 4pm

Place: Stata - Sem Rm G449 (Patil/ Kiva)

Abstract: Many traditional approaches to statistical inference and machine learning are passive, in the sense that all data are collected prior to analysis in a non-adaptive fashion. However, in many practical scenarios it is possible to adjust the data collection process based on information gleaned from previous observations, in the spirit of the "twenty-questions" game. Learning in such settings is known as active learning or inference using sequential experimental designs. Despite the potential to dramatically improve inference performance, analysis of such procedures is difficult, due to the complicated data dependencies created by the closed-loop observation process. These difficulties are further exacerbated by the presence of measurement uncertainty or noise.

In this talk I present a quantitative analysis of active learning in a variety of scenarios, including non-parametric settings. I also present a novel selective sensing procedure - Distilled Sensing - which is highly effective for detection and estimation of high-dimensional sparse signals in noise. Large-sample analysis shows that the proposed procedure provably outperforms the best possible detection methods based on non-adaptive sensing, allowing for the detection and estimation of extremely weak signals, imperceptible without adaptive sensing.

## From Face Recognition to the Identification of Join Candidates in the Cairo Genizah

Speaker: Lior Wolf, Ph.D., The Blavatnik School of Computer Science at Tel Aviv University

Date: Friday, February 19, 2010

Time: 4:00PM to 5:00PM

Location: PILM Seminar Room, MIT Bldg 46, Rm 3310

Abstract: A join is a set of manuscript-fragments that are known to originate from the same original work. The Cairo Genizah is a collection containing approximately 250,000 fragments of mainly Jewish texts discovered in the late 19th century. The fragments are today spread out in libraries and private collections worldwide, and there is an ongoing effort to document and catalogue all extant fragments. The task of finding joins is currently conducted manually by experts, and presumably only a small fraction of the existing joins have been discovered. We study the problem of automatically finding candidate joins, so as to streamline the task. The proposed method is based on a combination of local descriptors and metric learning techniques. These techniques have been developed for the task of face recognition in unconstrained images, and were used to achieve the currently leading performance on a recent face recognition benchmark called Labeled Faces in the Wild. Somewhat unconventionally, the learned metrics are obtained from unlabeled training samples by repeatedly applying discriminative learning. During the talk we will describe the set of newly-discovered join-candidates that have been identified automatically and validated by human experts, and discuss the implications of our work for the study of the Genizah collection.

### MIT Intelligence Initiative (I²) Events:

#### Upcoming I² Seminar Series:

*We are currently organizing the Spring 2011 seminar series. Please check back for updates.*

#### Past I² Seminar Series events:

## Deep learning with multiplicative interactions

Speaker: Geoffrey Hinton

Affiliation: University of Toronto, Canadian Institute for Advanced Research

Date: Tuesday, April 20, 2010

Time: 4pm

Location: Singleton Auditorium, MIT Bldg 46

Abstract: Deep networks can be learned efficiently from unlabeled data. The layers of representation are learned one at a time using a simple learning module that has only one layer of latent variables. The values of the latent variables of one module form the data for training the next module. Although deep networks have been quite successful for tasks such as object recognition, information retrieval, and modeling motion capture data, the simple learning modules do not have multiplicative interactions which are very useful for some types of data.

The talk will show how to introduce multiplicative interactions into the basic learning module in a way that preserves the simple rules for learning and perceptual inference. The new module has a structure that is very similar to the simple cell/complex cell hierarchy that is found in visual cortex. The multiplicative interactions are useful for modeling images, image transformations and different styles of human walking. They can also be used to create generative models of spectrograms. The features learned by these generative models are excellent for phone recognition. This is joint work with Marc'Aurelio Ranzato, Graham Taylor, Roland Memisevic and George Dahl.

## On-line, Voluntary Control of Grandmother Neurons by Human Thought

Speaker: Christof Koch

Affiliation: California Institute of Technology

Date: Wednesday, March 3, 2010

Time: 4:00PM

Location: Singleton Auditorium, Building 46, Room 3002 *new location*

Abstract: In ongoing work with the neurosurgeon Itzhak Fried at UCLA, we record chronically from multiple single neurons in the medial temporal lobe (MTL) of patients with pharmacologically intractable epilepsy implanted with depth electrodes in order to localize the focus of seizure onsets. These neurons fire in a remarkably selective manner to different images of famous or familiar individuals and objects. These data support a sparse, abstract, invariant and modality-independent representation in MTL, suggesting that the identity of individuals is encoded by a small number of neurons. I will describe these findings and estimate their sparseness using Bayes' rule, concluding that these cells bear some resemblance to a Grandmother Cell representation. I will discuss an unsupervised learning scheme based on sparse coding that gives rise to such cells. By feeding the firing activity of these cells back to an image display in < 100ms, we show that subjects can voluntarily, rapidly and differentially control the content of this image by focusing their thoughts specifically onto one out of four competing concepts associated with each of four simultaneously recorded MTL neurons. We show that subjects can rapidly, sometimes on the first trial, learn to regulate the firing rate of a group of neurons deep inside their own brain, increasing the rate of some while simultaneously decreasing the spiking rate of others.

Special I² Lecture Series: Classification and Beyond, Prof. Shimon Ullman, Weizmann Institute

I² Workshop: December 4, 2009

#### CBCL WORKSHOPS:

Annual Conte Center Meeting: 9/10/07

A Journey Through Computation: 6/14-16/07

McLean-MIT Workshop: 10/3/06

Annual Conte Center Meeting: 9/11-12/06

Annual Conte Center Meeting: 8/29-30/05

Annual Conte Center Meeting: 8/30-31/04

Annual Conte Center Meeting: 8/17/03

NSF-KDI Project Workshops: 1/29/01 and 1/24-26/00

NSF-ITR Project Workshop: 1/22/01

CBCL/AI 'Learning: Brains & Machines' Workshop III: 2/13/99

CBCL/AI 'Learning: Brains & Machines' Workshop II: 9/26/98

CBCL/AI 'Learning: Brains & Machines' Workshop I: 1/28/97

**OTHER SEMINARS, WORKSHOPS, CONFERENCES & SYMPOSIUMS:**

FoCM Conference, University of Cantabria, Santander, Spain: 6/30-7/9/05

FoCM Workshop 4 - Learning Theory (July 4-6th)

IPAM Workshop Series II, Los Angeles, CA - Inverse Problems (Computational Methods and Emerging Applications): 11/12-20/03

IPAM Workshop Series II: Agenda

FoCM Conference, Minneapolis, Minnesota, USA: 8/5-14/02

FoCM Workshop 4 - Learning Theory (August 5-7th)

NATO Advanced Study Institute on Learning Theory and Practice: 7/8-19/02

NSF-Europe Workshop - NIPS 2001 Workshop on Machine Learning Methods for Text and Images: 12/8/01