The Science of Intelligence

Fall 2019 – MIT

Project 1. Few-shot Bayesian Imitation Learning with Logical Program Policies


Prof. Leslie Kaelbling: lpk@csail.mit.edu             lab website: https://people.csail.mit.edu/lpk/

Mentors: Tom Silver: tomssilver@gmail.com, Kelsey Allen: krallen@mit.edu

Slides: https://docs.google.com/presentation/d/1Wq3Ukdt1To8EKME6J1rpCLxF0kgWyXUY-MI_wV9efhk/edit#slide=id.p

Description:

People learn strategies for everyday tasks like ironing a shirt or brewing a cup of coffee from one or a few demonstrations. We are interested in building artificial agents (first simple simulated ones, but someday robots) with the same capacity. In recent work, we proposed a class of policies called Logical Program Policies (LPP) and an algorithm for efficiently learning these policies from very few (1-5) demonstrations. There are several subsequent projects that we would be excited to pursue with students in 9.58: 

  1. Human studies: This is a project for students interested in human modeling. While there is a large body of empirical work investigating how humans watch and learn from others, there is relatively little work aimed at computationally modeling this behavior, particularly in richer settings such as those that can be addressed with LPP. Students will design experiments to understand whether LPP is a good model of human imitation learning in these settings, and if not, what the major differences are.
  2. Representation learning: In this project, students will focus on combining logical program policies with representation learning. A hot new topic in AI and deep learning is “neuro-symbolic” architectures, which combine logic and neural networks for joint reasoning and representation. An initial approach could use gradient estimators such as REINFORCE to propagate gradients from LPP to a neural network in order to learn representations that work well for logic-based reasoning (see the sketch after this list).
  3. Meta-learning: Meta-learning (or learning to learn) is an increasingly important topic in AI. Meta-learning supposes that agents will learn good priors for tasks drawn from some distribution, such that they can quickly adapt their representations or policies to any specific task from that distribution with just a few interactions. In this project, students will focus on incorporating meta-learning into LPP by learning priors over programs which are well adapted to a specific set of tasks. The goal would be to show dramatic improvement in search efficiency for test tasks drawn from the same task distribution, as well as intuitively reasonable learned priors.
  4. Trial and error learning: In its current form, LPP simply infers the policy of an expert, and then executes that policy in a new task. In reality, learning from imitation does not proceed so smoothly: we have an initial idea for what we might do based on an expert demonstration, but then might have to try a few different things (and learn from them) in order to settle on a good policy. This project will investigate incorporating trial-and-error learning into LPP through a mixture of model-based and model-free reinforcement learning, using priors obtained from the expert demonstrator.
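
The sketch below illustrates, in rough form, the REINFORCE idea from item 2: a small convolutional encoder proposes discrete (binary) grid features, a stand-in for the LPP stage scores them against demonstrated actions, and the score-function estimator carries that signal back to the encoder. The encoder architecture, feature shapes, and the lpp_reward stub are all placeholder assumptions, not the actual LPP implementation.

# Hypothetical sketch of the REINFORCE idea in project idea 2: a CNN encoder
# proposes binary "grid features"; a (stubbed) LPP stage scores them against
# demonstrated actions; the non-differentiable sampling step is handled with
# the score-function (REINFORCE) estimator.
import torch
import torch.nn as nn

class FeatureEncoder(nn.Module):
    """Maps a raw observation to Bernoulli logits over K binary predicates per cell."""
    def __init__(self, in_channels=3, num_predicates=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, num_predicates, 1),
        )

    def forward(self, obs):
        return self.net(obs)  # logits, shape (B, K, H, W)

def lpp_reward(binary_features, demo_actions):
    """Placeholder for the (non-differentiable) LPP step: in the real project this
    would run program inference on the discrete features and return, e.g., the
    log-likelihood of the demonstrated actions under the induced policy."""
    return torch.rand(binary_features.shape[0])  # stand-in reward per example

encoder = FeatureEncoder()
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)

obs = torch.randn(8, 3, 10, 10)           # fake batch of grid observations
demo_actions = torch.randint(0, 4, (8,))  # fake demonstrated actions

logits = encoder(obs)
dist = torch.distributions.Bernoulli(logits=logits)
z = dist.sample()                         # discrete features block backprop...
reward = lpp_reward(z, demo_actions)      # ...so treat LPP's score as a reward

# REINFORCE / score-function estimator with a simple baseline to reduce variance
log_prob = dist.log_prob(z).sum(dim=(1, 2, 3))
baseline = reward.mean()
loss = -((reward - baseline).detach() * log_prob).mean()

optimizer.zero_grad()
loss.backward()
optimizer.step()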

References:

The main paper introducing logical program policies is https://arxiv.org/abs/1904.06317

Human Studies: https://arxiv.org/abs/1807.07134

Representation learning: https://arxiv.org/abs/1905.10307 and https://arxiv.org/abs/1902.08093

Meta-learning:  https://arxiv.org/abs/1703.03400

Trial and error learning: https://arxiv.org/abs/1906.03352 and https://arxiv.org/abs/1704.03732

Project 2. Learning Discriminative Feature Relationships for Analogous Video Understanding


Prof. Aude Oliva: oliva@mit.edu                       lab website: http://moments.csail.mit.edu/

Mentor: Mathew Monfort: mmonfort@mit.edu

Presentation slides: https://docs.google.com/presentation/d/10zniDwOF1TWsS_Qobr3e7tv-T8datEGDyx0qFv1Tbbo/edit?ts=5d6ff44b#slide=id.g41c712b9f2_0_0

Description:

The goal of the project is to train a siamese neural network to learn discriminative feature representations that can be used to identify relationships and differences between pairs of videos. The desired result is a model with larger class boundaries at inference time and the ability to perform analogous video retrieval beyond the performance of a standard model trained for classification.
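
A minimal sketch of one way to set up such a siamese network, assuming each video has already been reduced to a fixed-length feature vector (e.g., pooled frame features); the shared encoder and contrastive loss below are illustrative choices rather than the project's prescribed architecture.

# Siamese setup sketch: a shared encoder embeds each video's feature vector,
# and a contrastive loss pulls same-class pairs together and pushes
# different-class pairs apart by a margin.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseEncoder(nn.Module):
    def __init__(self, in_dim=2048, embed_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, embed_dim),
        )

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)  # unit-norm embeddings

def contrastive_loss(z1, z2, same_label, margin=0.5):
    """Pull together pairs from the same class; push apart pairs from
    different classes by at least `margin` in embedding distance."""
    d = (z1 - z2).norm(dim=-1)
    pos = same_label * d.pow(2)
    neg = (1 - same_label) * F.relu(margin - d).pow(2)
    return (pos + neg).mean()

encoder = SiameseEncoder()
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)

v1 = torch.randn(16, 2048)                 # features for one video in each pair
v2 = torch.randn(16, 2048)                 # features for the other video
same = torch.randint(0, 2, (16,)).float()  # 1 if the pair shares a class

loss = contrastive_loss(encoder(v1), encoder(v2), same)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# At retrieval time, rank candidate videos by cosine similarity of embeddings.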

References:

https://www.cs.cmu.edu/~rsalakhu/papers/oneshot1.pdf and http://moments.csail.mit.edu/TPAMI.2019.2901464.pdf

Project 3. Studying face processing in the brain with artificial neural network models


Prof. Jim DiCarlo: dicarlo@mit.edu                       lab website: http://dicarlolab.mit.edu/

Mentor: Tiago Marques tmarques@mit.edu

Presentation slides: https://drive.google.com/file/d/1SprvaP_bMgFj503T-PiZL4B7AO9aLPLv/view?usp=sharing

Description:

Human visual abilities are extremely complex from a computational point of view. For over half a century, neuroscientists have been studying how hierarchical processing along the primate ventral stream produces the complex visual representations in the inferior temporal (IT) cortex that support object recognition. Neuronal responses in this area are selective for high-level object features and invariant to transformations that do not affect the object identity. In the IT face patches, neurons are selective for faces, i.e., they respond strongly to faces, independently of position, contrast, scale, and orientation, and weakly to other objects. Moreover, it was recently reported that these neurons are remarkably tuned, projecting incoming faces onto specific axes of a high-dimensional space, encompassing face appearance and shape features.

Recently, deep artificial neural networks (ANNs), a class of brain-inspired computational algorithms that incorporate some key features of cortical circuits, have reached human-level performance on standard object recognition tasks. This, together with the biological inspiration of their internal architecture, makes ANNs an important tool for studying visual processing in the brain. Currently, ANNs are the best model class for predicting neuronal activity in several ventral visual areas, particularly IT.

The main goal of this project is to study the complex responses of IT face-selective neurons, using ANNs as models of primate vision. This will be achieved by pursuing the following aims:

  1. Implement the analyses described in [1] to measure face tuning in ANN models (see the sketch after this list).
  2. Optimize a family of biologically inspired recurrent ANNs, CORnet [2], to match the distribution of feature preferences of neurons in IT face patches.
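
As a rough illustration of aim 1, the sketch below follows the logic of the axis-model analysis in [1] applied to an artificial unit: fit a linear mapping from face-space features (shape and appearance parameters) to the unit's responses, then compare tuning along the estimated preferred axis with tuning along an orthogonal axis. All dimensions and data here are synthetic placeholders.

# Axis-tuning sketch for a single (fake) unit responding to parameterized faces.
import numpy as np

rng = np.random.default_rng(0)
n_faces, n_feat = 2000, 50
face_params = rng.standard_normal((n_faces, n_feat))   # shape/appearance features
true_axis = rng.standard_normal(n_feat)
responses = face_params @ true_axis + 0.5 * rng.standard_normal(n_faces)  # fake unit

# 1. Estimate the unit's preferred axis by linear regression (least squares).
axis, *_ = np.linalg.lstsq(face_params, responses, rcond=None)
axis /= np.linalg.norm(axis)

# 2. Project faces onto the preferred axis and onto a random orthogonal axis.
proj_pref = face_params @ axis
ortho = rng.standard_normal(n_feat)
ortho -= (ortho @ axis) * axis
ortho /= np.linalg.norm(ortho)
proj_orth = face_params @ ortho

# 3. Axis tuning predicts ramp-shaped tuning along the preferred axis and
#    roughly flat tuning along orthogonal axes.
corr_pref = np.corrcoef(proj_pref, responses)[0, 1]
corr_orth = np.corrcoef(proj_orth, responses)[0, 1]
print(f"correlation along preferred axis: {corr_pref:.2f}, orthogonal: {corr_orth:.2f}")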

If successful, this project will lead to a model of visual processing that implements a similar face code to that of primate IT, pointing the way to a deeper understanding of the mechanisms underlying visual perception in the brain.

In addition to the scientific outcome, students in this project will learn important computational neuroscience approaches, such as detailed quantitative analyses of neuronal responses, and using ANNs for studying primate visual processing.

References:

https://www.cell.com/cell/fulltext/S0092-8674(17)30538-X

https://www.biorxiv.org/content/10.1101/408385v1

Project 4. ObjectNet


Prof. Boris Katz: boris@csail.mit.edu                       lab website: https://www.csail.mit.edu/person/boris-katz

Mentor: Andrei Barbu abarbu@mit.edu

Presentation: https://drive.google.com/file/d/1-43O9XISSMuT4sON6XfnO80A3Y-Uo4FH/view?usp=sharing

Description:

Students will work on one of the following problems:

  1. How good are humans and machines at object recognition?
  2. How is language represented in the brain?

Project 5. Systematic studies about the role of context in object recognition


Prof. Gabriel Kreiman: gabriel.kreiman@tch.harvard.edu    lab website: http://kreiman.hms.harvard.edu/

Mentor: Mengmi Zhang Mengmi.Zhang@childrens.harvard.edu

Presentation: https://drive.google.com/file/d/1eJTkW-2xawXWc897nxwzl4SHgUiXJ8H6/view?usp=sharing

Description:

The small object next to a computer keyboard is most likely to be a computer mouse, not an elephant. In the real world, objects often co-vary with other objects and with particular environments. This project has two primary goals. The first is to develop a biologically grounded, quantitative understanding of how the brain uses context to process and comprehend visual input. The second is to use this understanding to build a robust computational model, helping current technology identify visual cues more reliably.

There are eight main experiments that we have tested and plan to test on both humans and machine learning algorithms with regard to object recognition: (a) varying the amount of context shown, (b) backward masking, (c) blurred context, (d) blurred object, (e) scrambled contextual information, (f) guessing objects from contextual information alone, (g) the effect of presentation time on recognition, and (h) swapped context. Each of these experiments is run in the lab using eye trackers and online through Amazon's Mechanical Turk.
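
As an illustration (not the lab's actual stimulus code), the sketch below shows how two of these manipulations, blurred context (c) and blurred object (d), could be generated; the image array and object bounding box are placeholders.

# Stimulus-manipulation sketch: blur the whole scene once, then restore either
# the object region (blurred context) or the context (blurred object).
import numpy as np
from scipy.ndimage import gaussian_filter

image = np.random.rand(256, 256, 3)          # placeholder scene image
y0, y1, x0, x1 = 100, 160, 110, 170          # placeholder object bounding box

blurred = gaussian_filter(image, sigma=(8, 8, 0))  # blur spatial dims only

# (c) blurred context: blur everywhere, then restore the sharp object region
blurred_context = blurred.copy()
blurred_context[y0:y1, x0:x1] = image[y0:y1, x0:x1]

# (d) blurred object: keep the context sharp, blur only the object region
blurred_object = image.copy()
blurred_object[y0:y1, x0:x1] = blurred[y0:y1, x0:x1]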

Through this project, students will gain hands-on experience in designing, developing, and implementing psychophysics experiments, and will acquire skills in using eye trackers, coding psychophysics experiments with the MATLAB Psychtoolbox, and coding online Amazon Mechanical Turk experiments using HTML and JavaScript. Students will also develop computational models for object recognition with various machine learning techniques, such as deep learning, on state-of-the-art machine learning platforms such as TensorFlow and PyTorch. Moreover, students will have the opportunity to hone their critical thinking by analyzing scientific results and drawing insightful conclusions from experimental data.

References: 

https://arxiv.org/pdf/1902.00163.pdf and http://cvcl.mit.edu/Papers/OlivaTorralbaTICS2007.pdf

Project 6. Developing behavioral tasks and testing models of nonverbal visual reasoning in primates


Prof. Jim DiCarlo: dicarlo@mit.edu                       lab website: http://dicarlolab.mit.edu/

Mentor: Kohitij Kar kohitij@mit.edu

Presentation slides: https://documentcloud.adobe.com/link/track?uri=urn%3Aaaid%3Ascds%3AUS%3Ad23d1491-e726-47ab-a9ea-b32a8e759aee

Description:

Accurate interpretation of a scene (“what objects are present in a scene?” [1] or “what happened?”) is not the same as making accurate predictions (“what might happen next?” [2]) or generalizations based on that interpretation. The ability to create abstractions from knowledge representations is one of the hallmarks of human intelligence. This project will initially focus on developing non-verbal visual inductive reasoning tasks, which involve drawing inferences from a set of visual observations to make broad generalizations, e.g., identifying causal relationships between agents/objects in a scene, recognizing visual sequences, or recognizing the outcomes of one’s own actions in a scene and modifying behavior in similar future situations. Although much work has been done in cognitive psychology to study reasoning in general, very few studies have extended this domain into systems neuroscience, especially in non-human primates.

This project will likely address two key issues:

  1. When do humans and monkeys share inductive reasoning patterns? The answer to this question is critical for establishing when we can use rhesus macaques as a good model of human reasoning, especially if we are to make inferences about humans based on circuit-level neural experiments in macaques. Given that this field is very new, the project will primarily focus on developing new tasks that quantify the varying degrees of generalization made by humans when they observe multiple visual scenarios.
  2. What are the computational models of reasoning that best explain primate inductive reasoning? We will begin our model search by testing recently published image-computable neural network models (e.g., relational networks [3,4]; see the sketch below) that are built specifically to solve such visual association tasks. Given the behavioral results obtained above, we will then screen these models for their prediction accuracy.
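
The sketch below shows the core of a relation network in the spirit of [3], as one possible starting point for the model screening in aim 2: pairwise object relations are scored by a shared MLP g, summed, and passed to a second MLP f that produces scores over candidate answers. Object features and sizes are placeholders.

# Relation-network sketch operating on pre-extracted object embeddings.
import torch
import torch.nn as nn

class RelationNetwork(nn.Module):
    def __init__(self, obj_dim=32, hidden=128, num_answers=4):
        super().__init__()
        self.g = nn.Sequential(nn.Linear(2 * obj_dim, hidden), nn.ReLU(),
                               nn.Linear(hidden, hidden), nn.ReLU())
        self.f = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                               nn.Linear(hidden, num_answers))

    def forward(self, objects):               # objects: (batch, n_objects, obj_dim)
        b, n, d = objects.shape
        oi = objects.unsqueeze(2).expand(b, n, n, d)
        oj = objects.unsqueeze(1).expand(b, n, n, d)
        pairs = torch.cat([oi, oj], dim=-1).reshape(b, n * n, 2 * d)
        relations = self.g(pairs).sum(dim=1)  # aggregate over all object pairs
        return self.f(relations)              # scores over candidate answers

model = RelationNetwork()
scene_objects = torch.randn(8, 6, 32)          # fake object embeddings per scene
answer_scores = model(scene_objects)           # shape (8, 4)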

Skills to be learned:

  • Programming in JavaScript to run online Amazon Mechanical Turk experiments.
  • Programming in MATLAB (preferred) or Python to analyze the behavioral data.
  • Brainstorming sessions on how to start thinking about novel task design and building large behavioral datasets.
  • Second-hand experience of learning how rhesus macaques are trained for various tasks and thinking about tasks that can be jointly used for humans and macaques.
  • Working with predictions of state-of-the-art deep convolutional neural networks as well as relational networks.
  • Testing models on our behavioral tasks (designed from goal 1 above) and ranking their performance [analyses in MATLAB or Python].

References:

[1] https://dx.doi.org/10.1016%2Fj.neuron.2012.01.010

[2] https://arxiv.org/abs/1504.08023

[3] https://arxiv.org/abs/1706.01427

[4] https://arxiv.org/abs/1807.04225

Project 7. Discovering features of object representations using GAN-generated stimuli


Prof. Jim DiCarlo: dicarlo@mit.edu                       lab website: http://dicarlolab.mit.edu/

Mentor: Kamila Joźwik jozwik.kamila@gmail.com

Presentation slides: https://drive.google.com/file/d/1hmkEEfG3t5gVkTqLnbA41u-jnAiUVSec/view?usp=sharing

Description:

Perceiving the world around us through senses, including vision, is vital for our existence. Object recognition plays a crucial role. For example, we need to recognize an oncoming car while crossing the street to avoid danger or recognize a face of a friend when we enter a café. However, we still do not understand how we can accomplish this feat.

Specifically, it is not clear which object features humans use to categorize objects. We discovered some of the features and categories that help us recognize objects and compared them to features from computational models [1]. However, it is likely that natural stimuli do not sample the full feature space of object representations.

To address this concern, I propose to use state-of-the-art generative models, Generative Adversarial Networks (GANs), to generate stimuli from different categories and their fused versions. We will use BigGAN [2].

We will show these model-generated stimuli to humans to assess their behavior in a similarity judgments task, in which subjects arrange objects according to their similarity. Subjects will judge the objects based on general similarity and, separately, using other criteria, e.g., animacy and agency.

We will test whether the perception of these stimuli is similar in humans and state-of-the-art Deep Neural Networks (DNNs), using a wide range of off-the-shelf DNNs (over 30 models; [3]) and biologically inspired DNNs (including topographical and recurrent models).
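
The comparison can be made with Representational Similarity Analysis; the sketch below shows the basic step under placeholder data: build a dissimilarity matrix from DNN activations for the stimuli and correlate its off-diagonal entries with the matrix derived from human similarity judgments.

# RSA sketch: compare a DNN-derived RDM with a human judgment-derived RDM.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_stimuli = 60

dnn_activations = rng.standard_normal((n_stimuli, 4096))              # one row per stimulus
human_rdm = squareform(rng.random(n_stimuli * (n_stimuli - 1) // 2))  # placeholder RDM

# DNN RDM: 1 - Pearson correlation between activation patterns
dnn_rdm = squareform(pdist(dnn_activations, metric="correlation"))

# Compare the two RDMs on their upper-triangular entries with Spearman correlation,
# the standard choice when only rank order is assumed to be meaningful.
triu = np.triu_indices(n_stimuli, k=1)
rho, p = spearmanr(dnn_rdm[triu], human_rdm[triu])
print(f"RDM correlation (Spearman rho): {rho:.2f}")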

Research questions:

  • To what extent are representations in Generative Adversarial Networks similar to those of humans?
    • How is the perceptual space of GAN-generated object-fused stimuli reflected in human similarity judgments?
    • At what point do subjects decide that a given object belongs to a given category?
    • Where is the line between animate and inanimate objects?
  • Which Deep Neural Networks can best predict human similarity judgments of GAN-generated object-fused stimuli?

Skills to be learned:

  • Generating images with Generative Adversarial Networks
  • Collecting on-line human behavioral data
  • Extracting activations from Deep Neural Networks
  • Comparing human behavioral data with DNNs using Representational Similarity Analysis

This project will help us discover the features that allow humans to recognize objects and have a visual experience of the surrounding world.

References:

[1] https://www.frontiersin.org/articles/10.3389/fpsyg.2017.01726/full

[2] https://arxiv.org/abs/1809.11096

[3] https://www.biorxiv.org/content/10.1101/407007v1