The Science of Intelligence

Fall 2020 – MIT

List of Project Mentors (list currently being updated)

Pramod R.T. (Kanwisher Lab)

Mengjia Xu (Pantazis Lab)

Jenelle Feather (McDermott Lab)

Thomas O’Connell (Kanwisher Lab & Tenenbaum Lab)

Brian Cheung (BCS Fellow in Computation, Agrawal Lab)

Tiago Marques (DiCarlo Lab)

Bernhard Egger (Tenenbaum Lab)

Akshay Rangamani (Poggio Lab)

Project Descriptions


Comparing Human Vision with AI Visual Systems

Project Lead / Mentor : Tiago Marques

Current state-of-the-art visual artificial intelligence (AI) systems are based on convolutional neural networks (CNNs), which are loosely guided by the internal architecture and functional properties of the primate visual system. Moreover, CNNs are currently the leading model class of the neural mechanisms of visual processing: intermediate layers can partly account for how neurons in the brain’s visual system respond to any given image, and these models also partly predict human object recognition behavior. However, current CNNs have several significant limitations when compared to human visual abilities. In particular, they can be fooled by imperceptibly small, explicitly crafted perturbations, and struggle to recognize objects in corrupted images that are easily recognized by humans.

In which precise aspects do CNNs deviate from human visual abilities? Recently, there have been several studies comparing CNNs with human object recognition behavior. Unfortunately, individual studies evaluate different models and employ different metrics for comparing CNNs to human visual behavior, making it difficult to determine precisely in which aspects human and artificial vision deviate from each other. 

The main goal of this project is to consolidate a wide range of behavioral comparisons between CNN models and humans, to better determine the extent to which different neural network architectures approximate human vision. The student will take advantage of a unique, neuroscience-based framework developed in the DiCarlo lab — Brain-Score — to make precise and systematic comparisons between a large pool of artificial neural networks and the primate visual system. This will be achieved by a combination of the following aims:

  1. Consolidate data from existing experiments into new standardized behavioral benchmarks that evaluate precisely in which aspects CNN models depart from human vision (see the sketch after this list for one such trial-by-trial metric).
  2. Perform a new set of psychophysical experiments using natural and synthetic stimuli to evaluate Human Vision and compare it with state-of-the-art CNN models.
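
One natural starting point for the first aim is the trial-by-trial error-consistency metric of Geirhos, Meding, and Wichmann (2020, cited below), which asks whether a CNN and a human observer make errors on the same images rather than merely at the same overall rate. A minimal sketch in Python/NumPy (the variable names are illustrative and not part of any existing benchmark code):

```python
import numpy as np

def error_consistency(model_correct, human_correct):
    """Cohen's kappa on trial-by-trial errors (Geirhos et al., 2020).

    model_correct, human_correct: boolean arrays over the same trials,
    True where that observer classified the image correctly.
    Returns kappa in [-1, 1]; 0 means errors overlap only as expected by
    chance given the two accuracies, 1 means identical error patterns.
    """
    model_correct = np.asarray(model_correct, dtype=bool)
    human_correct = np.asarray(human_correct, dtype=bool)

    # observed fraction of trials where both agree (both right or both wrong)
    c_obs = np.mean(model_correct == human_correct)

    # expected agreement if errors were independent, given each accuracy
    p_model, p_human = model_correct.mean(), human_correct.mean()
    c_exp = p_model * p_human + (1 - p_model) * (1 - p_human)

    return (c_obs - c_exp) / (1 - c_exp + 1e-12)

# illustrative usage with random responses (kappa should be near 0)
rng = np.random.default_rng(0)
print(error_consistency(rng.random(500) < 0.8, rng.random(500) < 0.7))
```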

If successful, this project will lead to a better understanding of the limitations of current visual AI systems, and provide guidance for improving current architectures. This will result in new models that more closely approximate biological vision and can perform in complex real-world applications.

References:

Rajalingham, Issa, et al., Large-Scale, High-Resolution Comparison of the Core Visual Object Recognition Behavior of Humans, Monkeys, and State-of-the-Art Deep Artificial Neural Networks https://www.jneurosci.org/content/38/33/7255

Geirhos et al. Generalisation in humans and deep neural networks https://papers.nips.cc/paper/7982-generalisation-in-humans-and-deep-neural-networks

Elsayed et al. Adversarial Examples that Fool both Computer Vision and Time-Limited Humans http://papers.nips.cc/paper/7647-adversarial-examples-that-fool-both-computer-vision-and-time-limited-humans

Geirhos, Meding, and Wichmann. Beyond accuracy: quantifying trial-by-trial behaviour of CNNs and humans by measuring error consistency https://arxiv.org/abs/2006.16736

Slides

——

CORnet for Inverse Graphics (CORnet-IG)

Project Lead / Mentor : Bernhard Egger

There is an ongoing debate about how human visual processing can best be modeled. CORnet (Kubilius, Schrimpf, et al., 2018) is currently the best model of the primate ventral visual stream for core object recognition. For face perception, however, an Efficient Inverse Graphics approach (EIG; Yildirim et al., 2020) better reflects processing, but it does not scale beyond faces and therefore cannot be compared directly to CORnet. In this project we would like to explore the idea of enforcing an inverse-graphics-motivated representation in a CORnet architecture. We will add approximate depth and surface-normal information to the training data and enforce during training that this information is encoded in the IT layer of CORnet and can be recovered from it with a simple convolutional decoder. This would force the model to learn a rich representation that goes beyond a plain classification task. The motivation for this idea goes back to the 2.5D sketch representation proposed by Marr. The result would be a first inverse-graphics-style approach that could be evaluated on a benchmark like Brain-Score (https://www.brain-score.org/).
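
A minimal sketch of what such a decoder head and joint training loss could look like in PyTorch is below. The tensor shapes, channel counts, and loss weighting are illustrative assumptions, not the actual CORnet interface:

```python
import torch
import torch.nn as nn

class SketchDecoder(nn.Module):
    """Toy convolutional decoder mapping an IT-like feature map to a
    2.5D sketch: 1 depth channel + 3 surface-normal channels."""
    def __init__(self, in_channels=512, out_channels=4, upsample_factor=8):
        super().__init__()
        self.decode = nn.Sequential(
            nn.Conv2d(in_channels, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=upsample_factor, mode="bilinear",
                        align_corners=False),
            nn.Conv2d(128, out_channels, kernel_size=3, padding=1),
        )

    def forward(self, it_features):
        return self.decode(it_features)

# Illustrative joint loss: classification + 2.5D reconstruction.
# `it_features`, `logits`, and `depth_normals_target` are placeholders for
# the real CORnet outputs and the approximate depth/normal maps.
decoder = SketchDecoder()
it_features = torch.randn(2, 512, 7, 7)           # placeholder IT activations
logits = torch.randn(2, 1000)                     # placeholder class scores
labels = torch.randint(0, 1000, (2,))
depth_normals_target = torch.randn(2, 4, 56, 56)  # placeholder 2.5D targets

loss = nn.CrossEntropyLoss()(logits, labels) \
     + 0.5 * nn.functional.mse_loss(decoder(it_features), depth_normals_target)
loss.backward()
```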

——

Learning Graph Embeddings for Network Analysis

Project Lead / Mentor: Mengjia Xu

Project description: Unlike grid-like Euclidean data (e.g., images, audio, natural language), much real-world data is relational and lives in complex irregular domains (e.g., bank-asset networks in finance, protein-protein interaction networks in drug discovery, social networks, brain networks in neuroscience). Graphs provide an effective and universal language for describing and modeling such complex systems. Recently, learning latent graph embeddings with deep neural networks [1-3] has attracted a lot of attention and has demonstrated great potential for projecting high-dimensional graphs into low-dimensional continuous vectors or probability densities, in an inductive and unsupervised manner. The resulting latent-space embeddings can be viewed as latent features, which can be fed readily and efficiently into traditional classifiers for various downstream graph analytics tasks. Moreover, probability-density embeddings additionally encode useful uncertainty information, which supports a better quantitative analysis of different nodes’ properties in the latent space.
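
As a minimal illustration of the vector-based setting (problem 1 below), the following sketch takes a tiny undirected graph given by its adjacency matrix, encodes nodes with a single graph-convolution-style layer, and decodes edges with an inner product for link reconstruction. It is a toy example with assumed names and sizes, not the node2vec or Graph2Gauss methods from the references:

```python
import torch
import torch.nn as nn

def normalized_adjacency(adj):
    """Symmetrically normalized adjacency with self-loops: D^-1/2 (A+I) D^-1/2."""
    a_hat = adj + torch.eye(adj.size(0))
    d_inv_sqrt = a_hat.sum(dim=1).pow(-0.5)
    return d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]

class GraphAutoencoder(nn.Module):
    """One-layer graph-convolutional encoder with an inner-product edge decoder."""
    def __init__(self, feat_dim, embed_dim=16):
        super().__init__()
        self.weight = nn.Linear(feat_dim, embed_dim, bias=False)

    def encode(self, a_norm, features):
        return torch.relu(a_norm @ self.weight(features))   # node embeddings

    def forward(self, a_norm, features):
        z = self.encode(a_norm, features)
        return torch.sigmoid(z @ z.t())                      # edge probabilities

# toy 4-node path graph with one-hot node features
adj = torch.tensor([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]],
                   dtype=torch.float)
features = torch.eye(4)
model = GraphAutoencoder(feat_dim=4)
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)

for _ in range(100):
    recon = model(normalized_adjacency(adj), features)
    loss = nn.functional.binary_cross_entropy(recon, adj)  # link reconstruction
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

embeddings = model.encode(normalized_adjacency(adj), features)  # latent features
```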

There are several possible problems that we expect to pursue with students in this project:

    1. Develop a deep neural network model to learn vector-based graph embeddings for downstream network analysis tasks.
    • Build graphs (directed/undirected/attributed/weighted) based on collected datasets;
    • Map graphs into low-dimensional vector representations using a deep neural network-based encoder;
    • Evaluate the output embeddings with prepared downstream tasks (classification, link prediction or graph reconstruction) and compare with other SOTA methods.

    2. Develop a deep neural network model to learn probabilistic graph embeddings for downstream network analysis tasks (see the sketch after this list).
    • Build graphs (directed/undirected/attributed/weighted) based on collected datasets;
    • Map graphs into low-dimensional distribution-based representations using a deep neural network-based encoder;
    • Evaluate the output embeddings with prepared downstream tasks (classification, link prediction or graph reconstruction) and compare with other SOTA methods.
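
For problem 2, each node is embedded not as a point but as a Gaussian, so that its variance carries uncertainty information. A minimal sketch is below, in the spirit of (but much simpler than) the Graph2Gauss model in reference [2]; the margin ranking loss and all names are illustrative assumptions:

```python
import torch
import torch.nn as nn

class GaussianNodeEncoder(nn.Module):
    """Maps node features to a diagonal Gaussian embedding (mu, sigma^2)."""
    def __init__(self, feat_dim, embed_dim=8):
        super().__init__()
        self.hidden = nn.Linear(feat_dim, 32)
        self.mu = nn.Linear(32, embed_dim)
        self.log_var = nn.Linear(32, embed_dim)

    def forward(self, features):
        h = torch.relu(self.hidden(features))
        return self.mu(h), self.log_var(h).exp()   # mean and variance per node

def kl_divergence(mu_i, var_i, mu_j, var_j):
    """KL(N_i || N_j) for diagonal Gaussians, used as an asymmetric 'distance'."""
    return 0.5 * (var_i / var_j + (mu_j - mu_i).pow(2) / var_j
                  - 1.0 + torch.log(var_j / var_i)).sum(dim=-1)

# Ranking idea: a node should be "closer" (lower KL) to its neighbors than
# to non-neighbors. `features`, `edges`, and `non_edges` are toy placeholders.
features = torch.eye(4)
edges = torch.tensor([[0, 1], [1, 2], [2, 3]])       # (anchor, neighbor) pairs
non_edges = torch.tensor([[0, 3], [1, 3], [2, 0]])   # (anchor, non-neighbor) pairs

encoder = GaussianNodeEncoder(feat_dim=4)
optimizer = torch.optim.Adam(encoder.parameters(), lr=0.05)
for _ in range(200):
    mu, var = encoder(features)
    kl_pos = kl_divergence(mu[edges[:, 0]], var[edges[:, 0]],
                           mu[edges[:, 1]], var[edges[:, 1]])
    kl_neg = kl_divergence(mu[non_edges[:, 0]], var[non_edges[:, 0]],
                           mu[non_edges[:, 1]], var[non_edges[:, 1]])
    loss = torch.relu(1.0 + kl_pos - kl_neg).mean()  # margin ranking loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```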

Skills to be learned:
    • Hands-on experience in designing and developing deep neural network-based graph embedding models for graph data analysis;
    • Acquire skills in basic graph techniques, e.g., basic graph properties, graph structure representations, node neighbor sampling, graph visualization, etc.;
    • Develop critical thinking ability in analyzing the scientific results and drawing insightful conclusions based on experimental results.

References:

    [1] Vector-based graph embedding:
https://dl.acm.org/doi/abs/10.1145/2939672.2939754
    [2] Probabilistic graph embedding: https://arxiv.org/abs/1707.03815
    [3] Overview paper of graph embedding on biomedical networks:
https://academic.oup.com/bioinformatics/article/36/4/1241/5581350

Slides

——

Supervised Learning with Brain Assemblies

Project Lead / Mentor : Akshay Rangamani

Assemblies of neurons have recently seen a resurgence as a computational model at a level between individual neurons and whole-brain models. While an assembly calculus with a biologically plausible realization has been explored, we propose to construct and analyze a computational learning model based on neuronal assemblies. We aim to develop a learning system that is as powerful as current Deep Learning (DL) techniques, and to analyze properties like robustness and interpretability, which are among the shortcomings of DL.
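
For orientation, the basic primitive of the assembly calculus (Papadimitriou et al., 2020, cited below) is projection: a stimulus repeatedly fires into a downstream area with random sparse connectivity, only the top-k most strongly driven neurons fire (the cap), and synapses between co-firing neurons are strengthened Hebbian-style until a stable assembly forms. A minimal NumPy sketch of this primitive, with illustrative sizes and parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

n, k = 1000, 50          # neurons per area, cap size (top-k winners fire)
p, beta = 0.05, 0.10     # connection probability, Hebbian plasticity rate

# random sparse connectivity: stimulus -> area and area -> area (recurrent)
W_stim = (rng.random((n, k)) < p).astype(float)
W_rec = (rng.random((n, n)) < p).astype(float)

stimulus = np.ones(k)          # the k stimulus neurons all fire
active = np.zeros(n, bool)     # currently firing neurons in the downstream area

for t in range(20):
    # total synaptic drive from the stimulus and from currently active neurons
    drive = W_stim @ stimulus + W_rec[:, active].sum(axis=1)
    winners = np.argsort(drive)[-k:]            # cap: only the top-k neurons fire
    new_active = np.zeros(n, bool)
    new_active[winners] = True
    # Hebbian update: strengthen synapses into winners from co-firing inputs
    W_stim[winners, :] *= (1 + beta)
    W_rec[np.ix_(winners, active)] *= (1 + beta)
    active = new_active

assembly = np.flatnonzero(active)   # indices of the (approximately) stable assembly
```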

Possible directions:

1. Construct Neural Assembly Networks that are capable of performing supervised learning tasks like classification. First show that one can successfully perform binary classification, and then explore extensions to multi-class classification problems. Similar to DL, we would also like to explore hierarchical architectures in Neural Assembly Networks, with computations being passed down many layers to solve more complicated learning tasks. These models, inspired by higher-order cognition, should be particularly well suited to symbolic and statistical learning tasks.
2. Understand the proposed learning algorithm in an analytical manner and provide mathematical guarantees on the convergence of the Hebbian learning algorithm to solutions that generalize well.
3. Explore how Neural Assembly Networks can be robust to distribution shifts in the data, as well as attacks from adversaries.

References:

Papadimitriou, C. H., Vempala, S. S., Mitropolsky, D., Collins, M., & Maass, W. (2020). Brain computation by assemblies of neurons. *Proceedings of the National Academy of Sciences*.

Papadimitriou, C. H., & Vempala, S. S. (2018). Random projection in the brain and computation with assemblies of neurons. In *10th Innovations in Theoretical Computer Science Conference (ITCS 2019)*. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik.

Legenstein, R., Papadimitriou, C. H., Vempala, S., & Maass, W. (2016). Assembly pointers for variable binding in networks of spiking neurons. *arXiv* *preprint arXiv:1611.03698*.

Dasgupta, S., Stevens, C. F., & Navlakha, S. (2017). A neural algorithm for a fundamental computing problem. *Science*, *358*(6364), 793-796.

Rangamani, A., Gandhi, A., (2020) Supervised Learning with Brain Assemblies *(Preprint)*

Slides

——

Eigendistortions of Auditory Models

Project Lead / Mentor : Jenelle Feather

Human sensory systems are often modeled as a cascade of operations, where each stage applies some transformation to the input before passing it on to the next stage. The learned transformations in task-optimized convolutional neural networks outperform traditional hand-engineered models of the auditory system when predicting neural activity, suggesting that the features learned by the models are similar to the space used by human sensory systems [1]. However, there are clear discrepancies between human perception and the network representations, as demonstrated by model metamers [2] and adversarial examples [3]. Another way to formalize comparisons between network features and human sensory systems is to generate an input that will maximally or minimally distort the model representation [4]. These stimuli can be used to compare human perceptual sensitivity to distortions with the network's sensitivity to distortions. In this project, we will synthesize audio that corresponds to eigenvectors of the Fisher Information Matrix of a layer of a neural network trained on an auditory task. We will further explore the distortion eigenvectors obtained when including a neural mapping to human fMRI data collected on a diverse set of auditory sounds. The project will include generating the synthetic stimuli and testing human detection of the distortions via psychophysics experiments on Amazon Mechanical Turk. Prior experience with a deep learning framework (such as PyTorch) will be beneficial for generating the eigendistortions, while experience with JavaScript and conducting psychophysics will be useful for evaluating the distortions.
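
Under the usual assumption of additive Gaussian noise on the model response, the Fisher Information Matrix at an input x is JᵀJ, where J is the Jacobian of the layer representation with respect to x, and the most- and least-noticeable distortions are its extremal eigenvectors [4]. The top eigenvector can be estimated without ever forming J, using Jacobian-vector products. A minimal PyTorch sketch (the representation function `f` and the example layer are placeholders, not the actual auditory models):

```python
import torch

def top_eigendistortion(f, x, num_iters=100):
    """Power iteration on the Fisher matrix J^T J of the representation f at x.

    f: callable mapping an input tensor to a representation tensor.
    x: input (e.g., a sound waveform or spectrogram).
    Returns the unit-norm input direction that maximally distorts f(x),
    together with the corresponding eigenvalue estimate.
    """
    v = torch.randn_like(x)
    v = v / v.norm()
    eigval = x.new_zeros(())
    for _ in range(num_iters):
        _, jv = torch.autograd.functional.jvp(f, x, v)       # J v
        _, jtjv = torch.autograd.functional.vjp(f, x, jv)    # J^T (J v)
        eigval = jtjv.norm()
        v = jtjv / (eigval + 1e-12)
    return v, eigval

# illustrative use: representation = ReLU of a fixed random linear layer
torch.manual_seed(0)
layer = torch.nn.Linear(256, 64)
f = lambda inp: torch.relu(layer(inp))
x = torch.randn(256)
distortion, eigval = top_eigendistortion(f, x)
# x + eps * distortion is the model's most-noticeable perturbation of x
```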

References: 

[1] Kell, A. J., Yamins, D. L., Shook, E. N., Norman-Haignere, S. V., & McDermott, J. H. (2018). A task-optimized neural network replicates human auditory behavior, predicts brain responses, and reveals a cortical processing hierarchy. Neuron, 98(3), 630-644.

[2] Feather, J., Durango, A., Gonzalez, R., & McDermott, J. (2019). Metamers of neural networks reveal divergence from human perceptual systems. In Advances in Neural Information Processing Systems (pp. 10078-10089).

[3] Carlini, N., & Wagner, D. (2018, May). Audio adversarial examples: Targeted attacks on speech-to-text. In 2018 IEEE Security and Privacy Workshops (SPW) (pp. 1-7). IEEE.

[4] Berardino, A., Laparra, V., Ballé, J., & Simoncelli, E. (2017). Eigen-distortions of hierarchical representations. In Advances in neural information processing systems (pp. 3530-3539).

——

Linking Brains, Behavior, and AI using Neural Reconstruction

Project Lead / Mentor: Thomas O’Connell

The past ten years have seen great advances in modeling biological visual processing, thanks to the advent of artificial neural networks (ANNs). Intermediate representations in ANNs optimized for object recognition explain visually-evoked variance along the primate ventral stream and in primate behavioral categorization judgements (Yamins and DiCarlo, 2016; Richards et al., 2019). However, ANNs are usually fit to neural data and to behavior in isolation. The gold standard for cognitive neuroscience is a computational model that captures behaviorally relevant variance in neural responses within a single experimental paradigm. Recently, we demonstrated such an approach for ANNs and eye movement behavior using neural reconstruction from fMRI in humans (O’Connell and Chun, 2018). In this approach, spatial priority maps derived from ANN models of visual recognition are reconstructed from patterns of brain activity in visual brain regions. These reconstructed priority maps are then used as predictions of eye movement patterns in the same participants. Reconstructed priority maps predict eye movement patterns within and across individuals, demonstrating that ANNs and human brains share variance related to the allocation of eye movements. Two potential follow-up projects (for one to two students) stem from this work (a minimal sketch of the priority-map comparison follows the list):

  1. Benchmarking using the Brain-Score set of models. For this project, the neural reconstruction approach described above will be used to assess and benchmark a wide range of ANN models. As in Brain-Score (https://www.brain-score.org), we will assess how well many different ANNs capture joint variance in neural responses and eye movements towards the same images. Benchmarking will first be done on eye movement datasets, and will then progress to one of three available joint fMRI and eye movement datasets.
  2. What ANN attributes affect neural eye movement predictivity? While the above project takes a more naturalistic approach by screening many existing ANN models, this project will directly manipulate different attributes of ANNs to determine how those attributes contribute to the degree of joint shared variance with neural responses and eye movement behavior. Relevant ANN attributes to be explored include (1) network architecture (e.g. feedforward, recurrent, # of layers), (2) objective function (e.g. supervised, self-supervised, unsupervised, inverse graphics), and (3) training diet (e.g. ImageNet, Places365, SAYcam, “at-birth” random untrained networks).
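
To make the core comparison concrete: in both projects a model-derived spatial priority map is scored against measured fixation locations, for example with the normalized scanpath saliency (NSS), i.e., the average z-scored priority value at fixated pixels. A minimal NumPy sketch with placeholder data (the real priority maps would come from ANN feature activations or their fMRI reconstructions):

```python
import numpy as np

def priority_map_from_features(feature_map):
    """Collapse a (channels, H, W) activation tensor into a 2D priority map
    by taking the L2 norm across channels (one simple, assumed choice)."""
    return np.linalg.norm(feature_map, axis=0)

def normalized_scanpath_saliency(priority_map, fixations):
    """NSS: mean z-scored priority value at fixated locations.

    priority_map: (H, W) array; fixations: iterable of (row, col) pixel indices.
    Higher values mean the map assigns above-average priority to fixated pixels.
    """
    z = (priority_map - priority_map.mean()) / (priority_map.std() + 1e-12)
    return float(np.mean([z[r, c] for r, c in fixations]))

# illustrative usage with random placeholder data
rng = np.random.default_rng(0)
features = rng.random((64, 28, 28))          # stand-in for an ANN layer activation
fixations = [(5, 7), (14, 14), (20, 3)]      # stand-in for eye-tracking data
print(normalized_scanpath_saliency(priority_map_from_features(features), fixations))
```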

For both projects, interested students should have experience with ANN modeling in Python (PyTorch preferred). Interested students should also have a strong interest in cognitive (neuro)science and learning to work with behavioral and brain datasets (but no previous experience necessary).

——

Intuitive Physics in Minds and Machines

Project Lead / Mentor: Pramod RT

Abstract: Understanding, predicting, and acting on the world requires an intuitive knowledge of physics. For example, we judge whether the stack of dishes in the kitchen sink is stable, whether it is safe to step on a ladder, or whether it is safe to walk on a puddled road. As ubiquitous as it is, we still do not understand how the brain represents and builds this core knowledge of intuitive physical scene understanding. In this project, we will explore how an important facet of intuitive physics, namely physical object relational attributes (such as stability, attachment, and containment), is represented in minds, by measuring human behavior, and in machines, by building and testing various artificial neural network models. This project also has potential for future neuroimaging studies.
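
As one concrete modeling route, the simulation-based account cited below (Battaglia et al., 2013) judges stability by running a configuration forward in a physics engine and checking how far the objects move. A minimal sketch using pybullet is below; the tower geometry, thresholds, and step count are illustrative assumptions:

```python
import math
import pybullet as p

def tower_is_stable(block_positions, half_extent=0.5, steps=240, tol=0.1):
    """Simulate a stack of unit blocks and report whether it stays put.

    block_positions: list of (x, y, z) centers of the blocks.
    Returns True if no block's center moves more than `tol` after `steps`
    simulation steps (240 steps is about 1 s at pybullet's default 240 Hz).
    """
    p.connect(p.DIRECT)                                           # headless physics
    p.setGravity(0, 0, -9.8)
    p.createMultiBody(0, p.createCollisionShape(p.GEOM_PLANE))    # static ground
    box = p.createCollisionShape(p.GEOM_BOX, halfExtents=[half_extent] * 3)
    bodies = [p.createMultiBody(baseMass=1.0, baseCollisionShapeIndex=box,
                                basePosition=pos) for pos in block_positions]
    for _ in range(steps):
        p.stepSimulation()
    moved = [math.dist(p.getBasePositionAndOrientation(b)[0], pos) > tol
             for b, pos in zip(bodies, block_positions)]
    p.disconnect()
    return not any(moved)

# aligned stack (should be stable) vs. heavily offset stack (should topple)
print(tower_is_stable([(0, 0, 0.5), (0, 0, 1.5), (0, 0, 2.5)]))
print(tower_is_stable([(0, 0, 0.5), (0.45, 0, 1.5), (0.9, 0, 2.5)]))
```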

References:

1. Ullman, T. D., Spelke, E., Battaglia, P., & Tenenbaum, J. B. (2017). Mind Games: Game Engines as an Architecture for Intuitive Physics. Trends in Cognitive Sciences, 21(9), 649–665.

2. Battaglia, P. W., Hamrick, J. B., & Tenenbaum, J. B. (2013). Simulation as an engine of physical scene understanding. Proceedings of the National Academy of Sciences of the United States of America, 110(45), 18327–18332.

3. Lerer, A., Gross, S., & Fergus, R. (2016). Learning physical intuition of block towers by example. 33rd International Conference on Machine Learning, ICML 2016, 648–656.

4. Fischer, J., Mikhael, J. G., Tenenbaum, J. B., & Kanwisher, N. (2016). Functional neuroanatomy of intuitive physical inference. Proceedings of the National Academy of Sciences, 113(34), E5072–E5081. https://doi.org/10.1073/pnas.1610344113

——

Ecological Learning: Building machines that learn naturally

Project Lead / Mentor : Brian Cheung

While current AI learning algorithms have already demonstrated compelling abilities to use data to accomplish well-defined tasks and objectives, there are still many aspects that distinguish them from a more ideal system. These algorithms still require tedious curation and manual specification of the environment and its corresponding learning signal. To relax these constraints, we must bring artificial learning systems closer to the counterparts that we observe in nature. We must understand how learning operates in a more natural environment.

Human development at the earliest stages of infancy tends to be an autonomous process, not a carefully supervised one like the artificial models trained today. The goal of this project is to develop a framework for serving data to AI systems more akin to how a naturally intelligent system like an infant would experience the world: online, interactive, and temporally consistent. If successful, the project should drastically increase the scope of data available to AI models today. It will also serve as a framework for evaluating the abilities of state-of-the-art learning algorithms and their potential weaknesses in such domains.
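
One small piece of such a framework, sketched below under illustrative assumptions, is a data loader that serves observations as an ordered temporal stream rather than as shuffled i.i.d. samples, so that consecutive batches correspond to consecutive moments of experience:

```python
import torch
from torch.utils.data import IterableDataset, DataLoader

class TemporalStream(IterableDataset):
    """Serves frames of a recorded (or simulated) experience strictly in
    temporal order, as overlapping clips, instead of shuffling them i.i.d."""
    def __init__(self, frames, clip_len=8):
        self.frames = frames          # tensor of shape (T, C, H, W)
        self.clip_len = clip_len

    def __iter__(self):
        for t in range(len(self.frames) - self.clip_len + 1):
            yield self.frames[t:t + self.clip_len]   # temporally contiguous clip

# placeholder "experience": 100 frames of 3x32x32 noise standing in for video
frames = torch.randn(100, 3, 32, 32)
loader = DataLoader(TemporalStream(frames), batch_size=4)  # no shuffling
for clip_batch in loader:
    pass  # clip_batch: (4, 8, 3, 32, 32), consecutive moments of experience
```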