Vision and Language

Spatial Knowledge Distillation to aid Visual Reasoning

In this work, we propose a way to integrate spatial commonsense knowledge to aid visual reasoning. `external_link`.

Solving Visual QA using Reasoning and Deep Learning

In this work, we propose an explicit reasoning layer over Deep Neural Architectures to solve VQA. `external_link`.

Solving Image Puzzles

In this work, we propose image puzzles that require background knowledge to solve. This project also aims to advocate logically interpretable systems. `external_link`.

From Images to Sentences

In this project, we aim to achieve a deeper understanding of images and videos with the help of background or common-sense knowledge. Our motivation is the way a human interprets images and videos: breaking them down into a few important constituent parts, reasoning about them, detecting complex activities, and so on. `external_link`.