Vision and Reasoning
Many vision-and-language tasks require commonsense reasoning beyond data-driven image and natural language processing. The Cognitive Science and Active Vision literature points to an explicit, iterative interaction among perception, reasoning, and memory (knowledge) modules (DeepIU, ACS 2015).
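A minimal sketch of the kind of perception-reasoning-memory loop referred to above. The module interfaces and the stand-in functions are hypothetical placeholders for illustration, not a published architecture.

```python
# Hypothetical sketch of an iterative perception-reasoning-memory loop.
# The interfaces below are illustrative assumptions, not a specific system.
from typing import Any, Callable


def iterative_understanding(image: Any,
                            perceive: Callable[[Any, Any], Any],
                            reason: Callable[[Any, Any], Any],
                            memory: Any,
                            steps: int = 3) -> Any:
    """Alternate perception and reasoning: each round of reasoning (over
    observations plus background knowledge in memory) can guide what the
    perception module attends to next."""
    beliefs = None
    for _ in range(steps):
        observations = perceive(image, beliefs)   # perception, possibly guided by current beliefs
        beliefs = reason(observations, memory)    # reasoning over observations + knowledge
    return beliefs


if __name__ == "__main__":
    # trivial stand-ins for the three modules
    memory = {"person": "can ride horse"}
    perceive = lambda img, beliefs: ["person", "horse"]
    reason = lambda obs, mem: {o: mem.get(o, "unknown") for o in obs}
    print(iterative_understanding("image.jpg", perceive, reason, memory))
```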
Ongoing Projects
- SERB DST Startup Research Grant (2021-23) ~ INR 26 Lacs | Topic: “Learning from Rules and Data for Image Analytics”
- IIT Kharagpur Faculty Startup Research Grant (2022-24) ~ INR 25 Lacs | Topic: “The Role of Feedback in Vision-Language enabled Embodied Agents towards Applications in Desire Management” | Joint PI: Prof. Pawan Goyal
- Counterfactual Reasoning in Videos
- Active Learning for 3D Video Grounding (with Dr. Maneesh Singh)
Captioning
In our earliest attempt (CVIU 2017), we combined image classification with commonsense reasoning (over knowledge extracted from training captions) to propose the Scene Description Graph as an intermediate representation of a natural image. We showed the efficacy of this representation on image captioning and image retrieval tasks (and in QA case studies). A sketch of the idea follows.
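The sketch below shows, under assumed interfaces, how per-image classifier outputs and caption-mined commonsense relations could be assembled into a Scene Description Graph-like structure; the labels, relations, and construction step are illustrative and differ from the actual CVIU 2017 pipeline.

```python
# Hypothetical sketch: assembling detected entities and commonsense relations
# (mined from training captions) into a Scene Description Graph (SDG)-like
# structure. Labels and relations below are illustrative placeholders.
from dataclasses import dataclass, field


@dataclass
class SceneDescriptionGraph:
    nodes: set = field(default_factory=set)    # entities / events detected in the image
    edges: list = field(default_factory=list)  # (subject, relation, object) triples

    def add_entity(self, label: str) -> None:
        self.nodes.add(label)

    def add_relation(self, subj: str, rel: str, obj: str) -> None:
        self.nodes.update({subj, obj})
        self.edges.append((subj, rel, obj))


def build_sdg(classifier_labels, commonsense_kb):
    """Combine per-image classifier outputs with background (commonsense)
    relations to form an SDG for that image."""
    sdg = SceneDescriptionGraph()
    for label in classifier_labels:
        sdg.add_entity(label)
    # keep only background relations whose arguments were actually detected
    for subj, rel, obj in commonsense_kb:
        if subj in sdg.nodes and obj in sdg.nodes:
            sdg.add_relation(subj, rel, obj)
    return sdg


if __name__ == "__main__":
    labels = ["person", "horse", "field"]                  # e.g. from an image classifier
    kb = [("person", "rides", "horse"),
          ("horse", "grazes_in", "field"),
          ("car", "parked_on", "street")]                  # mined from training captions
    graph = build_sdg(labels, kb)
    print(graph.edges)  # [('person', 'rides', 'horse'), ('horse', 'grazes_in', 'field')]
```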
Visual QA, Image Puzzles and Visual Reasoning
We have proposed instantiations of this abstract architecture to solve image puzzles, VQA, and visual reasoning tasks such as CLEVR. In our AAAI 2018 work on VQA and UAI 2018 work on image puzzles, we proposed an explicit probabilistic soft logic layer on top of a neural architecture, which integrates commonsense knowledge and induces post-hoc interpretability. Later, for an end-to-end (differentiable) integration of spatial knowledge, we explored a combination of knowledge distillation, probabilistic logic, and relational networks in our WACV 2019 work on CLEVR.
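As a toy illustration of placing a soft-logic layer over neural outputs, the sketch below nudges neural answer scores toward satisfying weighted commonsense rules using the Lukasiewicz distance-to-satisfaction; the rule format, the single local update step, and all example values are assumptions for illustration, not the AAAI 2018 / UAI 2018 formulation.

```python
# Toy sketch of a probabilistic-soft-logic-style rescoring step over neural
# VQA answer scores. Rules have the form "soft evidence -> answer"; a violated
# rule pulls the answer's score upward. Illustrative only.

def distance_to_satisfaction(body: float, head: float) -> float:
    """Under the Lukasiewicz relaxation, the rule body -> head is violated
    by max(0, body - head); it is fully satisfied when head >= body."""
    return max(0.0, body - head)


def rescore(answer_scores, rules, weight=0.5):
    """One local inference step: nudge each answer's score toward satisfying
    the weighted commonsense rules that support it, then renormalize.

    answer_scores: dict mapping answer -> neural score in [0, 1]
    rules: list of (body_truth, answer) pairs (soft evidence for an answer)
    """
    scores = dict(answer_scores)
    for body, ans in rules:
        head = scores.get(ans, 0.0)
        scores[ans] = min(1.0, head + weight * distance_to_satisfaction(body, head))
    total = sum(scores.values()) or 1.0
    return {a: s / total for a, s in scores.items()}


if __name__ == "__main__":
    neural = {"umbrella": 0.35, "kite": 0.65}   # scores from the neural VQA model
    commonsense = [(0.9, "umbrella")]           # e.g. a "rainy scene" cue supports "umbrella"
    print(rescore(neural, commonsense))
```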
Publications
Spatial Knowledge Distillation to aid Visual Reasoning. In IEEE WACV 2019.