Vision and Reasoning

Nov 8, 2015

Many vision-and-language tasks require commonsense reasoning that goes beyond data-driven image and natural language processing. The cognitive science and active vision literature points to an explicit, iterative interaction among perception, reasoning, and memory (knowledge) modules (DeepIU ACS 2015).

Ongoing Projects

SERB DST Startup Research Grant (2021-23) ~ INR 26 Lacs | Topic: “Learning from Rules and Data for Image Analytics”
IIT Kharagpur Faculty Startup Research Grant (2022-24) ~ INR 25 Lacs | Topic: “The Role of Feedback in Vision-Language enabled Embodied Agents towards Applications in Desire Management” | Joint PI: Prof. Pawan Goyal
Counterfactual Reasoning in Videos
Active Learning for 3D Video Grounding (with Dr. Maneesh Singh)

Captioning

In our earliest attempt (CVIU 2017), we combined image classification with reasoning over commonsense knowledge (extracted from training captions) to propose the Scene Description Graph, an intermediate representation for a natural image. We showed the efficacy of this representation on image captioning and image retrieval tasks, and through QA case studies.

Visual QA, Image Puzzles and Visual Reasoning

We have proposed instantiations of this abstract architecture to solve image puzzles, VQA, and visual reasoning tasks such as CLEVR. In our AAAI 2018 work on VQA and our UAI 2018 work on image puzzles, we proposed an explicit probabilistic soft logic layer on top of a neural architecture; this layer helps integrate commonsense knowledge and induces post-hoc interpretability.
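As a rough illustration of how a soft-logic layer can sit on top of neural outputs, the sketch below re-ranks candidate answers using one weighted rule evaluated with Lukasiewicz semantics, the relaxation that probabilistic soft logic builds on. It is a toy heuristic under stated assumptions, not the inference procedure from the AAAI 2018 or UAI 2018 work: the rule, the function names (`soft_and`, `distance_to_satisfaction`, `rerank`), and all scores are hypothetical, and real probabilistic soft logic inference solves a convex optimization over all rules jointly.

```python
# Toy sketch (hypothetical names and numbers): re-rank neural answer scores
# with a single soft-logic rule. This is NOT full probabilistic soft logic
# inference; it only illustrates the Lukasiewicz relaxation of one rule.

def soft_and(a, b):
    """Lukasiewicz t-norm: truth value of (a AND b) for a, b in [0, 1]."""
    return max(0.0, a + b - 1.0)


def distance_to_satisfaction(body, head):
    """Distance to satisfaction of the rule body -> head (0 means satisfied)."""
    return max(0.0, body - head)


def rerank(neural_scores, relatedness, concept_conf, rule_weight=1.0):
    """Lift each answer's score by the weighted amount the rule is violated.

    Assumed rule: detected(concept) AND related(concept, answer) -> answer.
    Returns (ranking, explanations); explanations maps each answer to the
    rule-body support and the violation, as a simple post-hoc trace.
    """
    combined, explanations = {}, {}
    for answer, score in neural_scores.items():
        body = soft_and(concept_conf, relatedness.get(answer, 0.0))
        violation = distance_to_satisfaction(body, score)
        combined[answer] = score + rule_weight * violation
        explanations[answer] = {"rule_support": body, "violation": violation}
    ranking = sorted(combined, key=combined.get, reverse=True)
    return ranking, explanations


# Hypothetical example: the network slightly prefers "cat", but a detected
# "leash" is strongly related to "dog" in the assumed knowledge base.
scores = {"dog": 0.5, "cat": 0.6}
related_to_leash = {"dog": 0.9, "cat": 0.2}
ranking, why = rerank(scores, related_to_leash, concept_conf=0.9)
print(ranking)  # ['dog', 'cat']
```

The `explanations` dictionary hints at where interpretability comes from in this kind of design: each change to the ranking can be traced back to a specific rule and the evidence that triggered it.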

Later, for an end-to-end (differentiable) integration of spatial knowledge, we explored a combination of knowledge distillation, probabilistic logic, and relational networks in our WACV 2019 work on CLEVR.
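For readers unfamiliar with knowledge distillation, the following is a minimal sketch of the standard temperature-based distillation loss, in which a student model is trained to match both the ground-truth label and the softened predictions of a teacher. It assumes, purely for illustration, that the teacher is a model with access to extra (for example, spatial) knowledge; the function names, `alpha`, `temperature`, and the logits are hypothetical and not taken from the WACV 2019 paper.

```python
import math

# Minimal sketch of a generic distillation loss (hypothetical names/values).
# Assumption: the teacher's logits come from a model that had access to
# additional knowledge; the student only sees the raw input.


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature parameter."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, predicted):
    """Cross-entropy H(target, predicted) for two discrete distributions."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)


def distillation_loss(student_logits, teacher_logits, label,
                      alpha=0.5, temperature=2.0):
    """Weighted sum of the hard-label loss and the soft teacher-matching loss."""
    one_hot = [1.0 if i == label else 0.0 for i in range(len(student_logits))]
    hard_loss = cross_entropy(one_hot, softmax(student_logits))
    soft_loss = cross_entropy(softmax(teacher_logits, temperature),
                              softmax(student_logits, temperature))
    return (1.0 - alpha) * hard_loss + alpha * soft_loss


# Hypothetical logits over three answer classes; class 0 is correct.
teacher = [2.0, 0.0, 0.0]
agreeing_student = [2.0, 0.0, 0.0]
disagreeing_student = [0.0, 2.0, 0.0]
print(distillation_loss(agreeing_student, teacher, label=0))
print(distillation_loss(disagreeing_student, teacher, label=0))
```

Because every term is differentiable in the student's logits, a loss of this form can be minimized end to end with gradient descent, which is the property the paragraph above refers to.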

Somak Aditya

Assistant Professor

My research interests include integrating knowledge and enabling higher-order reasoning in AI.

Publications

ERVQA: A Dataset to Benchmark the Readiness of Large Vision Language Models in Hospital Environments | In EMNLP 2024 (Main).
Sourjyadip Ray, Kushal Gupta, Soumi Kundu, Dr Payal Arvind Kasat, Somak Aditya, Pawan Goyal (2024).

PDF nlp vision

Text2Afford: Probing Object Affordance Prediction abilities of Language Models solely from Text | In CoNLL 2024 (Main).
Sayantan Adak, Daivik Agarwal, Animesh Mukherjee, Somak Aditya (2024).

PDF nlp vision

Integrating Knowledge and Reasoning in Image Understanding | In IJCAI 2019.
Somak Aditya, Yezhou Yang, Chitta Baral (2019).

PDF vision nlp

Knowledge and Reasoning for Image Understanding | In Ph.D. Dissertation, Defended 2018.
Somak Aditya (2019).

PDF vision nlp

Spatial Knowledge Distillation to aid Visual Reasoning | In IEEE WACV 2019.
Somak Aditya, Rudra Saha, Yezhou Yang, Chitta Baral (2019).

PDF vision nlp neurosymbolic

Explicit Reasoning over End-to-End Neural Architectures | In AAAI 2018.
Somak Aditya, Yezhou Yang, Chitta Baral (2018).

PDF Code Project vision nlp neurosymbolic

Visual common-sense for scene understanding using perception, semantic parsing and reasoning | In AAAI Spring Symposium, 2015.
Somak Aditya, Yezhou Yang, Chitta Baral, Cornelia Fermuller, Yiannis Aloimonos (2015).

PDF Slides vision nlp