CS60092 Information Retrieval (Spring 2022)

Instructor     Somak Aditya
Timing     Slot A3: MON (08:00-08:55), MON (09:00-09:55), TUE (12:00-12:55)
First Class     January 10th MON (08:00-08:55), MON (09:00-09:55)
Venue     Online Teams Channel Information-Retrieval-2022Spring, Key g7tgjqc
Teaching Assistants     Abhilash Nandy
Ankan Mullick
Neeraj Saini
Ravi Pratap Singh
Vaibhav Saxena

Announcements

  • [Mar 28] Dr. Monojit Choudhury, Principal Data and Applied Scientist at Turing India (Microsoft) presented a Guest Lecture on "Computing and Representing the Meanings of Words: From Wittgenstein to GPT-3 and beyond". The recording is available here (Google Drive 309.4 MB MPEG4).
  • [Mar 17] Class Test 2 will be held in the 3rd week of March. The syllabus covers everything taught after Vector Space and Scoring.
  • [Jan 27] Class Test 1 will be held in the 3rd week of February. The syllabus covers everything taught up to the 1st week of February. Please register on the CSE Moodle. The enrollment key is shared on Teams.
  • [Jan 27] Project Choice submission deadline: Jan 28th, 11:59 PM IST.
  • Note: students who are not able to register should kindly wait until the 7th, when the window closes. I will not be able to answer your emails individually. Once the window closes, I will approve registrations based on CGPA and other criteria (such as the total class strength cap).
  • Every registered student should join the Google mailing list ir-2022-spring@googlegroups.com. All urgent announcements will be made through the group. This group is meant only for registered (and approved) students. Kindly mention your roll number and that you have registered in ERP.
  • IR 2021-22 Spring will be a fully online research-oriented course.
  • First class on January 10 (Monday) at 8:00 am (deferred from the 4th due to ERP registration issues). Join the class Information-Retrieval-2022Spring on MS Teams (IITKGP domain; Code: g7tgjqc).
  • The course requires an understanding of the foundations of algorithms and data structures, probability and statistics, and the basics of Natural Language Processing and Machine Learning. This will be a research-oriented course that requires students to read and understand several CS research papers. There will be a term project to be done in Python/Java. It is advisable to take this course only if you have the necessary background.

Tentative Weightage: Three online proctored tests (60%), Term Project (40%).

Pre-requisites for the course

  • Data structures and algorithms
  • Probability and Statistics
  • Basics of Machine Learning
  • Basics of Natural Language Processing (Some might be covered during the course)
  • Basics of Graph algorithms (Some might be covered during the course)
  • Programming in Python/Java

Lecture Slides

  • Boolean retrieval - PDF
  • The term vocabulary & postings lists - PDF
  • Skip Pointers, Phrase Queries and Positional Indexing - PDF
  • Scoring, term weighting & the vector space model - PDF
  • Dictionaries and Tolerant Retrieval - PDF
  • Evaluation in information retrieval - PDF
  • Index Construction and Compression - PDF (Part 1) PDF (Part 2)
  • Relevance feedback & query expansion - PDF
  • Probabilistic information retrieval - PDF
  • Language models for information retrieval - PDF
  • Link analysis – HITS, PageRank
  • Word Vectors
  • Summarization
  • Learning to Rank
  • Neural IR

Text and Reference Literature

  1. Manning, Christopher D., Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Cambridge: Cambridge University Press, 2008.
  2. Research Papers shared in class.

Every test should be attempted individually by each student. Plagiarism in any form will be severely penalized.

Somak Aditya
Assistant Professor

My research interests include integrating knowledge and enabling higher-order reasoning in AI.