CS60092 Information Retrieval (Spring 2023)

Instructor              Somak Aditya
Timing (Tentative)      MON (12:00-12:55), TUE (10:00-11:55)
First Class             January 3rd, 2023
Venue                   NC443, Nalanda Classroom
Teaching Assistants     Sachin Vashishtha (sachinvashistha6916@gmail.com)
                        Bishal Santra (bsantraigi@gmail.com)
                        Karde Vivek Manikrao (vivek10.karde@gmail.com)
                        Deepak Chaudhary (DEEPAKCHAUDHARY4311@kgpian.iitkgp.ac.in)

Announcements

  • The recording of the guest lecture by Dr. Anirudh Kembhavi, Director of Computer Vision at the Allen Institute for AI (AI2), is available here.
  • Invited guest lecture by Dr. Anirudh Kembhavi, Director of Computer Vision at the Allen Institute for AI (AI2), confirmed for Monday, March 13!
  • Note: students who are not able to register, kindly wait. I will not be able to answer individual emails. From time to time, I will approve registrations based on CGPA and other criteria (such as the cap on total class strength).
  • Every registered student should join the Google mailing list ir-2023-spring@googlegroups.com. All urgent announcements will be made through this group, which is meant only for registered (and approved) students. When requesting to join, kindly mention your roll number and confirm that you have registered in ERP.
  • IR 2022-23 Spring will be a fully research-oriented course.
  • The first introductory class will be on January 3 (Tuesday) at 10:00 am.
  • The course requires a grounding in algorithms and data structures, probability and statistics, and the basics of Natural Language Processing and Machine Learning. Being research-oriented, it will require students to read and understand several CS research papers. There will also be a term project, to be implemented in Python or Java. It is advisable to take this course only if you have the necessary background.

Weightage: Class-Performance/Viva (8%), Mid-Sem and End-Sem (60%), Term Project (32%).

Topics Covered

Week     Date                      Description & materials                                             Readings & other resources
Week 1   Mon. 09/01                Introduction to the course
         Tues. 10/01               Boolean Retrieval: Dictionary and postings lists, boolean querying
Week 2   Mon. 16/01                The term vocabulary & postings lists
         Tues. 17/01               Skip Pointers, Phrase Queries and Positional Indexing
Week 3   Mon. 23/01 & Tues. 24/01  Scoring, term weighting & the vector space model
Week 4   Mon. 30/01                Dictionaries and Tolerant Retrieval
         Tues. 31/01               Evaluation
Week 5   Mon. 06/02 & Tues. 07/02  Index Construction
Week 6   Mon. 13/02                Tutorial 1                                                          To be updated
Week 7   Mon. 27/02                Index Compression
         Tues. 28/02               Relevance Feedback & Query Expansion
Week 8   Mon. 07/03                Relevance Feedback & Query Expansion                                See above.
         Tues. 08/03               Probabilistic IR
Week 9   Mon. 13/03 & Tues. 14/03  Language Models for IR
Week 10  Mon. 20/03                Link Analysis: HITS
         Tues. 21/03               Link Analysis: PageRank
Week 11  Mon. 27/03                Web Crawling
         Tues. 28/03               Word2Vec (Parts I and II)

Tentative Topics

  • Boolean retrieval
  • The term vocabulary & postings lists
  • Skip Pointers, Phrase Queries and Positional Indexing
  • Scoring, term weighting & the vector space model
  • Dictionaries and Tolerant Retrieval
  • Evaluation in information retrieval
  • Index Construction and Compression
  • Relevance feedback & query expansion
  • Probabilistic information retrieval
  • Language models for information retrieval (Probabilistic, RNN, Transformers)
  • Link analysis – HITS, PageRank
  • Word Vectors
  • Summarization
  • Learning to Rank

Pre-requisites for the course

  • Data structures and Algorithms
  • Probability and Statistics
  • Basics of Machine Learning
  • Basics of Natural Language Processing (some of this may be covered during the course)
  • Basics of Graph algorithms (some of this may be covered during the course)
  • Programming in Python/Java

Reference Literature, Useful Tools and Software Resources

  1. Manning, Christopher D., Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Cambridge: Cambridge University Press, 2008.
  2. Research Papers shared in class.
  3. Pranav Rajpurkar’s Harvard CS197 AI Research Experiences

Every test should be attempted individually by each student. Plagiarism in any form will be severely penalized.

Somak Aditya
Assistant Professor

My research interests include integrating knowledge and enabling higher-order reasoning in AI.