CS60092 Information Retrieval (Spring 2024)

Instructor     Somak Aditya
Timing (Tentative)     MON (12:00-12:55), TUE (10:00-11:55)
First Class (Tentative)     January 2nd, 2024
Venue     NC242, Nalanda Classroom
Teaching Assistants     Sachin Vashishtha (sachinvashistha6916@gmail.com)

Announcements

  • Note: students who are unable to register should kindly wait. I will not be able to answer emails individually. From time to time, I will approve registrations based on CGPA and other criteria (such as the cap on total class strength).
  • IR 2023-24 Spring will be a fully research-oriented course.
  • The first introductory class will be on January 2 (Tuesday) at 10:00 am.
  • The course requires a foundational understanding of algorithms and data structures, probability and statistics, and basic knowledge of Natural Language Processing and Machine Learning. This will be a research-oriented course that requires students to understand several CS research papers. There will be a term project that must be done in Python or Java. It is advisable to take this course only if you have the necessary background.

Weightage: Class-Performance/Quiz (8-10%), Mid-Sem and End-Sem (60%), Term Project (30-32%).

Topics Covered

Week       Date                        Description & materials                                 Readings & other resources
Week 1     Mon. 09/01                  Introduction to the course
           Tue. 10/01                  Boolean Retrieval: Dictionary and postings lists, Boolean querying
Week 2     Mon. 15/01 & Tue. 16/01     The term vocabulary & postings lists
Week 3     Mon. 22/01                  Skip Pointers, Phrase Queries and Positional Indexing
Week 4     Tue. 23/01 & Mon. 29/01     Scoring, term weighting & the vector space model
Week 5     Tue. 30/01 & Mon. 05/02     Dictionaries and Tolerant Retrieval
           Tue. 06/02                  Evaluation
Week 6     Mon. 12/02                  Index Construction
           Mon. 13/02                  Tutorial 1
           To be updated
Week 7     Tue. 27/02                  Index Compression
Week 8     Mon. 04/03 & Tue. 05/03     Relevance Feedback & Query Expansion
Week 9     Mon. 11/03                  Probabilistic IR
           Mon. 12/03 & 18/03          Language Models for IR
Week 10    Mon. 19/03                  Link Analysis: HITS
           Tue. 20/03                  Link Analysis: PageRank
           Mon. 01/04                  Word2Vec (Parts I and II)
           Tue. 02/04                  Learning to Rank

Tentative Topics

  • Boolean retrieval
  • The term vocabulary & postings lists
  • Skip Pointers, Phrase Queries and Positional Indexing
  • Scoring, term weighting & the vector space model
  • Dictionaries and Tolerant Retrieval
  • Evaluation in information retrieval
  • Index Construction and Compression
  • Relevance feedback & query expansion
  • Probabilistic information retrieval
  • Language models for information retrieval (Probabilistic, RNN, Transformers)
  • Retrieval-Augmented Generation (RAG) and Large Language Models
  • Link analysis – HITS, PageRank
  • Word Vectors
  • Learning to Rank

Pre-requisites for the course

  • Data structures and Algorithms
  • Probability and Statistics
  • Basics of Machine Learning
  • Basics of Natural Language Processing (Some might be covered during the course)
  • Basics of Graph algorithms (Some might be covered during the course)
  • Programming in Python/Java

Reference Literature, Useful Tools and Software Resources

  1. Manning, Christopher D., Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Cambridge: Cambridge University Press, 2008.
  2. Research Papers shared in class.
  3. Pranav Rajpurkar’s Harvard CS197 AI Research Experiences

Every test should be attempted individually by each student. Plagiarism in any form will be severely penalized.

Somak Aditya
Assistant Professor

My research interests include integrating knowledge and enabling higher-order reasoning in AI.