CS60092 Information Retrieval (Spring 2024)
- Instructor: Somak Aditya
- Timing (Tentative): MON (12:00-12:55), TUE (10:00-11:55)
- First Class (Tentative): January 2nd, 2024
- Venue: NC242, Nalanda Classroom
- Teaching Assistants: Sachin Vashishtha (sachinvashistha6916@gmail.com)
Announcements
- Note: students who are not yet able to register, please wait. I will not be able to answer your emails individually. I will approve registrations from time to time, based on CGPA and other criteria (such as the cap on total class strength).
- IR 2023-24 Spring will be a fully research-oriented course.
- The first (introductory) class will be on January 2 (Tuesday) at 10:00 am.
- The course requires an understanding of the foundations of algorithms and data structures, probability and statistics, and the basics of Natural Language Processing and Machine Learning. This is a research-oriented course that requires students to read and understand several CS research papers. There will be a term project, to be done in Python/Java. It is advisable to take this course only if you have the necessary background.
Topics Covered
| Week | Date | Description & materials | Readings & other resources |
|---|---|---|---|
| Week 1 | Mon. 09/01 | Introduction to the course | |
| | Tue. 10/01 | Boolean Retrieval: Dictionary and postings lists, boolean querying | |
| Week 2 | Mon. 15/01 & Tue. 16/01 | The term vocabulary & postings lists | |
| Week 3 | Mon. 22/01 | Skip Pointers, Phrase Queries and Positional Indexing | |
| Week 4 | Tue. 23/01 & Mon. 29/01 | Scoring, term weighting & the vector space model | |
| Week 5 | Tue. 30/01 & Mon. 05/02 | Dictionaries and Tolerant Retrieval | |
| | Tue. 06/02 | Evaluation | |
| Week 6 | Mon. 12/02 | Index Construction | |
| | Mon. 13/02 | Tutorial 1 | |
| Week 7 | Tue. 27/02 | Index Compression | |
| Week 8 | Mon. 04/03 & Tue. 05/03 | Relevance Feedback & Query Expansion | |
| Week 9 | Mon. 11/03 | Probabilistic IR | |
| | Mon. 12/03 & 18/03 | Language Models for IR | |
| Week 10 | Mon. 19/03 | Link Analysis: HITS | |
| | Tue. 20/03 | Link Analysis: PageRank | |
| | Mon. 01/04 | Word2Vec (Part I and II) | |
| | Tue. 02/04 | Learning to Rank | |
Tentative Topics
- Boolean retrieval
- The term vocabulary & postings lists
- Skip Pointers, Phrase Queries and Positional Indexing
- Scoring, term weighting & the vector space model
- Dictionaries and Tolerant Retrieval
- Evaluation in information retrieval
- Index Construction and Compression
- Relevance feedback & query expansion
- Probabilistic information retrieval
- Language models for information retrieval (Probabilistic, RNN, Transformers)
- Retrieval-Augmented Generation (RAG) and Large Language Models
- Link analysis – HITS, PageRank
- Word Vectors
- Learning to Rank
Pre-requisites for the course
- Data structures and Algorithms
- Probability and Statistics
- Basics of Machine Learning
- Basics of Natural Language Processing (Some might be covered during the course)
- Basics of Graph algorithms (Some might be covered during the course)
- Programming in Python/Java
Reference Literature, Useful Tools and Software Resources
- Manning, Christopher D., Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Cambridge: Cambridge University Press, 2008.
- Research Papers shared in class.
- Pranav Rajpurkar’s Harvard CS197 AI Research Experiences
Every test should be attempted individually by each student. Plagiarism in any form will be severely penalized.