CS60092 Information Retrieval (Spring 2023)
Instructor: Somak Aditya
Timing (Tentative): MON (12:00-12:55), TUE (10:00-11:55)
First Class: January 3rd, 2023
Venue: NC443, Nalanda Classroom
Teaching Assistants:
- Sachin Vashishtha (sachinvashistha6916@gmail.com)
- Bishal Santra (bsantraigi@gmail.com)
- Karde Vivek Manikrao (vivek10.karde@gmail.com)
- Deepak Chaudhary (DEEPAKCHAUDHARY4311@kgpian.iitkgp.ac.in)
Announcements
- Recording of GUEST LECTURE BY Dr. Anirudh Kembhavi, Director of Computer Vision at Allen Institute for AI (AI2) is available here.
- INVITED GUEST LECTURE BY Dr. Anirudh Kembhavi, Director of Computer Vision at Allen Institute for AI (AI2) on MARCH 13, MONDAY confirmed!!
- Note: students who are unable to register should kindly wait. I will not be able to answer your emails individually. From time to time, I will approve registrations based on CGPA and other criteria (such as the cap on total class strength).
- Every registered student should join the Google mailing list ir-2023-spring@googlegroups.com. All urgent announcements will be made through the group. This group is meant only for registered (and approved) students. When requesting to join, kindly mention your roll number and the fact that you have registered in ERP.
- IR 2022-23 Spring will be a fully research-oriented course.
- First introductory class will be on January 3 (Tuesday), at 10:00 am.
- The course requires an understanding of the foundations of algorithms and data structures, probability and statistics, and the basics of Natural Language Processing and Machine Learning. This will be a research-oriented course that requires students to understand several CS research papers. There will be a term project to be done in Python/Java. It is advisable to take this course only if you have the necessary background.
Topics Covered
Week | Date | Description & materials | Readings & other resources |
---|---|---|---|
Week 1 | Mon. 09/01 | Introduction to the course | |
Week 1 | Tues. 10/01 | Boolean Retrieval: Dictionary and postings lists, boolean querying | |
Week 2 | Mon. 16/01 | The term vocabulary & postings lists | |
Week 2 | Tues. 17/01 | Skip Pointers, Phrase Queries and Positional Indexing | |
Week 3 | Mon. 23/01 & Tues. 24/01 | Scoring, term weighting & the vector space model | |
Week 4 | Mon. 30/01 | Dictionaries and Tolerant Retrieval | |
Week 4 | Tues. 31/01 | Evaluation | |
Week 5 | Mon. 6/02 & Tues. 7/02 | Index Construction | |
Week 6 | Mon. 13/02 | Tutorial 1 | |
Week 7 | Mon. 27/02 | Index Compression | |
Week 7 | Tues. 28/02 | Relevance Feedback & Query Expansion | |
Week 8 | Mon. 7/03 | Relevance Feedback & Query Expansion | See above. |
Week 8 | Tues. 8/03 | Probabilistic IR | |
Week 9 | Mon. 13/03 & Tues. 14/03 | Language Models for IR | |
Week 10 | Mon. 20/03 | Link Analysis: HITS | |
Week 10 | Tues. 21/03 | Link Analysis: PageRank | |
Week 11 | Mon. 27/03 | Web Crawling | |
Week 11 | Tues. 28/03 | Word2Vec (Part I and II) | |
Tentative Topics
- Boolean retrieval
- The term vocabulary & postings lists
- Skip Pointers, Phrase Queries and Positional Indexing
- Scoring, term weighting & the vector space model
- Dictionaries and Tolerant Retrieval
- Evaluation in information retrieval
- Index Construction and Compression
- Relevance feedback & query expansion
- Probabilistic information retrieval
- Language models for information retrieval (Probabilistic, RNN, Transformers)
- Link analysis – HITS, PageRank
- Word Vectors
- Summarization
- Learning to Rank
Pre-requisites for the course
- Data structures and Algorithms
- Probability and Statistics
- Basics of Machine Learning
- Basics of Natural Language Processing (Some might be covered during the course)
- Basics of Graph algorithms (Some might be covered during the course)
- Programming in Python/Java
Reference Literature, Useful Tools and Software Resources
- Manning, Christopher D., Prabhakar Raghavan, and Hinrich Schütze. Introduction to information retrieval, Cambridge: Cambridge university press, 2008.
- Research Papers shared in class.
- Pranav Rajpurkar’s Harvard CS197 AI Research Experiences
Every test should be attempted individually by each student. Plagiarism in any form will be severely penalized.