Information Retrieval
COSC-488
Department of Computer Science
Georgetown University
Instructor: Grace Hui Yang

Course Description: This course covers basic information retrieval theory and techniques, including basic probability theory, text processing, information needs and query representation, document representation, retrieval models, relevance feedback, evaluation, and other related topics. Retrieval models are the main focus of this course. We will look into basic retrieval models, such as the Boolean model, the vector space model, the basic probabilistic model, and Okapi, as well as more advanced retrieval models, such as language models, topic models, and link-based models. Students are required to design and implement high-capacity text retrieval systems.
Prerequisites:
  • Programming and data structures.
Time and Location:

Class: Tuesday and Thursday 3:30-4:45pm. Location: Reiss 284.
Office hours: Thursday 5:00-6:00pm, or by appointment. Location: St Mary's Hall 338.

Textbooks:

    Search Engines: Information Retrieval in Practice. Bruce Croft, Donald Metzler, Trevor Strohman. Addison Wesley, 2010.

Other Readings:

Selected papers or book chapters will be available online before lectures.

Grading: Homework 40%, weekly reading summaries 10%, midterm exam 20%, final exam 30%.
Policies:

Integrity policy: The course policy follows the department's default honor policy with the following modifications. Weekly reading summaries and homework (problem solutions, program code, and written reports) must be done individually. Collaboration on graded assignments is not allowed. If you wish to use code or software tools developed outside of the class, whether by you or by someone else, you must obtain the instructor's permission before using them. Violations of these policies will be reported to the Honor Council or to the Graduate School. If you are unsure whether a practice is acceptable, please ask the instructor for clarification by e-mail, during class, or during office hours.

Homework submission and late homework policy: Homework is due at 11:59pm on the Friday of the submission week. All homework should be submitted through Blackboard. A penalty of 50% will be applied to homework submitted within 24 hours after the due date; homework submitted after that will receive zero credit. All homework assignments (even those receiving zero credit) must be turned in in order to pass the course.

Weekly reading summary submission: The weekly one-page reading summary is due at the beginning of the first class of each week. It should cover the readings assigned for the entire week. Please bring it to class. Do NOT send it by email. There is no late submission period for reading summaries.

Syllabus (Tentative)
Week | Class | Readings | Slides | Deadlines
1 | Introduction | Ch1; As We May Think | slides |
2 | Search Engine Architecture | Ch2; Ch3 | slides, slides | HW1 out.
3 | Text Processing | Ch4; Manning et al. Ch2 | slides | HW1 due. HW2 out.
4 | Document Indexing | Ch5 | slides |
5 | Boolean Retrieval | Ch7.1.1; Manning et al. Ch1 (optional) | slides | HW2 due. HW3 out.
6 | Vector Space Model | Ch7.1.2; Manning et al. Ch6 (optional) | slides |
7 | Evaluation | Ch8; Yang and Liu, SIGIR99 | slides | HW3 due (extended to next week).
8 | Evaluation; Relevance Feedback | Ch6 | slides | HW3 due. HW4 out.
9 | Relevance Feedback and Query Expansion; Midterm Exam (Oct 25, 2012) | Sample midterm exam questions | |
10 | Probability and Statistics Review; Probabilistic Retrieval Models | Manning & Schütze Ch2; Ch7.2.1; Manning et al. Ch11 (optional) | slides | HW4 due. HW5 out.
11 | TREC week. No class. | | |
12 | Probabilistic Retrieval Models | Ch7.2.2; Robertson and Walker 94 (optional) | slides | HW5 due. HW6 out.
13 | Thanksgiving. Language Models for Retrieval | Ch7.3 | slides |
14 | Language Models for Retrieval | Zhai and Lafferty 04 | slides |
15 | Latent Semantic Indexing | | slides | HW6 due.
16 | Study week | | |
17 | Final exam: Thursday Dec 20, 4pm. ICC 231. | | |