COMPSCI 444: Introduction to Text Retrieval and Its Applications in Biomedicine
COMPSCI 744: Text Retrieval and Its Applications in Biomedicine
LEC 001 2-3:15 PM MW 9/02 - 12/14 EMS E225
All course materials are on D2L
Description: The growing volume of scientific discovery and medical knowledge has led to a corresponding growth in online biomedical data and information. A central challenge for biomedical researchers is how to access and manage this ever-increasing quantity of information. This situation presents both opportunities and challenges for information retrieval (IR). This course addresses text retrieval and classification applications in biomedicine. Specifically, the course covers, at an advanced level, biomedical indexing, query processing, IR algorithms, and document retrieval methods involving supervised and unsupervised machine learning.
Textbook:
Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze. Introduction to Information Retrieval. Cambridge University Press, 2008.
Weblink (includes entire book, including a web-friendly version with embedded links):
http://nlp.stanford.edu/IR-book/information-retrieval-book.html
There have been several editions online, with some renumbering of sections. The sections below refer to the most recent (April 1, 2009) one.
There is also a long list of errata that appear in various printings of the book:
http://nlp.stanford.edu/IR-book/html/errata.html
Other reading materials are assigned for each class.
Course Objectives:
At the conclusion of this course, students will:
Understand why IR is important in biomedicine.
Understand IR algorithms and techniques for indexing, query processing, and document classification and their applications in biomedicine.
Be able to use IR algorithms and systems to solve problems in biomedicine.
Grading:
Undergrads: 30% Homework, 35% Midterm exam, 35% Final exam (or, if preapproved, a project)
Grads: 30% Homework, 20% Midterm, 10% Online quizzes and group activities, 40% Project. The project may be done individually or in a team of 2-3 people. The final project will include a software system, a 2-3 page written project report, and an oral presentation. The report should describe the problem, the approach, and the evaluation, and should cite related work where appropriate.
Tentative Timeline:
Date | Topic | Lectures | Resource Materials
Week 1, Sep 8 | Course Overview | L1 | Textbook: Chapter 1. Lectures: Course Overview. Paper: Jensen et al. 2006, Literature mining for the biologist: from information retrieval to biological discovery
Week 2, Sep 13-15 | Introduction; Indexing | L2, L3_4 | Textbook: Chapter 2 (see also 6.1). Lectures: Introduction, Indexing
Week 3, Sep 20-22 | Indexing (cont); Query processing | L3_4, L5_7 | Textbook: Chapter 2 (cont), Chapter 3. Lectures: Indexing (cont), Query processing
Week 4, Sep 27-29 | Query processing (cont) | L5_7 | Textbook: Chapter 3 (cont). Lectures: Query processing (cont)
Week 5, Oct 4-6 | Usability and Query Expansion | L8_9 | Textbook: Sections 8.6, 8.7; Chapter 9. Lectures: Usability and Query Expansion
Week 6, Oct 11-13 | Ranking, scoring and term weighting; Vector space models | L10, L11 | Textbook: Chapter 6 (6.1-6.3), Chapter 7
Week 7, Oct 18-20 | Evaluation; MIDTERM EXAM | L12_13 | Textbook: Chapter 8. Papers on evaluation: Yu H and Kaufman 2007, A cognitive evaluation of four online search engines for answering definitional questions posed by physicians, Pacific Symposium on Biocomputing; Hersh W, Cohen AM, Roberts P, Rekapalli HK, TREC 2006 Genomics Track Overview
Week 8, Oct 25-27 | Evaluation (cont); Introduction to projects and an overview of Weka | L12_13 | Textbook: Chapter 8 (cont). Lectures: Evaluation (part 2), TREC-Genomics, Project Intro, & Weka
Week 9, Nov 1-3 | Text classification | | Textbook: Chapters 13-15. Lectures: Text classification (part 1)
Week 10, Nov 8-10 | Text classification (cont) | | Lectures: Text classification (parts 2-3)
Week 11, Nov 15-17 | Text classification (cont); XML retrieval | | Textbook: Chapter 10
Week 12, Nov 22 (no class Nov 24) | XML retrieval (cont) | | Lectures: XML retrieval (part 2)
Week 13, Nov 29-Dec 1 | Web searching | | Textbook: Chapters 19-21. Lectures: Web searching (parts 1-3)
Week 14, Dec 6-8 | Web searching (cont); Project presentations and discussion | | Lectures: Web searching (part 4)
Week 15, Dec 13 | Project Presentations & Wrap Up (last day of class) | |
Undergraduate Final Exam | As published | |
Other Course Information:
Students with Disabilities: Students who need special accommodations should contact the instructor. The Student Accessibility Center (Mitchell 112, 414-229-5822) is also an excellent resource, and its staff are available to discuss concerns.
Religious Observances: Students will be allowed to complete examinations or other requirements that are missed because of a religious observance if prearranged with the instructor.
Academic Misconduct: The University has a responsibility to promote academic honesty and integrity and to develop procedures to deal effectively with instances of academic dishonesty. Students are responsible for the honest completion and representation of their work, for the appropriate citation of sources, and for respect for others’ academic endeavors. University policy prohibits and punishes misconduct, including any act by which a student seeks to claim credit for the work or efforts of another without authorization or citation (plagiarism), forges or falsifies documents, falsely represents his or her academic performance (cheating), or assists other students in any of these acts. Students who violate academic standards as set forth in UWS Chapter 14 and UWM Faculty Document 1686 will be confronted and must accept the consequences of their actions. Students who engage in academic misconduct are subject to a range of sanctions including but not limited to: a failing grade on an assignment or test, a failing grade in the course, and expulsion from the university.
Sexual Harassment: Sexual harassment is reprehensible and will not be tolerated by the University. It subverts the mission of the University and threatens the careers, educational experience, and well-being of the students, faculty, and staff. The University will not tolerate behavior between or among members of the University community that creates an unacceptable environment.
Incomplete: A notation of incomplete may be given in lieu of a final grade to a student who has carried a subject successfully until the end of a semester but who, because of illness or other unusual and substantiated cause beyond the student’s control, has been unable to take or complete the final examination and/or to complete some limited amount of term work. An incomplete is not given unless you prove to the instructor that you were prevented from completing course requirements for just cause.