COMPSCI 444: Introduction to Text Retrieval and Its Applications in Biomedicine

COMPSCI 744: Text Retrieval and Its Applications in Biomedicine

LEC 001 2-3:15 PM MW 9/02 - 12/14 EMS E225

All course materials are on D2L


Description: The growth of scientific discovery and medical knowledge has led to a corresponding growth in the amount of online biomedical data and information. An increasing challenge for biomedical researchers is how to access and manage this ever-increasing quantity of information. This situation presents both opportunities and challenges for information retrieval (IR). This course addresses text retrieval and classification applications in biomedicine. Specifically, the course will cover advanced levels of biomedical indexing, query processing, IR algorithms, and document retrieval methods involving supervised and unsupervised machine learning.




Textbook:


Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze. Introduction to Information Retrieval. Cambridge University Press, 2008.


Weblink (includes the entire book, with a web-friendly version with embedded links):


http://nlp.stanford.edu/IR-book/information-retrieval-book.html


There have been several editions online, with some renumbering of sections. The sections below refer to the most recent edition (April 1, 2009).


There is also a long list of errata that appear in various printings of the book:

http://nlp.stanford.edu/IR-book/html/errata.html


Other reading materials are assigned for each class.


Course Objectives:

At the conclusion of this course, students will:


  1. Understand why IR is important in biomedicine.

  2. Understand IR algorithms and techniques for indexing, query processing, and document classification and their applications in biomedicine.

  3. Be able to use IR algorithms and systems to solve problems in biomedicine.


Grading:

Undergrads: 30% Homework, 35% Midterm exam, 35% Final exam (or, if preapproved, a project)


Grads: 30% Homework, 20% Midterm, 10% Online quizzes and group activities, 40% Project. The project may be done individually or in a team of 2-3 people. The final project will include a software system, a 2-3 page written project report, and an oral presentation. The report should describe the problem, the approach, and the evaluation, and should cite related work where appropriate.




Tentative Timeline:

Date

Topic


Resource Materials

Week 1

Sep 8

Course Overview

L1

Textbook: Chapter 1

Lectures: Course Overview

Paper: Jensen et al 2006, Literature mining for the biologist: from information retrieval to biological discovery

Week 2

Sep 13-15

Introduction


Indexing

L2

L3_4

Textbook: Chapter 2 (see also 6.1)

Lectures: Introduction, Indexing

Week 3

Sep 20-22

Indexing (cont)


Query processing

L3_4

L5_7

Textbook: Chapter 2 (cont), Chapter 3


Lectures: Indexing (cont), Query processing

Week 4

Sep 27-29

Query processing (cont)

L5_7

Textbook: Chapter 3 (cont)

Lectures: Query processing (cont)

Week 5

Oct 4-6

Usability and Query Expansion

L8_9

Textbook: Sections 8.6, 8.7; Chapter 9

Lectures: Usability and Query Expansion

Week 6

Oct 11-13


Ranking, Scoring and term weighting


Vector space models


L10

L11

Textbook: Chapter 6 (6.1-6.3), Chapter 7


Paper: Wilbur WJ. 2002. A thematic analysis of the AIDS literature. Pacific Symposium on Biocomputing 7:386-397.

Week 7

Oct 18-20

Evaluation


MIDTERM EXAM

L12_13

Textbook: Chapter 8


Papers on Evaluation: Yu H and Kaufman 2007. A cognitive evaluation of four online search engines for answering definitional questions posed by physicians. Pacific Symposium on Biocomputing.


Paper: Hersh W, Cohen AM, Roberts P, Rekapalli HK. TREC 2006 Genomics Track Overview


Week 8

Oct 25-27

Evaluation (cont)


Introduction to projects and an overview of Weka

L12_13

Textbook: Chapter 8 (cont)

Lectures: Evaluation (part 2), TREC-Genomics, Project Intro, & Weka

Week 9

Nov 1-3

Text classification


Textbook: Chapters 13-15

Lectures: Text classification (part 1)

Week 10

Nov 8-10

Text classification (cont)


Lectures: Text classification (parts 2-3)

Week 11

Nov 15-17


Text classification (cont)


XML retrieval


Textbook: Chapter 10

Week 12

Nov 22

No class: Nov 24

XML retrieval (cont)


Lectures: XML retrieval (part 2)

Week 13

Nov 29-Dec 1

Web searching


Textbook: Chapters 19-21

Lectures: Web searching (parts 1-3)

Week 14

Dec 6-8

Web searching (cont)


Project presentations and discussion


Lectures: Web searching (part 4)

Week 15

Dec 13


Project Presentations & Wrap Up (last day of class)



Undergraduate Final Exam

As published





Other Course Information: