About CitHit - A Citation Recognition Tool
As more full-text articles are deposited in PubMed Central (PMC), there is a growing need to mine knowledge from them. Citations are an important component of this knowledge because they connect articles semantically. Parsing citations automatically will assist text mining of full-text biomedical articles.
We developed CitHit, an automatic citation recognition system, to recognize citations in PMC articles in PDF format. The system extracts the citations from an article, parses each full citation into fields (such as author name and article title), and then maps it to the corresponding citation id in the full text. In addition, each citation is mapped to MEDLINE to obtain the cited article's PMID, so that the relations among citations can be built as a graph. We use conditional random fields to parse the full citations, which achieves over 95% accuracy.
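The linking and graph-building steps described above can be illustrated with a minimal Python sketch. It is not CitHit's actual implementation: it assumes numeric in-text markers such as [1] or [2-4], takes an already parsed reference list as given (the conditional random field parsing step is not shown), and uses made-up placeholder PMIDs. The names references, find_citation_ids, and build_citation_graph are hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict

# Hypothetical parsed reference list: citation id -> parsed fields.
# The "pmid" values are placeholders, not real MEDLINE identifiers;
# None stands for a reference that could not be matched in MEDLINE.
references = {
    "1": {"author": "Smith J", "title": "An example title", "pmid": "PMID_A"},
    "2": {"author": "Lee K", "title": "Another example title", "pmid": None},
    "3": {"author": "Chen L", "title": "A third example title", "pmid": "PMID_B"},
}

# Assumed marker style: [1], [1, 3], [2-4], or a mix such as [1, 3-5].
_ITEM = r"\d+(?:\s*-\s*\d+)?"
MARKER = re.compile(r"\[(" + _ITEM + r"(?:\s*,\s*" + _ITEM + r")*)\]")


def find_citation_ids(text):
    """Return the citation ids mentioned in the text, in order of appearance."""
    ids = []
    for match in MARKER.finditer(text):
        for part in match.group(1).split(","):
            part = part.strip()
            if "-" in part:
                # Expand a range such as "2-4" into 2, 3, 4.
                low, high = (int(x) for x in part.split("-"))
                ids.extend(str(i) for i in range(low, high + 1))
            else:
                ids.append(part)
    return ids


def build_citation_graph(citing_pmid, text, refs):
    """Map the citing article to the set of cited PMIDs that were resolved."""
    graph = defaultdict(set)
    for cid in find_citation_ids(text):
        entry = refs.get(cid)
        if entry and entry["pmid"]:
            graph[citing_pmid].add(entry["pmid"])
    return dict(graph)


body_text = "Earlier systems [1, 3] parsed references; see also the survey [2]."
graph = build_citation_graph("PMID_X", body_text, references)
```

In this sketch, references without a resolved PMID are simply left out of the graph; a real system might instead keep them as unresolved nodes.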
For more information, please contact qing[at]uwm[dot]edu