Comparative Experiments on Learning Information Extractors for Proteins and their Interactions (2005)

Razvan Bunescu, Ruifang Ge, Rohit J. Kate, Edward M. Marcotte, Raymond J. Mooney, Arun Kumar Ramani, and Yuk Wah Wong

Automatically extracting information from biomedical text holds the promise of easily consolidating large amounts of biological knowledge in computer-accessible form. This strategy is particularly attractive for extracting data relevant to genes of the human genome from the 11 million abstracts in Medline. However, extraction efforts have been frustrated by the lack of conventions for describing human genes and proteins. We have developed and evaluated a variety of learned information extraction systems for identifying human protein names in Medline abstracts and subsequently extracting information on interactions between the proteins. We demonstrate that machine learning approaches using support vector machines and hidden Markov models are able to identify human proteins with higher accuracy than several previous approaches. We also demonstrate that various rule induction methods are able to identify protein interactions more accurately than manually-developed rules.

Citation:

Artificial Intelligence in Medicine (special issue on Summarization and Information Extraction from Medical Documents), 2 (2005), pp. 139-155.

People

Razvan Bunescu	Ph.D. Alumni	bunescu [at] ohio edu
Ruifang Ge	Ph.D. Alumni	grf [at] cs utexas edu
Rohit Kate	Postdoctoral Alumni	katerj [at] uwm edu
Raymond J. Mooney	Faculty	mooney [at] cs utexas edu
Yuk Wah Wong	Ph.D. Alumni	ywwong [at] cs utexas edu

Areas of Interest

Bioinformatics, Information Extraction, Machine Learning

Labs

Machine Learning