A Mutually Beneficial Integration of Data Mining and Information Extraction (2000)
Text mining concerns applying data mining techniques to unstructured text. Information extraction (IE) is a form of shallow text understanding that locates specific pieces of data in natural language documents, transforming unstructured text into a structured database. This paper describes a system called DiscoTEX that combines IE and data mining methodologies both to perform text mining and to improve the performance of the underlying extraction system. Rules mined from a database extracted from a corpus of texts are used to predict additional information to extract from future documents, thereby improving the recall of IE. Encouraging results are presented on applying these techniques to a corpus of computer job postings from an Internet newsgroup.
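
The idea of mining rules from an extracted database and using them to predict additional fillers can be illustrated with a minimal Python sketch. This is not the DiscoTEX implementation: the function names (mine_rules, apply_rules), the slot names, the toy records, and the support and confidence thresholds are all hypothetical, and the single-antecedent rules here stand in for whatever rule learner the paper actually uses.

```python
from collections import Counter
from itertools import permutations


def mine_rules(records, min_support=2, min_confidence=0.8):
    """Mine single-antecedent rules (slot, value) -> (slot, value).

    Each record maps a slot name to the set of fillers an IE system
    extracted for it. Thresholds are illustrative assumptions.
    """
    item_counts = Counter()
    pair_counts = Counter()
    for record in records:
        items = {(slot, value) for slot, values in record.items() for value in values}
        item_counts.update(items)
        # Count every ordered pair of distinct items that co-occur in a record.
        pair_counts.update(permutations(sorted(items), 2))
    rules = []
    for (antecedent, consequent), count in pair_counts.items():
        confidence = count / item_counts[antecedent]
        if count >= min_support and confidence >= min_confidence:
            rules.append((antecedent, consequent, confidence))
    return rules


def apply_rules(record, rules):
    """Return a copy of the record with rule-predicted fillers added."""
    augmented = {slot: set(values) for slot, values in record.items()}
    for (a_slot, a_value), (c_slot, c_value), _ in rules:
        # Fire a rule only when its antecedent was actually extracted.
        if a_value in record.get(a_slot, set()):
            augmented.setdefault(c_slot, set()).add(c_value)
    return augmented


# Hypothetical records, as if extracted from job postings.
records = [
    {"language": {"Java"}, "area": {"Web"}},
    {"language": {"Java"}, "area": {"Web"}},
    {"language": {"C++"}, "area": {"Graphics"}},
]
rules = mine_rules(records)
new_record = {"language": {"Java"}}  # the extractor missed the "area" slot
print(apply_rules(new_record, rules))
```

In this toy setting the rule language=Java -> area=Web fills a slot the extractor missed, which is the sense in which mined rules can raise recall. A practical system would also need to guard precision, for example by filtering the predictions before accepting them.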
Citation:
In Proceedings of the Seventeenth National Conference on Artificial Intelligence (AAAI-00), pp. 627-632, Austin, TX, July 2000.
Raymond J. Mooney Faculty mooney [at] cs utexas edu
Un Yong Nahm Ph.D. Alumni pebronia [at] acm org