UTCS Artificial Intelligence
A Mutually Beneficial Integration of Data Mining and Information Extraction (2000)
Un Yong Nahm and Raymond J. Mooney
Text mining concerns applying data mining techniques to unstructured text. Information extraction (IE) is a form of shallow text understanding that locates specific pieces of data in natural language documents, transforming unstructured text into a structured database. This paper describes a system called DiscoTEX that combines IE and data mining methodologies to perform text mining as well as to improve the performance of the underlying extraction system. Rules mined from a database extracted from a corpus of texts are used to predict additional information to extract from future documents, thereby improving the recall of IE. Encouraging results are presented on applying these techniques to a corpus of computer job postings from an Internet newsgroup.
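The idea of using mined rules to recover items the extractor missed can be illustrated with a minimal sketch. This is not the DiscoTEX implementation: the function names `mine_rules` and `augment`, the thresholds, the toy records, and the simple co-occurrence rule miner are all assumptions made for illustration, and extracted items are reduced to plain strings. The check that a predicted item must literally appear in the document text is likewise a design choice of this sketch.

```python
from collections import Counter
from itertools import permutations


def mine_rules(records, min_support=2, min_confidence=0.8):
    """Mine single-antecedent rules 'a -> b' from sets of extracted items.

    Hypothetical stand-in for a real rule learner: a rule is kept when
    a and b co-occur in at least `min_support` records and b appears in
    at least `min_confidence` of the records that contain a.
    """
    item_counts = Counter()
    pair_counts = Counter()
    for items in records:
        item_counts.update(items)
        # Ordered pairs, so both a -> b and b -> a are counted.
        pair_counts.update(permutations(sorted(items), 2))

    rules = {}
    for (a, b), n in pair_counts.items():
        confidence = n / item_counts[a]
        if n >= min_support and confidence >= min_confidence:
            rules.setdefault(a, set()).add(b)
    return rules


def augment(extracted, text, rules):
    """Add rule-predicted items to an extraction result.

    Assumption of this sketch: a predicted item is accepted only if it
    literally occurs in the document text (case-insensitive).
    """
    lowered = text.lower()
    result = set(extracted)
    for item in extracted:
        for predicted in rules.get(item, ()):
            if predicted.lower() in lowered:
                result.add(predicted)
    return result


# Toy "database" of items extracted from earlier job postings (made up).
records = [
    {"SQL", "Database"},
    {"SQL", "Database", "Oracle"},
    {"SQL", "Database", "Java"},
    {"Java", "HTML"},
]
rules = mine_rules(records)

# The extractor found only "SQL" in a new posting; the rule SQL -> Database
# suggests "Database", which is accepted because the text mentions it.
doc = "Seeking a developer with SQL skills for a database team."
augmented = augment({"SQL"}, doc, rules)
```

Here `augmented` contains both `"SQL"` and `"Database"`. Items are only ever added, so a step like this can raise recall, while the text check limits how many spurious items the rules introduce.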
View: PDF, PS
Citation:
In Proceedings of the Seventeenth National Conference on Artificial Intelligence (AAAI-00), pp. 627-632, Austin, TX, July 2000.
Bibtex:
@InProceedings{nahm:aaai00,
  title={A Mutually Beneficial Integration of Data Mining and Information Extraction},
  author={Un Yong Nahm and Raymond J. Mooney},
  booktitle={Proceedings of the Seventeenth National Conference on Artificial Intelligence (AAAI-00)},
  month={July},
  address={Austin, TX},
  key={DiscoTEX, KDD, IE},
  pages={627-632},
  url="http://www.cs.utexas.edu/users/ai-lab?nahm:aaai00",
  year={2000}
}
People
Raymond J. Mooney
Faculty
mooney [at] cs utexas edu
Un Yong Nahm
Ph.D. Alumni
pebronia [at] acm org
Areas of Interest
Information Extraction
Machine Learning
Text Data Mining
Labs
Machine Learning