UTCS Artificial Intelligence
Text Mining with Information Extraction (2002)
Un Yong Nahm and Raymond J. Mooney
Text mining is concerned with finding patterns in unstructured text. The related task of Information Extraction (IE) is concerned with locating specific items in natural-language documents. This paper presents a framework for text mining, called DiscoTEX (Discovery from Text EXtraction), that uses a learned information extraction system to transform text into more structured data, which is then mined for interesting relationships. The initial version of DiscoTEX integrates an IE module acquired by an IE learning system with a standard rule induction module. However, this approach has problems when the same extracted entity or feature is represented by similar but not identical strings in different documents. Consequently, we also develop an alternate rule induction system, called TextRISE, that allows for partial matching of textual items. Encouraging preliminary results are presented on applying these techniques to a corpus of Internet documents.
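The problem described above, where one extracted entity appears as similar but not identical strings, can be illustrated with a small sketch. It assumes the IE output takes the form of records mapping each slot name to a list of filler strings, and it uses bag-of-words cosine similarity with a threshold of 0.5 as the partial-match test. The records, slot names, threshold, and function names (`cosine`, `slot_matches`, `rule_stats`) are all hypothetical; this is not the DiscoTEX or TextRISE implementation, only a simplified single-rule comparison of exact versus soft matching.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lower-cased whitespace tokens, counted as a bag of words."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between the bag-of-words vectors of two strings."""
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(count * vb[token] for token, count in va.items())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def slot_matches(value, fillers, threshold=None):
    """Exact string match when threshold is None, else soft match on cosine >= threshold."""
    if threshold is None:
        return value in fillers
    return any(cosine(value, filler) >= threshold for filler in fillers)


def rule_stats(records, if_slot, if_value, then_slot, then_value, threshold=None):
    """Coverage, hits, and confidence of the hypothetical rule
    (if_value in if_slot) -> (then_value in then_slot)."""
    covered = [r for r in records if slot_matches(if_value, r.get(if_slot, []), threshold)]
    hits = [r for r in covered if slot_matches(then_value, r.get(then_slot, []), threshold)]
    confidence = len(hits) / len(covered) if covered else 0.0
    return len(covered), len(hits), confidence


# Hypothetical extracted records: each maps a slot name to its filler strings.
records = [
    {"language": ["C++", "Java"], "platform": ["Windows NT"]},
    {"language": ["Visual C++"], "platform": ["Windows"]},
    {"language": ["Java"], "platform": ["Solaris"]},
]

print(rule_stats(records, "language", "C++", "platform", "Windows"))                 # (1, 0, 0.0)
print(rule_stats(records, "language", "C++", "platform", "Windows", threshold=0.5))  # (2, 2, 1.0)
```

Under exact matching the rule covers one record and never fires, because "Windows NT" is not the string "Windows"; with soft matching, "Visual C++" and "Windows NT" each score about 0.71 against "C++" and "Windows", so the rule covers two records with confidence 1.0. The choice of similarity measure and threshold matters: a loose threshold can merge items that are genuinely different.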
Citation: In Proceedings of the AAAI 2002 Spring Symposium on Mining Answers from Texts and Knowledge Bases, pp. 60-67, Stanford, CA, March 2002.
BibTeX:
@InProceedings{nahm:aaai-matkb02,
  title={Text Mining with Information Extraction},
  author={Un Yong Nahm and Raymond J. Mooney},
  booktitle={Proceedings of the AAAI 2002 Spring Symposium on Mining Answers from Texts and Knowledge Bases},
  month={March},
  address={Stanford, CA},
  pages={60-67},
  url="http://www.cs.utexas.edu/users/ai-lab?nahm:aaai-matkb02",
  year={2002}
}
People
Raymond J. Mooney (Faculty): mooney [at] cs utexas edu
Un Yong Nahm (Ph.D. Alumni): pebronia [at] acm org
Areas of Interest: Machine Learning, Text Data Mining
Labs: Machine Learning