UTCS Artificial Intelligence
Mining Soft-Matching Association Rules (2002)
Un Yong Nahm and Raymond J. Mooney
Variation and noise in database entries can prevent data mining algorithms, such as association rule mining, from discovering important regularities. In particular, textual fields can exhibit variation due to typographical errors, misspellings, abbreviations, etc. By allowing partial or "soft matching" of items based on a similarity metric such as edit distance or cosine similarity, additional important patterns can be detected. This paper introduces an algorithm, SoftApriori, that discovers soft-matching association rules given a user-supplied similarity metric for each field. Experimental results on several "noisy" datasets extracted from text demonstrate that SoftApriori discovers additional relationships that more accurately reflect regularities in the data.
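To make the idea of soft matching concrete, the following minimal sketch counts the support of an itemset when items are allowed to match approximately under a normalized edit-distance similarity. It is an illustration only, not the SoftApriori algorithm from the paper: the function names, the normalization by the longer string's length, the 0.8 threshold, and the toy records are all assumptions made for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; normalizing by the longer length is an assumption."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def soft_contains(record, item, threshold):
    """True if some value in the record is similar enough to the item."""
    return any(similarity(item, value) >= threshold for value in record)


def soft_support(records, itemset, threshold=0.8):
    """Fraction of records that soft-match every item in the itemset."""
    hits = sum(
        all(soft_contains(record, item, threshold) for item in itemset)
        for record in records
    )
    return hits / len(records)


# Hypothetical noisy records; two of them contain misspellings.
records = [
    ["microsoft", "seattle"],
    ["microsft", "seattle"],
    ["oracle", "austin"],
    ["microsoft", "seatle"],
]

print(soft_support(records, ["microsoft", "seattle"], threshold=0.8))
print(soft_support(records, ["microsoft", "seattle"], threshold=1.0))
```

With a threshold of 1.0 only exact matches count, so the second call behaves like ordinary support counting; lowering the threshold lets the misspelled records contribute as well.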
Citation: In Proceedings of the Eleventh International Conference on Information and Knowledge Management (CIKM-2002), pp. 681-683, McLean, VA, November 2002.
Bibtex:
@InProceedings{nahm:cikm02,
  title={Mining Soft-Matching Association Rules},
  author={Un Yong Nahm and Raymond J. Mooney},
  booktitle={Proceedings of the Eleventh International Conference on Information and Knowledge Management (CIKM-2002)},
  month={November},
  address={McLean, VA},
  key={DiscoTEX, KDD, IE, SoftApriori},
  pages={681-683},
  url="http://www.cs.utexas.edu/users/ai-lab?nahm:cikm02",
  year={2002}
}
People
Raymond J. Mooney, Faculty, mooney [at] cs utexas edu
Un Yong Nahm, Ph.D. Alumni, pebronia [at] acm org
Areas of Interest
Machine Learning
Text Data Mining
Labs
Machine Learning