Duplicate Detection: Resources
- Workshops
- Software
- SecondString - an open-source Java-based package of approximate string-matching techniques.
- SimMetrics - another open-source Java-based library of similarity metrics.
- MARLIN - a Weka-based Java package for duplicate detection. It is currently being cleaned up; please contact mbilenko@cs.utexas.edu if you would like to receive the current version.
- Febrl (Freely Extensible Biomedical Record Linkage) - performs data standardisation (segmentation and cleaning) and probabilistic record linkage ("fuzzy" matching) of one or more files or data sources that do not share a unique record key or identifier.
- The Link King - a freely downloadable record linkage application for SAS.
- Projects, seminars, courses
Last modified: August 25, 2003