weka.deduping
Class Deduper

java.lang.Object
  extended by weka.deduping.Deduper
All Implemented Interfaces:
java.lang.Cloneable
Direct Known Subclasses:
BasicDeduper

public abstract class Deduper
extends java.lang.Object
implements java.lang.Cloneable

An abstract class that takes a set of objects and partitions it into disjoint subsets of duplicates.


Field Summary
protected  java.util.ArrayList m_statistics
          An ArrayList of Object arrays containing statistics
 
Constructor Summary
Deduper()
           
 
Method Summary
abstract  void buildDeduper(Instances trainInstances, Instances testInstances)
          Given training data, build the metrics required by the deduper
abstract  void findDuplicates(Instances testInstances, int numObjects)
          Identify duplicates within the testing data
static Deduper forName(java.lang.String deduperName, java.lang.String[] options)
           
 java.util.ArrayList getStatistics()
          Return the list of statistics collected during deduping
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

m_statistics

protected java.util.ArrayList m_statistics
An ArrayList of Object arrays containing statistics

Constructor Detail

Deduper

public Deduper()

Method Detail

buildDeduper

public abstract void buildDeduper(Instances trainInstances,
                                  Instances testInstances)
                           throws java.lang.Exception
Given training data, build the metrics required by the deduper

Throws:
java.lang.Exception

findDuplicates

public abstract void findDuplicates(Instances testInstances,
                                    int numObjects)
                             throws java.lang.Exception
Identify duplicates within the testing data

Parameters:
testInstances - a set of instances among which to identify duplicates
numObjects - the number of "true object" sets to create
Throws:
java.lang.Exception
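
The numObjects parameter suggests that an implementation keeps merging candidate duplicates until the requested number of "true object" sets remains. The following is a minimal, self-contained sketch of that idea using single-link agglomerative merging over plain numbers. It does not use Instances, and the class DuplicateGroupingSketch, the similarity function, and the method bodies are all hypothetical illustrations, not the algorithm used by BasicDeduper or any other subclass.

```java
import java.util.ArrayList;
import java.util.List;

public class DuplicateGroupingSketch {

    /** Hypothetical similarity: 1 / (1 + absolute difference). Not part of weka.deduping. */
    static double similarity(double a, double b) {
        return 1.0 / (1.0 + Math.abs(a - b));
    }

    /** Single-link similarity between two groups: the best similarity over any cross pair. */
    static double groupSimilarity(List<Double> g1, List<Double> g2) {
        double best = Double.NEGATIVE_INFINITY;
        for (double a : g1) {
            for (double b : g2) {
                best = Math.max(best, similarity(a, b));
            }
        }
        return best;
    }

    /** Merges the two most similar groups until only numObjects groups remain. */
    static List<List<Double>> findDuplicates(double[] values, int numObjects) {
        // Start with every value in its own group.
        List<List<Double>> groups = new ArrayList<>();
        for (double v : values) {
            List<Double> singleton = new ArrayList<>();
            singleton.add(v);
            groups.add(singleton);
        }
        while (groups.size() > Math.max(numObjects, 1)) {
            int bestI = 0;
            int bestJ = 1;
            double bestSim = Double.NEGATIVE_INFINITY;
            for (int i = 0; i < groups.size(); i++) {
                for (int j = i + 1; j < groups.size(); j++) {
                    double sim = groupSimilarity(groups.get(i), groups.get(j));
                    if (sim > bestSim) {
                        bestSim = sim;
                        bestI = i;
                        bestJ = j;
                    }
                }
            }
            // bestJ > bestI, so removing group bestJ does not shift group bestI.
            groups.get(bestI).addAll(groups.remove(bestJ));
        }
        return groups;
    }

    public static void main(String[] args) {
        double[] values = {1.0, 1.1, 5.0, 5.2, 9.0};
        // Prints [[1.0, 1.1], [5.0, 5.2], [9.0]]
        System.out.println(findDuplicates(values, 3));
    }
}
```

The resulting groups are disjoint by construction, since each value starts in exactly one group and groups are only ever merged.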

forName

public static Deduper forName(java.lang.String deduperName,
                              java.lang.String[] options)
                       throws java.lang.Exception
Throws:
java.lang.Exception
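
forName has no description here. Its signature (a class name plus an options array, returning a Deduper) resembles a reflection-based factory, so the sketch below shows how such a factory could work. This is an assumption rather than the documented behavior: SketchDeduper, SketchBasicDeduper, setOptions, and the option strings are hypothetical stand-ins, and nothing on this page confirms that Deduper accepts options this way.

```java
public class ForNameSketch {

    /** Hypothetical stand-in for an abstract deduper base class. */
    public abstract static class SketchDeduper {
        protected String[] options = new String[0];

        public void setOptions(String[] options) {
            this.options = options.clone();
        }

        public String[] getOptions() {
            return options.clone();
        }
    }

    /** Hypothetical concrete subclass with a public no-argument constructor. */
    public static class SketchBasicDeduper extends SketchDeduper {
        public SketchBasicDeduper() {
        }
    }

    /** Loads the named class, checks its type, instantiates it, and passes on the options. */
    public static SketchDeduper forName(String className, String[] options) throws Exception {
        Class<?> cls = Class.forName(className);
        if (!SketchDeduper.class.isAssignableFrom(cls)) {
            throw new IllegalArgumentException(className + " is not a SketchDeduper");
        }
        SketchDeduper deduper = (SketchDeduper) cls.getDeclaredConstructor().newInstance();
        if (options != null) {
            deduper.setOptions(options);
        }
        return deduper;
    }

    public static void main(String[] args) throws Exception {
        // getName() yields the binary name that Class.forName expects for a nested class.
        String name = SketchBasicDeduper.class.getName();
        // "-T" and "0.5" are made-up option strings for illustration only.
        SketchDeduper d = forName(name, new String[] {"-T", "0.5"});
        System.out.println(d.getClass().getSimpleName());
    }
}
```

A factory of this shape would explain the declared java.lang.Exception, since class loading and reflective instantiation can both fail with checked exceptions.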

getStatistics

public java.util.ArrayList getStatistics()
Return the list of statistics collected during deduping
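
Because getStatistics() returns a raw ArrayList whose elements are described as Object arrays, a caller has to cast each element before reading it. The sketch below shows one way to do that and print each row as a tab-separated line. StatisticsSketch and the sample rows are hypothetical: this page does not document how many columns a row holds or what they mean.

```java
import java.util.ArrayList;

public class StatisticsSketch {

    /** Formats each Object[] row as one tab-separated line. */
    static String format(ArrayList<?> statistics) {
        StringBuilder sb = new StringBuilder();
        for (Object row : statistics) {
            // Each element is assumed to be an Object[], as the field description states.
            Object[] cells = (Object[]) row;
            for (int i = 0; i < cells.length; i++) {
                if (i > 0) {
                    sb.append('\t');
                }
                sb.append(cells[i]);
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        ArrayList<Object[]> stats = new ArrayList<>();
        // Hypothetical row layout; the real columns are not documented here.
        stats.add(new Object[] {Integer.valueOf(1), Double.valueOf(0.5)});
        stats.add(new Object[] {Integer.valueOf(2), Double.valueOf(0.75)});
        System.out.print(format(stats));
    }
}
```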