weka.experiment
Class DeduperSplitEvaluator

java.lang.Object
  extended by weka.experiment.DeduperSplitEvaluator
All Implemented Interfaces:
OptionHandler, java.io.Serializable, SplitEvaluator

public class DeduperSplitEvaluator
extends java.lang.Object
implements SplitEvaluator, OptionHandler

A SplitEvaluator that produces results for a deduper scheme on a nominal class attribute. Valid options are:

-W classname
Specify the full class name of the deduper to evaluate.

See Also:
Serialized Form

Field Summary
protected  Deduper m_deduper
          The deduper used for evaluation
protected  java.lang.String m_deduperOptions
          The deduper options (if any)
protected  java.lang.String m_deduperVersion
          The deduper version
protected  java.lang.String m_result
          Holds the statistics for the most recent application of the deduper
 
Constructor Summary
DeduperSplitEvaluator()
          No args constructor.
 
Method Summary
 java.lang.String deduperTipText()
          Returns the tip text for this property
 Deduper getDeduper()
          Get the value of Deduper.
 java.lang.Object[] getKey()
          Gets the key describing the current SplitEvaluator.
 java.lang.String[] getKeyNames()
          Gets the names of each of the key columns produced for a single run.
 java.lang.Object[] getKeyTypes()
          Gets the data types of each of the key columns produced for a single run.
 java.lang.String[] getOptions()
          Gets the current settings of the Deduper.
 java.lang.String getRawResultOutput()
          Gets the raw output from the deduper
 java.lang.Object[] getResult(Instances trainData, Instances testData)
          Gets the results for the supplied train and test datasets.
 java.lang.String[] getResultNames()
          Gets the names of each of the result columns produced for a single run.
 java.lang.Object[] getResultTypes()
          Gets the data types of each of the result columns produced for a single run.
 java.lang.String globalInfo()
          Returns a string describing this split evaluator
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options.
 void setAdditionalMeasures(java.lang.String[] additionalMeasures)
          Does nothing, since deduping evaluation does not allow additional measures
 void setDeduper(Deduper newDeduper)
          Sets the deduper.
 void setDeduperName(java.lang.String newDeduperName)
          Set the Deduper to use, given its class name.
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 java.lang.String toString()
          Returns a text description of the split evaluator.
protected  void updateOptions()
          Updates the options that the current deduper is using.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

m_deduper

protected Deduper m_deduper
The deduper used for evaluation


m_result

protected java.lang.String m_result
Holds the statistics for the most recent application of the deduper


m_deduperOptions

protected java.lang.String m_deduperOptions
The deduper options (if any)


m_deduperVersion

protected java.lang.String m_deduperVersion
The deduper version

Constructor Detail

DeduperSplitEvaluator

public DeduperSplitEvaluator()
No args constructor.

Method Detail

globalInfo

public java.lang.String globalInfo()
Returns a string describing this split evaluator

Returns:
a description of the split evaluator suitable for displaying in the Explorer/Experimenter GUI

setAdditionalMeasures

public void setAdditionalMeasures(java.lang.String[] additionalMeasures)
Does nothing, since deduping evaluation does not allow additional measures

Specified by:
setAdditionalMeasures in interface SplitEvaluator
Parameters:
additionalMeasures - a list of method names

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface OptionHandler
Returns:
an enumeration of all the available options.

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options. Valid options are:

-W classname
Specify the full class name of the deduper to evaluate.

All options after -- will be passed to the deduper.

Specified by:
setOptions in interface OptionHandler
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported
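As a minimal sketch of the option convention described above, the class below extracts the value of -W and treats everything after the first -- as the deduper's own options. Weka provides helpers for this (Utils.getOption and Utils.partitionOptions), but the sketch uses plain Java so it runs on its own; the class and method names are hypothetical, and the real setOptions may behave differently (for example, in how it reports a missing -W).

```java
import java.util.Arrays;

// Hypothetical sketch of "-W classname -- <deduper options>" parsing.
class OptionSplitSketch {

    /** Returns the value following "-W" before any "--", or null if absent. */
    static String deduperClassName(String[] options) {
        for (int i = 0; i < options.length; i++) {
            if (options[i].equals("--")) {
                break; // flags after "--" belong to the deduper, not to us
            }
            if (options[i].equals("-W")) {
                if (i + 1 >= options.length) {
                    throw new IllegalArgumentException("-W requires a class name");
                }
                return options[i + 1];
            }
        }
        return null;
    }

    /** Returns every option after the first "--" (empty if there is none). */
    static String[] deduperOptions(String[] options) {
        for (int i = 0; i < options.length; i++) {
            if (options[i].equals("--")) {
                return Arrays.copyOfRange(options, i + 1, options.length);
            }
        }
        return new String[0];
    }

    public static void main(String[] args) {
        String[] opts = {"-W", "my.pkg.SomeDeduper", "--", "-T", "0.5"};
        System.out.println(deduperClassName(opts));                 // my.pkg.SomeDeduper
        System.out.println(Arrays.toString(deduperOptions(opts)));  // [-T, 0.5]
    }
}
```

Splitting at the first -- means the deduper may itself accept options that look like -W without them being consumed by the evaluator.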

getOptions

public java.lang.String[] getOptions()
Gets the current settings of the Deduper.

Specified by:
getOptions in interface OptionHandler
Returns:
an array of strings suitable for passing to setOptions
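The inverse direction can be sketched as follows: the array starts with -W and the deduper's class name, then a -- separator, then the deduper's own options, so that it round-trips through the parsing rule described for setOptions. The class name and the deduperOptions parameter are hypothetical stand-ins for the evaluator's internal state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: builds an option array that setOptions could parse back.
class OptionBuildSketch {

    static String[] buildOptions(String deduperClassName, String[] deduperOptions) {
        List<String> out = new ArrayList<>();
        if (deduperClassName != null) {
            out.add("-W");
            out.add(deduperClassName);
        }
        if (deduperOptions.length > 0) {
            out.add("--"); // everything after this separator goes to the deduper
            out.addAll(Arrays.asList(deduperOptions));
        }
        return out.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] built = buildOptions("my.pkg.SomeDeduper", new String[] {"-T", "0.5"});
        System.out.println(String.join(" ", built)); // -W my.pkg.SomeDeduper -- -T 0.5
    }
}
```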

getKeyTypes

public java.lang.Object[] getKeyTypes()
Gets the data types of each of the key columns produced for a single run. The number of key fields must be constant for a given SplitEvaluator.

Specified by:
getKeyTypes in interface SplitEvaluator
Returns:
an array containing objects of the type of each key column. The objects should be Strings or Doubles.

getKeyNames

public java.lang.String[] getKeyNames()
Gets the names of each of the key columns produced for a single run. The number of key fields must be constant for a given SplitEvaluator.

Specified by:
getKeyNames in interface SplitEvaluator
Returns:
an array containing the name of each key column

getKey

public java.lang.Object[] getKey()
Gets the key describing the current SplitEvaluator. For example, this may contain the name of the deduper used for evaluation. The number of key fields must be constant for a given SplitEvaluator.

Specified by:
getKey in interface SplitEvaluator
Returns:
an array of objects containing the key.
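getKeyNames, getKeyTypes and getKey describe the same columns as parallel arrays: they must have equal length, and position i in each refers to the same column. As the Returns clause of getKeyTypes says, a column's type is given by an object of that type (a String or a Double). The sketch below checks a key row against that contract; the three column names (deduper class, options, version, suggested by the m_deduperOptions and m_deduperVersion fields) are hypothetical, not confirmed by this page.

```java
import java.util.Arrays;

// Hypothetical sketch of the parallel key arrays; the column names are assumptions.
class KeyColumnsSketch {

    static String[] keyNames() {
        return new String[] {"Scheme", "Scheme_options", "Scheme_version_ID"};
    }

    /** One prototype object per column: its runtime class is the column type. */
    static Object[] keyTypes() {
        return new Object[] {"", "", ""};
    }

    static Object[] key(String className, String options, String version) {
        return new Object[] {className, options, version};
    }

    /** True if a key row matches the declared names and types column by column. */
    static boolean conforms(Object[] row) {
        Object[] types = keyTypes();
        if (row.length != keyNames().length || row.length != types.length) {
            return false;
        }
        for (int i = 0; i < row.length; i++) {
            // null is tolerated here as a missing value (an assumption of this sketch)
            if (row[i] != null && row[i].getClass() != types[i].getClass()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Object[] row = key("my.pkg.SomeDeduper", "-T 0.5", "1");
        System.out.println(Arrays.toString(row) + " conforms: " + conforms(row));
    }
}
```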

getResultTypes

public java.lang.Object[] getResultTypes()
Gets the data types of each of the result columns produced for a single run. The number of result fields must be constant for a given SplitEvaluator.

Specified by:
getResultTypes in interface SplitEvaluator
Returns:
an array containing objects of the type of each result column. The objects should be Strings or Doubles.

getResultNames

public java.lang.String[] getResultNames()
Gets the names of each of the result columns produced for a single run. The number of result fields must be constant for a given SplitEvaluator.

Specified by:
getResultNames in interface SplitEvaluator
Returns:
an array containing the name of each result column
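A consumer of these results, such as a listener that stores each run as a table row, can inspect the objects returned by getResultTypes to decide whether a column is numeric (Double) or textual (String). A minimal sketch, in which the four result column names and the describeColumns helper are hypothetical illustrations rather than this class's actual columns:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: reading result-column types from prototype objects.
class ResultColumnsSketch {

    static String[] resultNames() {
        return new String[] {"Precision", "Recall", "F_measure", "Notes"};
    }

    /** Prototype objects: a Double marks a numeric column, a String a text column. */
    static Object[] resultTypes() {
        return new Object[] {Double.valueOf(0), Double.valueOf(0), Double.valueOf(0), ""};
    }

    /** Pairs each result name with "numeric" or "string" based on its prototype. */
    static List<String> describeColumns() {
        String[] names = resultNames();
        Object[] types = resultTypes();
        if (names.length != types.length) {
            throw new IllegalStateException("names and types must have the same length");
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < names.length; i++) {
            String kind;
            if (types[i] instanceof Double) {
                kind = "numeric";
            } else if (types[i] instanceof String) {
                kind = "string";
            } else {
                throw new IllegalStateException("unsupported type in column " + names[i]);
            }
            out.add(names[i] + ":" + kind);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(describeColumns());
    }
}
```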

getResult

public java.lang.Object[] getResult(Instances trainData,
                                    Instances testData)
                             throws java.lang.Exception
Gets the results for the supplied train and test datasets.

Specified by:
getResult in interface SplitEvaluator
Parameters:
trainData - the training Instances.
testData - the testing Instances.
Returns:
the raw results stored in an array. The objects stored in the array are themselves object arrays containing the actual precision, recall and F-measure (P/R/FM) values for each point
Throws:
java.lang.Exception - if a problem occurs while getting the results
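Reading the Returns clause as one {precision, recall, F-measure} triple per point, the sketch below computes such triples from duplicate-pair counts using the standard definitions P = TP / (TP + FP), R = TP / (TP + FN) and F = 2PR / (P + R). What a "point" is (for example, a cutoff along a ranked list of candidate pairs) and the rule that 0/0 yields 0 are assumptions of this sketch, not details taken from DeduperSplitEvaluator.

```java
import java.util.Arrays;

// Hypothetical sketch: precision/recall/F-measure per evaluation point.
class PrfSketch {

    /**
     * @param truePositives  correctly identified duplicate pairs at each point
     * @param predictedPairs pairs reported as duplicates at each point (TP + FP)
     * @param actualPairs    total true duplicate pairs in the test data (TP + FN)
     * @return one Object[]{precision, recall, fMeasure} per point, as Doubles
     */
    static Object[] prfPoints(int[] truePositives, int[] predictedPairs, int actualPairs) {
        if (truePositives.length != predictedPairs.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        Object[] points = new Object[truePositives.length];
        for (int i = 0; i < truePositives.length; i++) {
            // Undefined ratios (0/0) are reported as 0 in this sketch
            double p = predictedPairs[i] == 0 ? 0.0 : (double) truePositives[i] / predictedPairs[i];
            double r = actualPairs == 0 ? 0.0 : (double) truePositives[i] / actualPairs;
            double f = (p + r) == 0.0 ? 0.0 : 2 * p * r / (p + r);
            points[i] = new Object[] {p, r, f}; // autoboxed to Double
        }
        return points;
    }

    public static void main(String[] args) {
        // 10 true duplicate pairs; three cutoffs along a ranked candidate list
        Object[] points = prfPoints(new int[] {4, 7, 9}, new int[] {4, 10, 20}, 10);
        for (Object point : points) {
            System.out.println(Arrays.toString((Object[]) point));
        }
    }
}
```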

deduperTipText

public java.lang.String deduperTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the Explorer/Experimenter GUI

getDeduper

public Deduper getDeduper()
Get the value of Deduper.

Returns:
Value of Deduper.

setDeduper

public void setDeduper(Deduper newDeduper)
Sets the deduper.

Parameters:
newDeduper - the new deduper to use.

updateOptions

protected void updateOptions()
Updates the options that the current deduper is using.
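Given the m_deduperOptions string field, one plausible job for updateOptions is to flatten the deduper's current option array into a single string that can be stored with each result. A minimal sketch, assuming a hypothetical joinOptions helper that skips empty entries and quotes options containing whitespace; Weka's Utils.joinOptions serves a similar purpose, but its exact quoting rules are not reproduced here.

```java
// Hypothetical sketch: flattening an option array into one recordable string.
class JoinOptionsSketch {

    static String joinOptions(String[] options) {
        StringBuilder sb = new StringBuilder();
        for (String opt : options) {
            if (opt.isEmpty()) {
                continue; // skip empty padding entries
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Quote options containing whitespace (escaping inner quotes) so each stays one token
            if (opt.chars().anyMatch(Character::isWhitespace)) {
                sb.append('"').append(opt.replace("\"", "\\\"")).append('"');
            } else {
                sb.append(opt);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] deduperOpts = {"-T", "0.5", "-M", "a b", ""};
        System.out.println(joinOptions(deduperOpts)); // -T 0.5 -M "a b"
    }
}
```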


setDeduperName

public void setDeduperName(java.lang.String newDeduperName)
                    throws java.lang.Exception
Set the Deduper to use, given its class name. A new deduper will be instantiated.

Throws:
java.lang.Exception - if the class name is invalid.
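Instantiating a deduper from its class name is typically done with reflection. A minimal sketch using only the standard library, in which the nested Deduper interface is a hypothetical stand-in for the real Deduper type and NoOpDeduper is an illustrative implementation; the real setDeduperName may delegate to a Weka helper and may report errors differently.

```java
// Hypothetical sketch of "instantiate by class name"; Deduper here is a stand-in.
class DeduperFactorySketch {

    /** Stand-in for the real Deduper type, whose members this page does not describe. */
    interface Deduper { }

    /** Illustrative implementation with a public no-arg constructor. */
    public static class NoOpDeduper implements Deduper {
        public NoOpDeduper() { }
    }

    static Deduper forName(String className) throws Exception {
        Class<?> cls = Class.forName(className); // throws ClassNotFoundException if unknown
        if (!Deduper.class.isAssignableFrom(cls)) {
            throw new Exception(className + " is not a Deduper");
        }
        // A fresh instance is created on every call
        return (Deduper) cls.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        Deduper d = forName(NoOpDeduper.class.getName());
        System.out.println(d.getClass().getSimpleName()); // NoOpDeduper
        try {
            forName("java.lang.String");
        } catch (Exception e) {
            System.out.println(e.getMessage()); // java.lang.String is not a Deduper
        }
    }
}
```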

getRawResultOutput

public java.lang.String getRawResultOutput()
Gets the raw output from the deduper

Specified by:
getRawResultOutput in interface SplitEvaluator
Returns:
the raw output from the deduper

toString

public java.lang.String toString()
Returns a text description of the split evaluator.

Returns:
a text description of the split evaluator.