weka.experiment
Class NoiseCurveCrossValidationResultProducer

java.lang.Object
  extended by weka.experiment.NoiseCurveCrossValidationResultProducer
All Implemented Interfaces:
AdditionalMeasureProducer, OptionHandler, ResultProducer, java.io.Serializable

public class NoiseCurveCrossValidationResultProducer
extends java.lang.Object
implements ResultProducer, OptionHandler, AdditionalMeasureProducer

Performs an N-fold cross-validation, but generates a noise curve by also varying the amount of noise. Always uses the same N-fold test set for testing.

See Also:
Serialized Form

Field Summary
static java.lang.String DATASET_FIELD_NAME
           
static java.lang.String FOLD_FIELD_NAME
           
protected  java.lang.String[] m_AdditionalMeasures
          The names of any additional measures to look for in SplitEvaluators
protected  java.util.Vector m_AttributeStats
          Store Statistics of Attributes
protected  boolean m_classNoise
          Add noise to Class Labels in Training Set
protected  boolean m_classNoiseTest
          Add noise to Class Labels in Testing Set
protected  int m_CurrentSize
          Dataset size for the runs; the full dataset is used
protected  boolean m_debugOutput
          Save raw output of split evaluators --- for debugging purposes
protected  boolean m_featureMiss
          Set features to missing in the training set; the class is not counted as a feature
protected  boolean m_featureMissTest
          Set features to missing in the testing set; the class is not counted as a feature
protected  boolean m_featureNoise
          Add noise to features in the training set; the class is not counted as a feature
protected  boolean m_featureNoiseTest
          Add noise to features in the testing set; the class is not counted as a feature
protected  Instances m_Instances
          The dataset of interest
protected  int m_NumFolds
          The number of folds in the cross-validation
protected  java.io.File m_OutputFile
          The destination output file/directory for raw output
protected  double[] m_PlotPoints
          The specific points to plot, either integers representing specific numbers of training examples, or decimal fractions representing percentages of the full training set -- ONLY INTEGERS SUPPORTED
protected  java.util.Random m_Random
          Random number generator used for randomization in each run
protected  ResultListener m_ResultListener
          The ResultListener to send results to
protected  SplitEvaluator m_SplitEvaluator
          The SplitEvaluator used to generate results
protected  OutputZipper m_ZipDest
          The output zipper to use for saving raw splitEvaluator output
static java.lang.String NOISE_FIELD_NAME
           
static java.lang.String RUN_FIELD_NAME
           
static java.lang.String STEP_FIELD_NAME
           
static java.lang.String TIMESTAMP_FIELD_NAME
           
 
Constructor Summary
NoiseCurveCrossValidationResultProducer()
           
 
Method Summary
 void addClassNoise(Instances train, Instances test, int noiseLevel)
           
 void addFeatureMiss(Instances train, Instances test, int noiseLevel)
           
 void addFeatureNoise(Instances train, Instances test, int noiseLevel)
           
 java.lang.String classNoiseTestTipText()
          Returns the tip text for this property
 java.lang.String classNoiseTipText()
          Returns the tip text for this property
 void doRun(int run)
          Gets the results for a specified run number.
 void doRunKeys(int run)
          Gets the keys for a specified run number.
 java.util.Enumeration enumerateMeasures()
          Returns an enumeration of any additional measure names that might be in the SplitEvaluator
 java.lang.String featureMissTestTipText()
          Returns the tip text for this property
 java.lang.String featureMissTipText()
          Returns the tip text for this property
 java.lang.String featureNoiseTestTipText()
          Returns the tip text for this property
 java.lang.String featureNoiseTipText()
          Returns the tip text for this property
 boolean getclassNoise()
          Get if Noise is to be added to Class
 boolean getclassNoiseTest()
          Get if Noise is to be added to Class in Testing Set
 java.lang.String getCompatibilityState()
          Gets a description of the internal settings of the result producer, sufficient for distinguishing a ResultProducer instance from another with different settings (ignoring those settings set through this interface).
 boolean getfeatureMiss()
          Get if Features are to be set Missing
 boolean getfeatureMissTest()
          Get if Features are to be set Missing in Testing Set
 boolean getfeatureNoise()
          Get if Noise is to be added to Features
 boolean getfeatureNoiseTest()
          Get if Noise is to be added to Features in Testing Set
 java.lang.String[] getKeyNames()
          Gets the names of each of the columns produced for a single run.
 java.lang.Object[] getKeyTypes()
          Gets the data types of each of the columns produced for a single run.
 double getMeasure(java.lang.String additionalMeasureName)
          Returns the value of the named measure
 int getNumFolds()
          Get the value of NumFolds.
 java.lang.String[] getOptions()
          Gets the current settings of the result producer.
 java.io.File getOutputFile()
          Get the value of OutputFile.
 java.lang.String getPlotPoints()
          Get the value of PlotPoints.
 boolean getRawOutput()
          Get if raw split evaluator output is to be saved
 java.lang.String[] getResultNames()
          Gets the names of each of the columns produced for a single run.
 java.lang.Object[] getResultTypes()
          Gets the data types of each of the columns produced for a single run.
 SplitEvaluator getSplitEvaluator()
          Get the SplitEvaluator.
static java.lang.Double getTimestamp()
          Gets a Double representing the current date and time.
 java.lang.String globalInfo()
          Returns a string describing this result producer
protected static boolean isInteger(double val)
          Return true if the given double represents an integer value
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options.
static void main(java.lang.String[] args)
           
protected  int maxTrainSize()
          Get the maximum size of the training set based on maximum training set size from the n-fold CV
 java.lang.String numFoldsTipText()
          Returns the tip text for this property
 java.lang.String outputFileTipText()
          Returns the tip text for this property
protected  double[] parsePlotPoints(java.lang.String plotPoints)
          Parse a string of doubles separated by commas or spaces into a sorted array of doubles
protected  int plotPoint(int i)
          Return the amount of noise for the ith point on the curve for plotPoints as specified.
 java.lang.String plotPointsTipText()
          Returns the tip text for this property
 void postProcess()
          Perform any postprocessing.
 void preProcess()
          Prepare to generate results.
 java.lang.String rawOutputTipText()
          Returns the tip text for this property
 void setAdditionalMeasures(java.lang.String[] additionalMeasures)
          Set a list of method names for additional measures to look for in SplitEvaluators.
 void setclassNoise(boolean d)
          Set to true if Noise is to be added to Class
 void setclassNoiseTest(boolean d)
          Set to true if Noise is to be added to Class in Testing
 void setfeatureMiss(boolean d)
          Set to true if Features are to be set Missing
 void setfeatureMissTest(boolean d)
          Set to true if Features are to be set Missing in Testing
 void setfeatureNoise(boolean d)
          Set to true if Noise is to be added to Features
 void setfeatureNoiseTest(boolean d)
          Set to true if Noise is to be added to Features in Testing
 void setInstances(Instances instances)
          Sets the dataset that results will be obtained for.
 void setNumFolds(int newNumFolds)
          Set the value of NumFolds.
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 void setOutputFile(java.io.File newOutputFile)
          Set the value of OutputFile.
 void setPlotPoints(java.lang.String plotPoints)
          Set the value of PlotPoints.
 void setRawOutput(boolean d)
          Set to true if raw split evaluator output is to be saved
 void setResultListener(ResultListener listener)
          Sets the object to send results of each run to.
 void setSplitEvaluator(SplitEvaluator newSplitEvaluator)
          Set the SplitEvaluator.
 java.lang.String splitEvaluatorTipText()
          Returns the tip text for this property
 java.lang.String toString()
          Gets a text description of the result producer.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

m_Instances

protected Instances m_Instances
The dataset of interest


m_ResultListener

protected ResultListener m_ResultListener
The ResultListener to send results to


m_NumFolds

protected int m_NumFolds
The number of folds in the cross-validation


m_debugOutput

protected boolean m_debugOutput
Save raw output of split evaluators --- for debugging purposes


m_classNoise

protected boolean m_classNoise
Add noise to Class Labels in Training Set


m_featureNoise

protected boolean m_featureNoise
Add noise to features in the training set; the class is not counted as a feature


m_featureMiss

protected boolean m_featureMiss
Set features to missing in the training set; the class is not counted as a feature


m_classNoiseTest

protected boolean m_classNoiseTest
Add noise to Class Labels in Testing Set


m_featureNoiseTest

protected boolean m_featureNoiseTest
Add noise to features in the testing set; the class is not counted as a feature


m_featureMissTest

protected boolean m_featureMissTest
Set features to missing in the testing set; the class is not counted as a feature


m_ZipDest

protected OutputZipper m_ZipDest
The output zipper to use for saving raw splitEvaluator output


m_OutputFile

protected java.io.File m_OutputFile
The destination output file/directory for raw output


m_SplitEvaluator

protected SplitEvaluator m_SplitEvaluator
The SplitEvaluator used to generate results


m_AdditionalMeasures

protected java.lang.String[] m_AdditionalMeasures
The names of any additional measures to look for in SplitEvaluators


m_AttributeStats

protected java.util.Vector m_AttributeStats
Store Statistics of Attributes


m_PlotPoints

protected double[] m_PlotPoints
The specific points to plot, either integers representing specific numbers of training examples, or decimal fractions representing percentages of the full training set -- ONLY INTEGERS SUPPORTED


m_CurrentSize

protected int m_CurrentSize
Dataset size for the runs; the full dataset is used


m_Random

protected java.util.Random m_Random
Random number generator used for randomization in each run


DATASET_FIELD_NAME

public static java.lang.String DATASET_FIELD_NAME

RUN_FIELD_NAME

public static java.lang.String RUN_FIELD_NAME

FOLD_FIELD_NAME

public static java.lang.String FOLD_FIELD_NAME

TIMESTAMP_FIELD_NAME

public static java.lang.String TIMESTAMP_FIELD_NAME

STEP_FIELD_NAME

public static java.lang.String STEP_FIELD_NAME

NOISE_FIELD_NAME

public static java.lang.String NOISE_FIELD_NAME
Constructor Detail

NoiseCurveCrossValidationResultProducer

public NoiseCurveCrossValidationResultProducer()
Method Detail

globalInfo

public java.lang.String globalInfo()
Returns a string describing this result producer


setInstances

public void setInstances(Instances instances)
Sets the dataset that results will be obtained for.

Specified by:
setInstances in interface ResultProducer
Parameters:
instances - a value of type 'Instances'.

setResultListener

public void setResultListener(ResultListener listener)
Sets the object to send results of each run to.

Specified by:
setResultListener in interface ResultProducer
Parameters:
listener - a value of type 'ResultListener'

setAdditionalMeasures

public void setAdditionalMeasures(java.lang.String[] additionalMeasures)
Set a list of method names for additional measures to look for in SplitEvaluators. If the experiment iterates over a set of properties, this list may contain many measures, of which only a subset may be producible by the current SplitEvaluator.

Specified by:
setAdditionalMeasures in interface ResultProducer
Parameters:
additionalMeasures - a list of method names

enumerateMeasures

public java.util.Enumeration enumerateMeasures()
Returns an enumeration of any additional measure names that might be in the SplitEvaluator

Specified by:
enumerateMeasures in interface AdditionalMeasureProducer
Returns:
an enumeration of the measure names

getMeasure

public double getMeasure(java.lang.String additionalMeasureName)
Returns the value of the named measure

Specified by:
getMeasure in interface AdditionalMeasureProducer
Parameters:
additionalMeasureName - the name of the measure to query for its value
Returns:
the value of the named measure

getTimestamp

public static java.lang.Double getTimestamp()
Gets a Double representing the current date and time, e.g. 1:46pm on 20/5/1999 -> 19990520.1346
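
The yyyyMMdd.HHmm encoding in the example above can be sketched as follows. This is a minimal illustration, not the class's actual code: the class name TimestampSketch and the methods encode and now are hypothetical, and the use of the local time zone is an assumption.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;

// Hypothetical helper, not part of weka.experiment.
final class TimestampSketch {

    // Packs calendar fields into a double of the form yyyyMMdd.HHmm.
    static double encode(int year, int month, int day, int hour, int minute) {
        return year * 10000 + month * 100 + day   // integer part: yyyyMMdd
                + hour / 100.0                     // first two decimals: hour (0-23)
                + minute / 10000.0;                // next two decimals: minute
    }

    // Encodes the current date and time (local time zone is an assumption).
    static double now() {
        Calendar c = new GregorianCalendar();
        return encode(c.get(Calendar.YEAR),
                c.get(Calendar.MONTH) + 1,         // Calendar months are zero-based
                c.get(Calendar.DAY_OF_MONTH),
                c.get(Calendar.HOUR_OF_DAY),
                c.get(Calendar.MINUTE));
    }
}
```

With this encoding, timestamps sort chronologically as plain numbers, which is convenient when they are stored as a key column.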


preProcess

public void preProcess()
                throws java.lang.Exception
Prepare to generate results.

Specified by:
preProcess in interface ResultProducer
Throws:
java.lang.Exception - if an error occurs during preprocessing.

postProcess

public void postProcess()
                 throws java.lang.Exception
Perform any postprocessing. When this method is called, it indicates that no more requests to generate results for the current experiment will be sent.

Specified by:
postProcess in interface ResultProducer
Throws:
java.lang.Exception - if an error occurs

doRunKeys

public void doRunKeys(int run)
               throws java.lang.Exception
Gets the keys for a specified run number. Different run numbers correspond to different randomizations of the data. Keys produced should be sent to the current ResultListener

Specified by:
doRunKeys in interface ResultProducer
Parameters:
run - the run number to get keys for.
Throws:
java.lang.Exception - if a problem occurs while getting the keys

maxTrainSize

protected int maxTrainSize()
Get the maximum size of the training set based on maximum training set size from the n-fold CV
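
As a rough illustration of the arithmetic involved, the sketch below computes the largest training set that an N-fold split can yield. It assumes fold sizes differ by at most one instance, so the largest training set is the dataset minus the smallest test fold. The class MaxTrainSizeSketch is hypothetical, and the exact rounding used by this result producer is not stated in this page and may differ.

```java
// Hypothetical helper, not part of weka.experiment.
final class MaxTrainSizeSketch {

    // Largest training set when numInstances are split into numFolds test folds
    // of near-equal size (assumption: sizes differ by at most one).
    static int maxTrainSize(int numInstances, int numFolds) {
        if (numFolds < 2 || numFolds > numInstances) {
            throw new IllegalArgumentException("need 2 <= numFolds <= numInstances");
        }
        int smallestTestFold = numInstances / numFolds; // integer division rounds down
        return numInstances - smallestTestFold;
    }
}
```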


doRun

public void doRun(int run)
           throws java.lang.Exception
Gets the results for a specified run number. Different run numbers correspond to different randomizations of the data. Results produced should be sent to the current ResultListener

Specified by:
doRun in interface ResultProducer
Parameters:
run - the run number to generate results for.
Throws:
java.lang.Exception - if a problem occurs while getting the results

plotPoint

protected int plotPoint(int i)
Return the amount of noise, as a percentage, for the ith point on the curve, as specified by plotPoints. Note: this procedure could be simplified to return m_PlotPoints[i] directly.


isInteger

protected static boolean isInteger(double val)
Return true if the given double represents an integer value
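
One way such a check might be written is shown below. This is a sketch under the assumption that non-finite values (NaN, infinities) should not count as integers; the class name IntegerCheckSketch is hypothetical and the actual implementation may handle those cases differently.

```java
// Hypothetical helper, not part of weka.experiment.
final class IntegerCheckSketch {

    // True when val is finite and has no fractional part.
    static boolean isInteger(double val) {
        return Double.isFinite(val) && val == Math.rint(val);
    }
}
```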


getKeyNames

public java.lang.String[] getKeyNames()
Gets the names of each of the columns produced for a single run. This method should really be static.

Specified by:
getKeyNames in interface ResultProducer
Returns:
an array containing the name of each key column

getKeyTypes

public java.lang.Object[] getKeyTypes()
Gets the data types of each of the columns produced for a single run. This method should really be static.

Specified by:
getKeyTypes in interface ResultProducer
Returns:
an array containing objects of the type of each key column. The objects should be Strings, or Doubles.

getResultNames

public java.lang.String[] getResultNames()
Gets the names of each of the columns produced for a single run. This method should really be static.

Specified by:
getResultNames in interface ResultProducer
Returns:
an array containing the name of each result column

getResultTypes

public java.lang.Object[] getResultTypes()
Gets the data types of each of the columns produced for a single run. This method should really be static.

Specified by:
getResultTypes in interface ResultProducer
Returns:
an array containing objects of the type of each result column. The objects should be Strings, or Doubles.

getCompatibilityState

public java.lang.String getCompatibilityState()
Gets a description of the internal settings of the result producer, sufficient for distinguishing a ResultProducer instance from another with different settings (ignoring those settings set through this interface). For example, a cross-validation ResultProducer may have a setting for the number of folds. For a given state, the results produced should be compatible. Typically if a ResultProducer is an OptionHandler, this string will represent the command line arguments required to set the ResultProducer to that state.

Specified by:
getCompatibilityState in interface ResultProducer
Returns:
the description of the ResultProducer state, or null if no state is defined

outputFileTipText

public java.lang.String outputFileTipText()
Returns the tip text for this property


getOutputFile

public java.io.File getOutputFile()
Get the value of OutputFile.


setOutputFile

public void setOutputFile(java.io.File newOutputFile)
Set the value of OutputFile.


numFoldsTipText

public java.lang.String numFoldsTipText()
Returns the tip text for this property


getNumFolds

public int getNumFolds()
Get the value of NumFolds.


setNumFolds

public void setNumFolds(int newNumFolds)
Set the value of NumFolds.


plotPointsTipText

public java.lang.String plotPointsTipText()
Returns the tip text for this property


getPlotPoints

public java.lang.String getPlotPoints()
Get the value of PlotPoints.


setPlotPoints

public void setPlotPoints(java.lang.String plotPoints)
Set the value of PlotPoints.


parsePlotPoints

protected double[] parsePlotPoints(java.lang.String plotPoints)
Parse a string of doubles separated by commas or spaces into a sorted array of doubles
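
The parsing described above might look like the following sketch. The class PlotPointsSketch and its parse method are hypothetical names; skipping empty tokens (for example after a leading comma) is an assumption, and the real method may report such input as an error instead.

```java
import java.util.Arrays;

// Hypothetical helper, not part of weka.experiment.
final class PlotPointsSketch {

    // Splits on runs of commas and/or whitespace, parses each token as a
    // double, and returns the values sorted in ascending order.
    static double[] parse(String plotPoints) {
        return Arrays.stream(plotPoints.trim().split("[,\\s]+"))
                .filter(token -> !token.isEmpty())   // ignore empty tokens
                .mapToDouble(Double::parseDouble)    // throws NumberFormatException on bad input
                .sorted()
                .toArray();
    }
}
```

For example, the string "30, 10 20" mixes both separators and would be parsed into the three values 10, 20 and 30.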


rawOutputTipText

public java.lang.String rawOutputTipText()
Returns the tip text for this property


getRawOutput

public boolean getRawOutput()
Get if raw split evaluator output is to be saved


setRawOutput

public void setRawOutput(boolean d)
Set to true if raw split evaluator output is to be saved


classNoiseTipText

public java.lang.String classNoiseTipText()
Returns the tip text for this property


getclassNoise

public boolean getclassNoise()
Get if Noise is to be added to Class


setclassNoise

public void setclassNoise(boolean d)
Set to true if Noise is to be added to Class


featureNoiseTipText

public java.lang.String featureNoiseTipText()
Returns the tip text for this property


getfeatureNoise

public boolean getfeatureNoise()
Get if Noise is to be added to Features


setfeatureNoise

public void setfeatureNoise(boolean d)
Set to true if Noise is to be added to Features


featureMissTipText

public java.lang.String featureMissTipText()
Returns the tip text for this property


getfeatureMiss

public boolean getfeatureMiss()
Get if Features are to be set Missing


setfeatureMiss

public void setfeatureMiss(boolean d)
Set to true if Features are to be set Missing


classNoiseTestTipText

public java.lang.String classNoiseTestTipText()
Returns the tip text for this property


getclassNoiseTest

public boolean getclassNoiseTest()
Get if Noise is to be added to Class in Testing Set


setclassNoiseTest

public void setclassNoiseTest(boolean d)
Set to true if Noise is to be added to Class in Testing


featureNoiseTestTipText

public java.lang.String featureNoiseTestTipText()
Returns the tip text for this property


getfeatureNoiseTest

public boolean getfeatureNoiseTest()
Get if Noise is to be added to Features in Testing Set


setfeatureNoiseTest

public void setfeatureNoiseTest(boolean d)
Set to true if Noise is to be added to Features in Testing


featureMissTestTipText

public java.lang.String featureMissTestTipText()
Returns the tip text for this property


getfeatureMissTest

public boolean getfeatureMissTest()
Get if Features are to be set Missing in Testing Set


setfeatureMissTest

public void setfeatureMissTest(boolean d)
Set to true if Features are to be set Missing in Testing


splitEvaluatorTipText

public java.lang.String splitEvaluatorTipText()
Returns the tip text for this property


getSplitEvaluator

public SplitEvaluator getSplitEvaluator()
Get the SplitEvaluator.


setSplitEvaluator

public void setSplitEvaluator(SplitEvaluator newSplitEvaluator)
Set the SplitEvaluator.


listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface OptionHandler
Returns:
an enumeration of all available options.

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options. Valid options are:

-X num_folds
The number of folds to use for the cross-validation.

-D
Specify that raw split evaluator output is to be saved.

-O file/directory name
Specify the file or directory to which raw split evaluator output is to be saved. If a directory is specified, then each output string is saved as an individual gzip file. If a file is specified, then each output string is saved as an entry in a zip file.

-W classname
Specify the full class name of the split evaluator.

-N
Add noise to the class in training.

-n
Add noise to the class in testing.

-F
Add noise to features in training.

-f
Add noise to features in testing.

-M
Set features missing in training.

-m
Set features missing in testing.

All options after -- will be passed to the split evaluator.
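
To show how these options fit together, the sketch below assembles an option array of the kind that could be passed to setOptions. It only builds strings and does not call WEKA. The class NoiseCurveOptionsSketch, its build method, and the idea of passing the noise flags as a string of letters are all hypothetical conveniences for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of weka.experiment.
final class NoiseCurveOptionsSketch {

    // noiseFlags is a string of single-letter flags drawn from N, n, F, f, M, m.
    static String[] build(int numFolds, String splitEvaluatorClass,
                          String noiseFlags, String... splitEvaluatorOptions) {
        List<String> opts = new ArrayList<>();
        opts.add("-X");
        opts.add(Integer.toString(numFolds));      // number of cross-validation folds
        opts.add("-W");
        opts.add(splitEvaluatorClass);             // full class name of the split evaluator
        for (char flag : noiseFlags.toCharArray()) {
            if ("NnFfMm".indexOf(flag) < 0) {
                throw new IllegalArgumentException("unknown noise flag: " + flag);
            }
            opts.add("-" + flag);
        }
        if (splitEvaluatorOptions.length > 0) {
            opts.add("--");                        // the rest goes to the split evaluator
            opts.addAll(Arrays.asList(splitEvaluatorOptions));
        }
        return opts.toArray(new String[0]);
    }
}
```

For instance, build(10, "weka.experiment.ClassifierSplitEvaluator", "Nf", "-W", "weka.classifiers.trees.J48") yields the options -X 10 -W weka.experiment.ClassifierSplitEvaluator -N -f -- -W weka.classifiers.trees.J48, that is, 10 folds with class noise in training and feature noise in testing. The evaluator and classifier names here are only illustrative choices.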

Specified by:
setOptions in interface OptionHandler
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported

getOptions

public java.lang.String[] getOptions()
Gets the current settings of the result producer.

Specified by:
getOptions in interface OptionHandler
Returns:
the list of current option settings as an array of strings

toString

public java.lang.String toString()
Gets a text description of the result producer.


addClassNoise

public void addClassNoise(Instances train,
                          Instances test,
                          int noiseLevel)
                   throws java.lang.Exception
Throws:
java.lang.Exception

addFeatureNoise

public void addFeatureNoise(Instances train,
                            Instances test,
                            int noiseLevel)
                     throws java.lang.Exception
Throws:
java.lang.Exception

addFeatureMiss

public void addFeatureMiss(Instances train,
                           Instances test,
                           int noiseLevel)
                    throws java.lang.Exception
Throws:
java.lang.Exception

main

public static void main(java.lang.String[] args)