weka.experiment
Class SemiSupPointActiveCurveCVResultProducer

java.lang.Object
  extended by weka.experiment.SemiSupPointActiveCurveCVResultProducer
All Implemented Interfaces:
AdditionalMeasureProducer, OptionHandler, ResultProducer, java.io.Serializable

public class SemiSupPointActiveCurveCVResultProducer
extends java.lang.Object
implements ResultProducer, OptionHandler, AdditionalMeasureProducer

Produces an N-fold cross-validation learning curve for point-wise active learning with semi-supervised learners (clusterers and classifiers).

See Also:
Serialized Form

Field Summary
static java.lang.String DATASET_FIELD_NAME
           
static java.lang.String FOLD_FIELD_NAME
           
static java.lang.String FRACTION_FIELD_NAME
           
protected  java.lang.String[] m_AdditionalMeasures
          The names of any additional measures to look for in SplitEvaluators
protected  int m_CurrentSize
          The current dataset size during stepping
protected  boolean m_debugOutput
          Whether to save raw output of split evaluators (for debugging purposes)
protected  boolean m_DoActive
          Whether active learning is to be performed
protected  Instances m_Instances
          The dataset of interest
protected  boolean m_IsFraction
           
protected  boolean m_IsTransductive
          Whether transductive evaluation is to be performed
protected  int m_LowerSize
          The minimum number of instances to use.
protected  int m_NumFolds
          The number of folds in the cross-validation
protected  java.io.File m_OutputFile
          The destination output file/directory for raw output
protected  double[] m_PlotPoints
          The specific points to plot: either integers giving specific numbers of training examples, or decimal fractions giving proportions of the full training set
protected  ResultListener m_ResultListener
          The ResultListener to send results to
protected  SplitEvaluator m_SplitEvaluator
          The SplitEvaluator used to generate results
protected  int m_StepSize
          The number of instances to add at each step
protected  int m_UpperSize
          The maximum number of instances to use.
protected  OutputZipper m_ZipDest
          The output zipper to use for saving raw splitEvaluator output
static java.lang.String RUN_FIELD_NAME
           
static java.lang.String STEP_FIELD_NAME
           
static java.lang.String TIMESTAMP_FIELD_NAME
           
 
Constructor Summary
SemiSupPointActiveCurveCVResultProducer()
           
 
Method Summary
 java.lang.String DoActiveTipText()
          Returns the tip text for this property
 void doRun(int run)
          Gets the results for a specified run number.
 void doRunKeys(int run)
          Gets the keys for a specified run number.
 java.util.Enumeration enumerateMeasures()
          Returns an enumeration of any additional measure names that might be in the SplitEvaluator
 java.lang.String getCompatibilityState()
          Gets a description of the internal settings of the result producer, sufficient for distinguishing a ResultProducer instance from another with different settings (ignoring those settings set through this interface).
 boolean getDoActive()
          Get the value of m_DoActive.
 boolean getIsTransductive()
          Get the value of IsTransductive.
 java.lang.String[] getKeyNames()
          Gets the names of each of the columns produced for a single run.
 java.lang.Object[] getKeyTypes()
          Gets the data types of each of the columns produced for a single run.
 int getLowerSize()
          Get the value of LowerSize.
 double getMeasure(java.lang.String additionalMeasureName)
          Returns the value of the named measure
 int getNumFolds()
          Get the value of NumFolds.
 java.lang.String[] getOptions()
          Gets the current settings of the result producer.
 java.io.File getOutputFile()
          Get the value of OutputFile.
 java.lang.String getPlotPoints()
          Get the value of PlotPoints.
 boolean getRawOutput()
          Gets whether raw split evaluator output is to be saved
 java.lang.String[] getResultNames()
          Gets the names of each of the columns produced for a single run.
 java.lang.Object[] getResultTypes()
          Gets the data types of each of the columns produced for a single run.
 SplitEvaluator getSplitEvaluator()
          Get the SplitEvaluator.
 int getStepSize()
          Get the value of StepSize.
static java.lang.Double getTimestamp()
          Gets a Double representing the current date and time.
 int getUpperSize()
          Get the value of UpperSize.
 java.lang.String globalInfo()
          Returns a string describing this result producer
protected static boolean isInteger(double val)
          Return true if the given double represents an integer value
 java.lang.String isTransductiveTipText()
          Returns the tip text for this property
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options.
 java.lang.String lowerSizeTipText()
          Returns the tip text for this property
static void main(java.lang.String[] args)
           
protected  int maxTrainSize()
           
 java.lang.String numFoldsTipText()
          Returns the tip text for this property
 java.lang.String outputFileTipText()
          Returns the tip text for this property
protected  double[] parsePlotPoints(java.lang.String plotPoints)
          Parse a string of doubles separated by commas or spaces into a sorted array of doubles
protected  int plotPoint(int i)
          Return the number of training examples for the ith point on the curve for plotPoints as specified.
 java.lang.String plotPointsTipText()
          Returns the tip text for this property
 void postProcess()
          Perform any postprocessing.
 void preProcess()
          Prepare to generate results.
 java.lang.String rawOutputTipText()
          Returns the tip text for this property
 Instances reorganizeTrainForActiveLearning(Instances train, int currentSize, int[] instancesForActiveLearning)
           
 void setAdditionalMeasures(java.lang.String[] additionalMeasures)
          Set a list of method names for additional measures to look for in SplitEvaluators.
 void setDoActive(boolean flag)
          Set the value of m_DoActive.
 void setInstances(Instances instances)
          Sets the dataset that results will be obtained for.
protected  boolean setIsFraction()
          Determines if the points specified are fractions of the total number of examples
 void setIsTransductive(boolean flag)
          Set the value of IsTransductive.
 void setLowerSize(int newLowerSize)
          Set the value of LowerSize.
 void setNumFolds(int newNumFolds)
          Set the value of NumFolds.
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 void setOutputFile(java.io.File newOutputFile)
          Set the value of OutputFile.
 void setPlotPoints(java.lang.String plotPoints)
          Set the value of PlotPoints.
 void setRawOutput(boolean d)
          Set to true if raw split evaluator output is to be saved
 void setResultListener(ResultListener listener)
          Sets the object to send results of each run to.
 void setSplitEvaluator(SplitEvaluator newSplitEvaluator)
          Set the SplitEvaluator.
 void setStepSize(int newStepSize)
          Set the value of StepSize.
 void setUpperSize(int newUpperSize)
          Set the value of UpperSize.
 java.lang.String splitEvaluatorTipText()
          Returns the tip text for this property
 java.lang.String stepSizeTipText()
          Returns the tip text for this property
 java.lang.String toString()
          Gets a text description of the result producer.
 java.lang.String upperSizeTipText()
          Returns the tip text for this property
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

m_Instances

protected Instances m_Instances
The dataset of interest


m_ResultListener

protected ResultListener m_ResultListener
The ResultListener to send results to


m_NumFolds

protected int m_NumFolds
The number of folds in the cross-validation


m_IsTransductive

protected boolean m_IsTransductive
Whether transductive evaluation is to be performed


m_DoActive

protected boolean m_DoActive
Whether active learning is to be performed


m_debugOutput

protected boolean m_debugOutput
Whether to save raw output of split evaluators (for debugging purposes)


m_ZipDest

protected OutputZipper m_ZipDest
The output zipper to use for saving raw splitEvaluator output


m_OutputFile

protected java.io.File m_OutputFile
The destination output file/directory for raw output


m_SplitEvaluator

protected SplitEvaluator m_SplitEvaluator
The SplitEvaluator used to generate results


m_AdditionalMeasures

protected java.lang.String[] m_AdditionalMeasures
The names of any additional measures to look for in SplitEvaluators


m_LowerSize

protected int m_LowerSize
The minimum number of instances to use. If this is zero, the first step will contain m_StepSize instances


m_UpperSize

protected int m_UpperSize
The maximum number of instances to use. -1 indicates no maximum (other than the total number of instances)


m_StepSize

protected int m_StepSize
The number of instances to add at each step
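
Taken together, m_LowerSize, m_UpperSize and m_StepSize determine which training-set sizes are visited. The following is a minimal sketch of that schedule, assuming sizes grow by a fixed step until the cap is exceeded. The class name, the method name and the `total` parameter are hypothetical, and this is not the producer's actual code.

```java
import java.util.ArrayList;
import java.util.List;

public class StepScheduleSketch {

    // Hypothetical helper: lists the training-set sizes implied by the
    // lower/upper/step settings for a training set of `total` instances.
    static List<Integer> sizes(int lowerSize, int upperSize, int stepSize, int total) {
        if (stepSize <= 0) {
            throw new IllegalArgumentException("stepSize must be positive");
        }
        // -1 means "no maximum other than the total number of instances".
        int max = (upperSize == -1) ? total : Math.min(upperSize, total);
        // A lower size of zero means the first step holds stepSize instances.
        int size = (lowerSize == 0) ? stepSize : lowerSize;
        List<Integer> result = new ArrayList<>();
        while (size <= max) {
            result.add(size);
            size += stepSize;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sizes(0, -1, 10, 35));   // prints [10, 20, 30]
        System.out.println(sizes(5, 25, 10, 100));  // prints [5, 15, 25]
    }
}
```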


m_PlotPoints

protected double[] m_PlotPoints
The specific points to plot: either integers giving specific numbers of training examples, or decimal fractions giving proportions of the full training set


m_CurrentSize

protected int m_CurrentSize
The current dataset size during stepping


DATASET_FIELD_NAME

public static java.lang.String DATASET_FIELD_NAME

RUN_FIELD_NAME

public static java.lang.String RUN_FIELD_NAME

FOLD_FIELD_NAME

public static java.lang.String FOLD_FIELD_NAME

TIMESTAMP_FIELD_NAME

public static java.lang.String TIMESTAMP_FIELD_NAME

STEP_FIELD_NAME

public static java.lang.String STEP_FIELD_NAME

FRACTION_FIELD_NAME

public static java.lang.String FRACTION_FIELD_NAME

m_IsFraction

protected boolean m_IsFraction

Constructor Detail

SemiSupPointActiveCurveCVResultProducer

public SemiSupPointActiveCurveCVResultProducer()

Method Detail

globalInfo

public java.lang.String globalInfo()
Returns a string describing this result producer

Returns:
a description of the result producer suitable for displaying in the explorer/experimenter gui

setInstances

public void setInstances(Instances instances)
Sets the dataset that results will be obtained for.

Specified by:
setInstances in interface ResultProducer
Parameters:
instances - a value of type 'Instances'.

setResultListener

public void setResultListener(ResultListener listener)
Sets the object to send results of each run to.

Specified by:
setResultListener in interface ResultProducer
Parameters:
listener - a value of type 'ResultListener'

setAdditionalMeasures

public void setAdditionalMeasures(java.lang.String[] additionalMeasures)
Set a list of method names for additional measures to look for in SplitEvaluators. This could contain many measures (of which only a subset may be producible by the current SplitEvaluator) if an experiment is the type that iterates over a set of properties.

Specified by:
setAdditionalMeasures in interface ResultProducer
Parameters:
additionalMeasures - an array of measure names, null if none

enumerateMeasures

public java.util.Enumeration enumerateMeasures()
Returns an enumeration of any additional measure names that might be in the SplitEvaluator

Specified by:
enumerateMeasures in interface AdditionalMeasureProducer
Returns:
an enumeration of the measure names

getMeasure

public double getMeasure(java.lang.String additionalMeasureName)
Returns the value of the named measure

Specified by:
getMeasure in interface AdditionalMeasureProducer
Parameters:
additionalMeasureName - the name of the measure to query for its value
Returns:
the value of the named measure
Throws:
java.lang.IllegalArgumentException - if the named measure is not supported

getTimestamp

public static java.lang.Double getTimestamp()
Gets a Double representing the current date and time. For example, 1:46pm on 20/5/1999 becomes 19990520.1346.

Returns:
a value of type Double
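
The encoding in the example places the date in the integer part (yyyyMMdd) and the time in the fractional part (HHmm). The sketch below reproduces that layout for an arbitrary Calendar. The class and method names are hypothetical, and the arithmetic is inferred from the documented example rather than taken from the implementation.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

public class TimestampSketch {

    // Hypothetical helper: packs a date and time into yyyyMMdd.HHmm.
    static double encode(Calendar when) {
        return when.get(Calendar.YEAR) * 10000
                + (when.get(Calendar.MONTH) + 1) * 100   // Calendar months start at 0
                + when.get(Calendar.DAY_OF_MONTH)
                + when.get(Calendar.HOUR_OF_DAY) / 100.0
                + when.get(Calendar.MINUTE) / 10000.0;
    }

    public static void main(String[] args) {
        // 1:46pm on 20 May 1999, the example from the documentation.
        Calendar when = new GregorianCalendar(1999, Calendar.MAY, 20, 13, 46);
        System.out.println(String.format(Locale.ROOT, "%.4f", encode(when)));
    }
}
```

Because the value is a floating-point number, comparisons against a timestamp are safer with a small tolerance than with exact equality.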

preProcess

public void preProcess()
                throws java.lang.Exception
Prepare to generate results.

Specified by:
preProcess in interface ResultProducer
Throws:
java.lang.Exception - if an error occurs during preprocessing.

postProcess

public void postProcess()
                 throws java.lang.Exception
Perform any postprocessing. When this method is called, it indicates that no more requests to generate results for the current experiment will be sent.

Specified by:
postProcess in interface ResultProducer
Throws:
java.lang.Exception - if an error occurs

doRunKeys

public void doRunKeys(int run)
               throws java.lang.Exception
Gets the keys for a specified run number. Different run numbers correspond to different randomizations of the data. Keys produced should be sent to the current ResultListener.

Specified by:
doRunKeys in interface ResultProducer
Parameters:
run - the run number to get keys for.
Throws:
java.lang.Exception - if a problem occurs while getting the keys

maxTrainSize

protected int maxTrainSize()

doRun

public void doRun(int run)
           throws java.lang.Exception
Gets the results for a specified run number. Different run numbers correspond to different randomizations of the data. Results produced should be sent to the current ResultListener.

Specified by:
doRun in interface ResultProducer
Parameters:
run - the run number to get results for.
Throws:
java.lang.Exception - if a problem occurs while getting the results
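
The loop structure implied by "run", "fold" and "training-set size" can be pictured roughly as follows. This is only a sketch under stated assumptions: the run number is assumed to seed the shuffle, folds are assigned round-robin without stratification, and each curve point trains on a prefix of the fold's training data. None of these details are confirmed by this page, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class RunLoopSketch {

    // Assumption: the run number seeds the shuffle, so a run is reproducible.
    static List<Integer> shuffledIndices(int numInstances, int run) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < numInstances; i++) {
            order.add(i);
        }
        Collections.shuffle(order, new Random(run));
        return order;
    }

    // Training indices for one fold: every instance not held out for testing.
    static List<Integer> trainingIndices(List<Integer> order, int numFolds, int fold) {
        List<Integer> train = new ArrayList<>();
        for (int i = 0; i < order.size(); i++) {
            if (i % numFolds != fold) {
                train.add(order.get(i));
            }
        }
        return train;
    }

    public static void main(String[] args) {
        int run = 1;
        int numFolds = 5;
        int[] sizes = {2, 4, 8};   // training-set sizes on the curve
        List<Integer> order = shuffledIndices(10, run);
        for (int fold = 0; fold < numFolds; fold++) {
            List<Integer> train = trainingIndices(order, numFolds, fold);
            for (int size : sizes) {
                if (size > train.size()) {
                    break;
                }
                // One result row per (run, fold, size) combination.
                System.out.println("run=" + run + " fold=" + fold
                        + " size=" + size + " train=" + train.subList(0, size));
            }
        }
    }
}
```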

reorganizeTrainForActiveLearning

public Instances reorganizeTrainForActiveLearning(Instances train,
                                                  int currentSize,
                                                  int[] instancesForActiveLearning)

setIsFraction

protected boolean setIsFraction()
Determines if the points specified are fractions of the total number of examples


plotPoint

protected int plotPoint(int i)
Return the number of training examples for the ith point on the curve for plotPoints as specified.
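
Since plot points may be given either as counts or as fractions, each one has to be resolved to a number of training examples. A minimal sketch of one way to do that, assuming fractional points are scaled by the maximum training size and rounded to the nearest integer (the rounding rule and all names are assumptions, not the documented behaviour):

```java
public class PlotPointSketch {

    // Hypothetical helper: resolves one plot point to a number of training examples.
    static int toTrainSize(double point, int maxTrainSize) {
        boolean isWholeNumber = point == Math.rint(point);
        if (isWholeNumber) {
            // Whole numbers are read as literal counts of training examples.
            return (int) point;
        }
        // Assumption: fractions are scaled by the maximum training size and rounded.
        return (int) Math.round(point * maxTrainSize);
    }

    public static void main(String[] args) {
        System.out.println(toTrainSize(50, 200));    // prints 50
        System.out.println(toTrainSize(0.25, 200));  // prints 50
        System.out.println(toTrainSize(0.1, 200));   // prints 20
    }
}
```

Under this reading a point of 1.0 means one training example rather than the whole training set, because it is integer-valued.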


isInteger

protected static boolean isInteger(double val)
Return true if the given double represents an integer value
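
A check of this kind is commonly written by comparing the value with its floor. The sketch below is one such formulation, not necessarily the one used here; the class name is hypothetical, and rejecting NaN and infinities is an assumption.

```java
public class IsIntegerSketch {

    // True when val is finite and has no fractional part.
    static boolean isInteger(double val) {
        return Double.isFinite(val) && val == Math.floor(val);
    }

    public static void main(String[] args) {
        System.out.println(isInteger(3.0));   // prints true
        System.out.println(isInteger(0.25));  // prints false
        System.out.println(isInteger(-2.0));  // prints true
    }
}
```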


getKeyNames

public java.lang.String[] getKeyNames()
Gets the names of each of the columns produced for a single run. This method should really be static.

Specified by:
getKeyNames in interface ResultProducer
Returns:
an array containing the name of each column

getKeyTypes

public java.lang.Object[] getKeyTypes()
Gets the data types of each of the columns produced for a single run. This method should really be static.

Specified by:
getKeyTypes in interface ResultProducer
Returns:
an array containing objects of the type of each column. The objects should be Strings or Doubles.

getResultNames

public java.lang.String[] getResultNames()
Gets the names of each of the columns produced for a single run. This method should really be static.

Specified by:
getResultNames in interface ResultProducer
Returns:
an array containing the name of each column

getResultTypes

public java.lang.Object[] getResultTypes()
Gets the data types of each of the columns produced for a single run. This method should really be static.

Specified by:
getResultTypes in interface ResultProducer
Returns:
an array containing objects of the type of each column. The objects should be Strings or Doubles.

getCompatibilityState

public java.lang.String getCompatibilityState()
Gets a description of the internal settings of the result producer, sufficient for distinguishing a ResultProducer instance from another with different settings (ignoring those settings set through this interface). For example, a cross-validation ResultProducer may have a setting for the number of folds. For a given state, the results produced should be compatible. Typically if a ResultProducer is an OptionHandler, this string will represent the command line arguments required to set the ResultProducer to that state.

Specified by:
getCompatibilityState in interface ResultProducer
Returns:
the description of the ResultProducer state, or null if no state is defined

outputFileTipText

public java.lang.String outputFileTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getOutputFile

public java.io.File getOutputFile()
Get the value of OutputFile.

Returns:
Value of OutputFile.

setOutputFile

public void setOutputFile(java.io.File newOutputFile)
Set the value of OutputFile.

Parameters:
newOutputFile - Value to assign to OutputFile.

numFoldsTipText

public java.lang.String numFoldsTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getNumFolds

public int getNumFolds()
Get the value of NumFolds.

Returns:
Value of NumFolds.

setNumFolds

public void setNumFolds(int newNumFolds)
Set the value of NumFolds.

Parameters:
newNumFolds - Value to assign to NumFolds.

isTransductiveTipText

public java.lang.String isTransductiveTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getIsTransductive

public boolean getIsTransductive()
Get the value of IsTransductive.

Returns:
Value of IsTransductive.

setIsTransductive

public void setIsTransductive(boolean flag)
Set the value of IsTransductive.

Parameters:
flag - Value to assign to IsTransductive.

DoActiveTipText

public java.lang.String DoActiveTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getDoActive

public boolean getDoActive()
Get the value of m_DoActive.

Returns:
Value of m_DoActive.

setDoActive

public void setDoActive(boolean flag)
Set the value of m_DoActive.

Parameters:
flag - Value to assign to m_DoActive.

lowerSizeTipText

public java.lang.String lowerSizeTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getLowerSize

public int getLowerSize()
Get the value of LowerSize.

Returns:
Value of LowerSize.

setLowerSize

public void setLowerSize(int newLowerSize)
Set the value of LowerSize.

Parameters:
newLowerSize - Value to assign to LowerSize.

upperSizeTipText

public java.lang.String upperSizeTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getUpperSize

public int getUpperSize()
Get the value of UpperSize.

Returns:
Value of UpperSize.

setUpperSize

public void setUpperSize(int newUpperSize)
Set the value of UpperSize.

Parameters:
newUpperSize - Value to assign to UpperSize.

stepSizeTipText

public java.lang.String stepSizeTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getStepSize

public int getStepSize()
Get the value of StepSize.

Returns:
Value of StepSize.

setStepSize

public void setStepSize(int newStepSize)
Set the value of StepSize.

Parameters:
newStepSize - Value to assign to StepSize.

plotPointsTipText

public java.lang.String plotPointsTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getPlotPoints

public java.lang.String getPlotPoints()
Get the value of PlotPoints.

Returns:
Value of PlotPoints.

setPlotPoints

public void setPlotPoints(java.lang.String plotPoints)
Set the value of PlotPoints.

Parameters:
plotPoints - Value to assign to PlotPoints.

parsePlotPoints

protected double[] parsePlotPoints(java.lang.String plotPoints)
Parse a string of doubles separated by commas or spaces into a sorted array of doubles
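
The parsing described above can be sketched in a few lines. This is an illustrative stand-alone version with hypothetical names. How the real method treats empty input or malformed numbers is not documented here, so returning an empty array and letting NumberFormatException propagate are assumptions.

```java
import java.util.Arrays;

public class PlotPointsParseSketch {

    // Splits on commas and/or whitespace, parses each token, and sorts ascending.
    static double[] parse(String plotPoints) {
        if (plotPoints == null) {
            return new double[0];
        }
        return Arrays.stream(plotPoints.trim().split("[,\\s]+"))
                .filter(token -> !token.isEmpty())   // skip leftovers from leading separators
                .mapToDouble(Double::parseDouble)
                .sorted()
                .toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("30, 10 20")));    // prints [10.0, 20.0, 30.0]
        System.out.println(Arrays.toString(parse("0.5,0.1,0.25"))); // prints [0.1, 0.25, 0.5]
    }
}
```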


rawOutputTipText

public java.lang.String rawOutputTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getRawOutput

public boolean getRawOutput()
Gets whether raw split evaluator output is to be saved

Returns:
true if raw split evaluator output is to be saved

setRawOutput

public void setRawOutput(boolean d)
Set to true if raw split evaluator output is to be saved

Parameters:
d - true if output is to be saved

splitEvaluatorTipText

public java.lang.String splitEvaluatorTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getSplitEvaluator

public SplitEvaluator getSplitEvaluator()
Get the SplitEvaluator.

Returns:
the SplitEvaluator.

setSplitEvaluator

public void setSplitEvaluator(SplitEvaluator newSplitEvaluator)
Set the SplitEvaluator.

Parameters:
newSplitEvaluator - new SplitEvaluator to use.

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface OptionHandler
Returns:
an enumeration of all the available options.

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options. Valid options are:

-X num_folds
The number of folds to use for the cross-validation.

-D
Specify that raw split evaluator output is to be saved.

-O file/directory name
Specify the file or directory to which raw split evaluator output is to be saved. If a directory is specified, then each output string is saved as an individual gzip file. If a file is specified, then each output string is saved as an entry in a zip file.

-W classname
Specify the full class name of the split evaluator.

All options after -- will be passed to the split evaluator.

Specified by:
setOptions in interface OptionHandler
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported
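
To illustrate the shape of such an option array, the sketch below builds one from the flags listed above and separates the part after -- that is intended for the split evaluator. The evaluator class name, the output file name and the flag after -- are hypothetical placeholders, and the splitting helper is a stand-alone illustration rather than the parsing code this class uses.

```java
import java.util.Arrays;

public class OptionSplitSketch {

    // Returns { options before "--", options after "--" }.
    static String[][] splitAtDoubleDash(String[] options) {
        int separator = Arrays.asList(options).indexOf("--");
        if (separator < 0) {
            return new String[][] {options.clone(), new String[0]};
        }
        return new String[][] {
            Arrays.copyOfRange(options, 0, separator),
            Arrays.copyOfRange(options, separator + 1, options.length)
        };
    }

    public static void main(String[] args) {
        String[] options = {
            "-X", "10",                          // number of folds
            "-D",                                // save raw split evaluator output
            "-O", "raw-output.zip",              // hypothetical destination file
            "-W", "some.pkg.MySplitEvaluator",   // hypothetical evaluator class
            "--", "-example-flag"                // hypothetical evaluator option
        };
        String[][] parts = splitAtDoubleDash(options);
        System.out.println("producer:  " + Arrays.toString(parts[0]));
        System.out.println("evaluator: " + Arrays.toString(parts[1]));
    }
}
```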

getOptions

public java.lang.String[] getOptions()
Gets the current settings of the result producer.

Specified by:
getOptions in interface OptionHandler
Returns:
an array of strings suitable for passing to setOptions

toString

public java.lang.String toString()
Gets a text description of the result producer.

Returns:
a text description of the result producer.

main

public static void main(java.lang.String[] args)