weka.classifiers.meta
Class Crate

java.lang.Object
  extended by weka.classifiers.Classifier
      extended by weka.classifiers.meta.Crate
All Implemented Interfaces:
java.lang.Cloneable, OptionHandler, java.io.Serializable

public class Crate
extends Classifier
implements OptionHandler

CRATE (Committee Regressor using Artificial Training Examples) is a meta-learner for building diverse ensembles of regressors by adding specially constructed artificial training examples. Comprehensive experiments have demonstrated that this technique is consistently more accurate than bagging and more accurate than boosting when training data is limited. For more details see

Prem Melville and Raymond J. Mooney. Constructing diverse classifier ensembles using artificial training examples. Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, 2003.

Valid options are:

-D
Turn on debugging output.

-W classname
Specify the full class name of a weak classifier as the basis for Crate (default weka.classifiers.trees.j48.J48()).

-I num
Specify the desired size of the committee (default 15).

-M iterations
Set the maximum number of Crate iterations (default 50).

-S seed
Seed for the random number generator (default 0).

-R factor
Factor that determines number of artificial examples to generate.

Options after -- are passed to the designated classifier.
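
The option list above can be illustrated with a small stand-alone parser. This is only a sketch, not Weka's own option handling: the class CrateOptionsSketch, its field names, and the error messages are hypothetical, and it assumes that every flag except -D takes exactly one value and that everything after -- belongs to the base classifier. Defaults are filled in only where the list above states them.

```java
import java.util.Arrays;

/** Hypothetical holder for the documented options; not part of Weka. */
final class CrateOptionsSketch {
    boolean debug = false;              // -D
    String baseClassifier = null;       // -W
    int desiredSize = 15;               // -I, documented default
    int numIterations = 50;             // -M, documented default
    int seed = 0;                       // -S, documented default
    Double artificialFactor = null;     // -R, no default stated in the docs
    String[] baseOptions = new String[0];

    /** Parses the flags in order; options after "--" are kept for the base classifier. */
    static CrateOptionsSketch parse(String[] args) {
        CrateOptionsSketch o = new CrateOptionsSketch();
        int i = 0;
        while (i < args.length) {
            String flag = args[i++];
            if (flag.equals("--")) {
                o.baseOptions = Arrays.copyOfRange(args, i, args.length);
                break;
            }
            switch (flag) {
                case "-D": o.debug = true; break;
                case "-W": o.baseClassifier = value(args, i++, flag); break;
                case "-I": o.desiredSize = Integer.parseInt(value(args, i++, flag)); break;
                case "-M": o.numIterations = Integer.parseInt(value(args, i++, flag)); break;
                case "-S": o.seed = Integer.parseInt(value(args, i++, flag)); break;
                case "-R": o.artificialFactor = Double.parseDouble(value(args, i++, flag)); break;
                default: throw new IllegalArgumentException("Unsupported option: " + flag);
            }
        }
        return o;
    }

    /** Returns the value that follows a flag, or fails if the array ends first. */
    private static String value(String[] args, int index, String flag) {
        if (index >= args.length) {
            throw new IllegalArgumentException("Missing value for " + flag);
        }
        return args[index];
    }
}
```

For example, parsing the array {"-I", "10", "--", "-X"} would leave the seed and iteration count at their documented defaults and hand "-X" to the base classifier.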

See Also:
Serialized Form

Field Summary
protected  double m_Alpha
          Factor specifying desired amount of diversity
protected  double m_ArtSize
          Amount of artificial/random instances to use - specified as a fraction of the training data size.
protected  java.util.Vector m_AttributeStats
          Attribute statistics - used for generating artificial examples.
protected  Classifier m_Classifier
          The model base classifier to use.
protected  java.util.Vector m_Committee
          Vector of classifiers that make up the committee/ensemble.
protected  boolean m_Debug
          Set to true to get debugging output.
protected  int m_DesiredSize
          The desired ensemble size.
protected  int m_ErrorMeasure
          Error measure to optimize for
protected  Evaluation m_Evaluation
          Evaluator
protected  int m_NumIterations
          The maximum number of Crate iterations to run.
protected  java.util.Random m_Random
          The random number generator.
protected  int m_Seed
          The seed for random number generation.
 
Constructor Summary
Crate()
           
 
Method Summary
protected  void addInstances(Instances data, Instances newData)
          Add new instances to the given set of instances.
 void buildClassifier(Instances data)
          Build Crate classifier
 double classifyInstance(Instance instance)
          Classifies a given instance.
protected  double computeError(Instances data)
          Computes the error in prediction on the given data.
protected  void computeStats(Instances data)
          Compute and store statistics required for generating artificial data.
protected  Instances generateArtificialData(int artSize, Instances data)
          Generate artificial training examples.
 double getAlpha()
          Get the value of Alpha.
 double getArtificialSize()
          Factor that determines number of artificial examples to generate.
 Classifier getClassifier()
          Get the classifier used as the base classifier
 boolean getDebug()
          Get whether debugging is turned on
 int getDesiredSize()
          Gets the desired size of the committee.
 int getErrorMeasure()
          Get the value of errorMeasure.
 int getNumIterations()
          Gets the max number of Crate iterations to run.
 java.lang.String[] getOptions()
          Gets the current settings of the Classifier.
 int getSeed()
          Gets the seed for the random number generator.
protected  void labelData(Instances artData)
          Labels the artificially generated data.
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options
static void main(java.lang.String[] argv)
          Main method for testing this class.
protected  void removeInstances(Instances data, int numRemove)
          Removes a specified number of instances from the given set of instances.
protected  int selectIndexProbabilistically(double[] cdf)
          Given cumulative probabilities select a nominal attribute value index
 void setAlpha(double v)
          Set the value of Alpha.
 void setArtificialSize(double newArtSize)
          Sets factor that determines number of artificial examples to generate.
 void setClassifier(Classifier newClassifier)
          Set the base classifier for Crate.
 void setDebug(boolean debug)
          Set debugging mode
 void setDesiredSize(int newDesiredSize)
          Sets the desired size of the committee.
 void setErrorMeasure(int v)
          Set the value of errorMeasure.
 void setNumIterations(int numIterations)
          Sets the max number of Crate iterations to run.
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 void setSeed(int seed)
          Set the seed for random number generator.
 java.lang.String toString()
          Returns description of the Crate classifier.
 
Methods inherited from class weka.classifiers.Classifier
forName, makeCopies
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

m_Debug

protected boolean m_Debug
Set to true to get debugging output.


m_Classifier

protected Classifier m_Classifier
The model base classifier to use.


m_Committee

protected java.util.Vector m_Committee
Vector of classifiers that make up the committee/ensemble.


m_DesiredSize

protected int m_DesiredSize
The desired ensemble size.


m_NumIterations

protected int m_NumIterations
The maximum number of Crate iterations to run.


m_Seed

protected int m_Seed
The seed for random number generation.


m_ArtSize

protected double m_ArtSize
Amount of artificial/random instances to use - specified as a fraction of the training data size.


m_Random

protected java.util.Random m_Random
The random number generator.


m_AttributeStats

protected java.util.Vector m_AttributeStats
Attribute statistics - used for generating artificial examples.


m_Alpha

protected double m_Alpha
Factor specifying desired amount of diversity


m_Evaluation

protected Evaluation m_Evaluation
Evaluator


m_ErrorMeasure

protected int m_ErrorMeasure
Error measure to optimize for

Constructor Detail

Crate

public Crate()

Method Detail

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options

Specified by:
listOptions in interface OptionHandler
Returns:
an enumeration of all the available options

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options. Valid options are:

-D
Turn on debugging output.

-W classname
Specify the full class name of a weak classifier as the basis for Crate (required).

-I num
Specify the desired size of the committee (default 15).

-M iterations
Set the maximum number of Crate iterations (default 50).

-S seed
Seed for the random number generator (default 0).

-R factor
Factor that determines number of artificial examples to generate.

Options after -- are passed to the designated classifier.

Specified by:
setOptions in interface OptionHandler
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported

getOptions

public java.lang.String[] getOptions()
Gets the current settings of the Classifier.

Specified by:
getOptions in interface OptionHandler
Returns:
an array of strings suitable for passing to setOptions

getErrorMeasure

public int getErrorMeasure()
Get the value of errorMeasure.

Returns:
value of errorMeasure.

setErrorMeasure

public void setErrorMeasure(int v)
Set the value of errorMeasure.

Parameters:
v - Value to assign to errorMeasure.

getAlpha

public double getAlpha()
Get the value of Alpha.

Returns:
value of Alpha.

setAlpha

public void setAlpha(double v)
Set the value of Alpha.

Parameters:
v - Value to assign to Alpha.

setDebug

public void setDebug(boolean debug)
Set debugging mode

Parameters:
debug - true if debug output should be printed

getDebug

public boolean getDebug()
Get whether debugging is turned on

Returns:
true if debugging output is on

setClassifier

public void setClassifier(Classifier newClassifier)
Set the base classifier for Crate.

Parameters:
newClassifier - the Classifier to use.

getClassifier

public Classifier getClassifier()
Get the classifier used as the base classifier

Returns:
the classifier used as the base classifier

getArtificialSize

public double getArtificialSize()
Factor that determines number of artificial examples to generate.

Returns:
factor that determines number of artificial examples to generate

setArtificialSize

public void setArtificialSize(double newArtSize)
Sets factor that determines number of artificial examples to generate.

Parameters:
newArtSize - factor that determines number of artificial examples to generate

getDesiredSize

public int getDesiredSize()
Gets the desired size of the committee.

Returns:
the desired size of the committee

setDesiredSize

public void setDesiredSize(int newDesiredSize)
Sets the desired size of the committee.

Parameters:
newDesiredSize - the desired size of the committee

setNumIterations

public void setNumIterations(int numIterations)
Sets the max number of Crate iterations to run.

Parameters:
numIterations - max number of Crate iterations to run

getNumIterations

public int getNumIterations()
Gets the max number of Crate iterations to run.

Returns:
the max number of Crate iterations to run

setSeed

public void setSeed(int seed)
Set the seed for random number generator.

Parameters:
seed - the random number seed

getSeed

public int getSeed()
Gets the seed for the random number generator.

Returns:
the seed for the random number generator

buildClassifier

public void buildClassifier(Instances data)
                     throws java.lang.Exception
Build Crate classifier

Specified by:
buildClassifier in class Classifier
Parameters:
data - the training data to be used for generating the classifier
Throws:
java.lang.Exception - if the classifier could not be built successfully
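
The page does not spell out the training procedure, so the following is only a sketch of how the documented pieces (desired committee size, maximum iterations, artificial-size factor, error computation) might fit together. It assumes a loop in the style of the cited Melville and Mooney paper: train a member on the training data plus labeled artificial examples, and keep it only if the committee's error on the training data does not increase. The names CommitteeLoopSketch, Example, Learner and ArtificialSource are hypothetical, as is the rule of generating at least one artificial example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

final class CommitteeLoopSketch {
    /** A labeled example: input attribute values plus a numeric target. */
    static final class Example {
        final double[] x;
        final double y;
        Example(double[] x, double y) { this.x = x; this.y = y; }
    }

    /** Hypothetical stand-in for the base classifier. */
    interface Learner {
        ToDoubleFunction<double[]> train(List<Example> data);
    }

    /** Hypothetical stand-in for generating and then labeling artificial examples. */
    interface ArtificialSource {
        List<Example> create(int artSize, List<ToDoubleFunction<double[]>> committee);
    }

    /** Committee prediction, assumed here to be the mean of the member predictions. */
    static double predict(List<ToDoubleFunction<double[]>> committee, double[] x) {
        double sum = 0.0;
        for (ToDoubleFunction<double[]> member : committee) sum += member.applyAsDouble(x);
        return sum / committee.size();
    }

    /** Mean absolute error of the committee on the given data. */
    static double meanAbsoluteError(List<ToDoubleFunction<double[]>> committee, List<Example> data) {
        double sum = 0.0;
        for (Example e : data) sum += Math.abs(e.y - predict(committee, e.x));
        return sum / data.size();
    }

    static List<ToDoubleFunction<double[]>> build(List<Example> train, Learner learner,
            ArtificialSource source, int desiredSize, int maxIterations, double artFactor) {
        List<ToDoubleFunction<double[]>> committee = new ArrayList<>();
        committee.add(learner.train(train));               // first member sees only real data
        double bestError = meanAbsoluteError(committee, train);
        // Assumption: the factor is a fraction of the training size, with a floor of one example.
        int artSize = Math.max(1, (int) (artFactor * train.size()));
        for (int trial = 1; committee.size() < desiredSize && trial < maxIterations; trial++) {
            List<Example> augmented = new ArrayList<>(train);
            augmented.addAll(source.create(artSize, committee));
            committee.add(learner.train(augmented));
            double error = meanAbsoluteError(committee, train);
            if (error <= bestError) {
                bestError = error;                          // keep the new member
            } else {
                committee.remove(committee.size() - 1);     // reject it
            }
        }
        return committee;
    }
}
```

Under these assumptions the committee can end up smaller than the desired size, because the iteration limit caps the number of attempts rather than the number of accepted members.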

computeStats

protected void computeStats(Instances data)
                     throws java.lang.Exception
Compute and store statistics required for generating artificial data.

Parameters:
data - training instances
Throws:
java.lang.Exception - if statistics could not be calculated successfully

generateArtificialData

protected Instances generateArtificialData(int artSize,
                                           Instances data)
Generate artificial training examples.

Parameters:
artSize - number of artificial examples to create
data - training data
Returns:
the set of unlabeled artificial examples
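
As an illustration of generating unlabeled artificial examples from stored attribute statistics, the sketch below draws each numeric attribute from a Gaussian with the mean and standard deviation observed in the training data. That distribution is an assumption borrowed from the cited paper, not something this page states, and nominal attributes (which selectIndexProbabilistically suggests are drawn from a cumulative distribution) are left out. ArtificialDataSketch and its methods are hypothetical names, and rows are plain double[] arrays holding input attributes only.

```java
import java.util.Random;

final class ArtificialDataSketch {
    /** Per-attribute means of the training rows (rows must be non-empty). */
    static double[] means(double[][] rows) {
        double[] mean = new double[rows[0].length];
        for (double[] row : rows) {
            for (int j = 0; j < mean.length; j++) mean[j] += row[j];
        }
        for (int j = 0; j < mean.length; j++) mean[j] /= rows.length;
        return mean;
    }

    /** Per-attribute sample standard deviations; zero when there is a single row. */
    static double[] stdDevs(double[][] rows, double[] mean) {
        double[] sd = new double[mean.length];
        if (rows.length < 2) return sd;
        for (double[] row : rows) {
            for (int j = 0; j < sd.length; j++) {
                double d = row[j] - mean[j];
                sd[j] += d * d;
            }
        }
        for (int j = 0; j < sd.length; j++) sd[j] = Math.sqrt(sd[j] / (rows.length - 1));
        return sd;
    }

    /** Draws artSize rows, each attribute sampled independently from its Gaussian. */
    static double[][] generate(int artSize, double[][] rows, Random random) {
        double[] mean = means(rows);
        double[] sd = stdDevs(rows, mean);
        double[][] artificial = new double[artSize][mean.length];
        for (int i = 0; i < artSize; i++) {
            for (int j = 0; j < mean.length; j++) {
                artificial[i][j] = mean[j] + sd[j] * random.nextGaussian();
            }
        }
        return artificial;
    }
}
```

Sampling each attribute independently ignores correlations between attributes, so the artificial rows follow the marginal distributions of the training data rather than its joint distribution.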

selectIndexProbabilistically

protected int selectIndexProbabilistically(double[] cdf)
Given cumulative probabilities select a nominal attribute value index

Parameters:
cdf - array of cumulative probabilities
Returns:
index of the attribute value selected based on the probability distribution
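
Selecting an index from cumulative probabilities is usually done by drawing a uniform number in [0, 1) and returning the first index whose cumulative value exceeds it. The sketch below shows that technique; whether the class uses a linear scan, and how it treats rounding at the upper end, is not stated here, so CdfSamplingSketch and the fallback to the last index are assumptions.

```java
import java.util.Random;

final class CdfSamplingSketch {
    /** Returns the first index i with u < cdf[i], for u in [0, 1). */
    static int selectIndex(double[] cdf, double u) {
        if (cdf.length == 0) {
            throw new IllegalArgumentException("cdf must not be empty");
        }
        for (int i = 0; i < cdf.length; i++) {
            if (u < cdf[i]) return i;
        }
        // Guard against rounding when the last cumulative value is slightly below 1.
        return cdf.length - 1;
    }

    /** Draws u from the given generator and delegates to the deterministic overload. */
    static int selectIndex(double[] cdf, Random random) {
        return selectIndex(cdf, random.nextDouble());
    }
}
```

With cdf = {0.2, 0.5, 1.0}, a draw of 0.3 falls in the second interval, so index 1 is selected with probability 0.3 overall.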

labelData

protected void labelData(Instances artData)
                  throws java.lang.Exception
Labels the artificially generated data.

Parameters:
artData - the artificially generated instances
Throws:
java.lang.Exception - if instances cannot be labeled successfully

removeInstances

protected void removeInstances(Instances data,
                               int numRemove)
Removes a specified number of instances from the given set of instances.

Parameters:
data - given instances
numRemove - number of instances to delete from the given instances

addInstances

protected void addInstances(Instances data,
                            Instances newData)
Add new instances to the given set of instances.

Parameters:
data - given instances
newData - set of instances to add to given instances
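
The two helpers removeInstances and addInstances appear designed to be used as a pair: artificial examples are appended to the training data and later taken out again. The sketch below assumes, without confirmation from this page, that removal takes instances from the end of the set, which is what makes an add followed by a remove of the same count restore the original data. InstanceListSketch is a hypothetical name and a List of double[] stands in for Instances.

```java
import java.util.List;

final class InstanceListSketch {
    /** Appends every row of newData to data. */
    static void addInstances(List<double[]> data, List<double[]> newData) {
        data.addAll(newData);
    }

    /** Removes the last numRemove rows of data (assumed removal order). */
    static void removeInstances(List<double[]> data, int numRemove) {
        if (numRemove < 0 || numRemove > data.size()) {
            throw new IllegalArgumentException("numRemove out of range: " + numRemove);
        }
        data.subList(data.size() - numRemove, data.size()).clear();
    }
}
```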

computeError

protected double computeError(Instances data)
                       throws java.lang.Exception
Computes the error in prediction on the given data.

Parameters:
data - the instances to be classified
Returns:
mean absolute error
Throws:
java.lang.Exception - if error cannot be computed successfully
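
The Returns entry above names mean absolute error, which is the average of |actual - predicted| over the instances. A minimal stand-alone version follows; MeanAbsoluteErrorSketch is a hypothetical name, and the sketch ignores instance weights and the separate errorMeasure setting, whose effect this page does not describe.

```java
final class MeanAbsoluteErrorSketch {
    /** Average absolute difference between actual and predicted values. */
    static double meanAbsoluteError(double[] actual, double[] predicted) {
        if (actual.length == 0 || actual.length != predicted.length) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(actual[i] - predicted[i]);
        }
        return sum / actual.length;
    }
}
```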

classifyInstance

public double classifyInstance(Instance instance)
                        throws java.lang.Exception
Classifies a given instance.

Specified by:
classifyInstance in class Classifier
Parameters:
instance - the instance to be classified
Returns:
the predicted value
Throws:
java.lang.Exception - if instance could not be predicted successfully
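
How the committee combines its members' outputs is not stated on this page. For a regression ensemble the usual choice is the unweighted mean of the member predictions, and the sketch below assumes exactly that. CommitteeAverageSketch is a hypothetical name and each member is modeled as a function from attribute values to a number.

```java
import java.util.List;
import java.util.function.ToDoubleFunction;

final class CommitteeAverageSketch {
    /** Returns the unweighted mean of the member predictions for one instance. */
    static double predict(List<ToDoubleFunction<double[]>> committee, double[] instance) {
        if (committee.isEmpty()) {
            throw new IllegalStateException("committee is empty; build the classifier first");
        }
        double sum = 0.0;
        for (ToDoubleFunction<double[]> member : committee) {
            sum += member.applyAsDouble(instance);
        }
        return sum / committee.size();
    }
}
```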

toString

public java.lang.String toString()
Returns description of the Crate classifier.

Returns:
description of the Crate classifier as a string

main

public static void main(java.lang.String[] argv)
Main method for testing this class.

Parameters:
argv - the options