weka.classifiers.trees
Class REPTree

java.lang.Object
  extended by weka.classifiers.Classifier
      extended by weka.classifiers.DistributionClassifier
          extended by weka.classifiers.trees.REPTree
All Implemented Interfaces:
AdditionalMeasureProducer, java.lang.Cloneable, Drawable, OptionHandler, java.io.Serializable, WeightedInstancesHandler

public class REPTree
extends DistributionClassifier
implements OptionHandler, WeightedInstancesHandler, Drawable, AdditionalMeasureProducer

Fast decision tree learner. Builds a decision tree (nominal class) or regression tree (numeric class), using information gain or variance respectively as the splitting criterion, and prunes it using reduced-error pruning. Values of numeric attributes are sorted only once. Missing values are handled by splitting the corresponding instances into pieces (i.e. as in C4.5). Valid options are:

-M number
Set minimum number of instances per leaf (default 2).

-V number
Set the minimum proportion of the training-set variance of a numeric class that is required for a split (default 1e-3).

-N number
Number of folds for reduced error pruning (default 3).

-S number
Seed for random data shuffling (default 1).

-P
No pruning.

-D
Maximum tree depth (default -1, no maximum).
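
The class description names information gain as the split criterion for a nominal class. The following is a minimal sketch of that quantity, not the REPTree source: InfoGainSketch and its methods are hypothetical names introduced for illustration. Counts are doubles so that fractional instance weights, such as those produced when an instance with a missing value is split into pieces, can be passed in directly.

```java
// Illustrative sketch only; InfoGainSketch is a hypothetical helper, not a weka class.
final class InfoGainSketch {

    /** Entropy in bits of a (possibly fractional) class-count vector. */
    static double entropy(double[] counts) {
        double total = 0.0;
        for (double c : counts) {
            total += c;
        }
        if (total <= 0.0) {
            return 0.0; // an empty node carries no uncertainty
        }
        double h = 0.0;
        for (double c : counts) {
            if (c > 0.0) {
                double p = c / total;
                h -= p * (Math.log(p) / Math.log(2.0));
            }
        }
        return h;
    }

    /** Parent entropy minus the weight-averaged entropy of the branches. */
    static double gain(double[] parentCounts, double[][] branchCounts) {
        double total = 0.0;
        for (double c : parentCounts) {
            total += c;
        }
        double remainder = 0.0;
        for (double[] branch : branchCounts) {
            double n = 0.0;
            for (double c : branch) {
                n += c;
            }
            if (total > 0.0) {
                remainder += (n / total) * entropy(branch);
            }
        }
        return entropy(parentCounts) - remainder;
    }
}
```

For example, a node holding four instances of each of two classes that is split into two pure branches gives gain(new double[]{4, 4}, new double[][]{{4, 0}, {0, 4}}) equal to 1 bit, while a split that leaves both branches at two and two gives a gain of 0.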

See Also:
Serialized Form

Nested Class Summary
protected  class REPTree.Tree
          An inner class for building and storing the tree structure
 
Field Summary
protected  int m_MaxDepth
          Upper bound on the tree depth
protected  double m_MinNum
          The minimum number of instances per leaf.
protected  double m_MinVarianceProp
          The minimum proportion of the total variance (over all the data) required for split.
protected  boolean m_NoPruning
          If true, the tree is not pruned.
protected  int m_NumFolds
          Number of folds for reduced error pruning.
protected  int m_Seed
          Seed for random data shuffling.
protected  REPTree.Tree m_Tree
          The Tree object
 
Constructor Summary
REPTree()
           
 
Method Summary
 void buildClassifier(Instances data)
          Builds classifier.
 double[] distributionForInstance(Instance instance)
          Computes class distribution of an instance using the tree.
 java.util.Enumeration enumerateMeasures()
          Returns an enumeration of the additional measure names.
 int getMaxDepth()
          Get the value of MaxDepth.
 double getMeasure(java.lang.String additionalMeasureName)
          Returns the value of the named measure.
 double getMinNum()
          Get the value of MinNum.
 double getMinVarianceProp()
          Get the value of MinVarianceProp.
 boolean getNoPruning()
          Get the value of NoPruning.
 int getNumFolds()
          Get the value of NumFolds.
 java.lang.String[] getOptions()
          Gets options from this classifier.
 int getSeed()
          Get the value of Seed.
 java.lang.String graph()
          Outputs the decision tree as a graph
 java.util.Enumeration listOptions()
          Lists the command-line options for this classifier.
static void main(java.lang.String[] argv)
          Main method for this class.
 int numNodes()
          Computes size of the tree.
 void setMaxDepth(int newMaxDepth)
          Set the value of MaxDepth.
 void setMinNum(double newMinNum)
          Set the value of MinNum.
 void setMinVarianceProp(double newMinVarianceProp)
          Set the value of MinVarianceProp.
 void setNoPruning(boolean newNoPruning)
          Set the value of NoPruning.
 void setNumFolds(int newNumFolds)
          Set the value of NumFolds.
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 void setSeed(int newSeed)
          Set the value of Seed.
 java.lang.String toString()
          Outputs the decision tree.
 
Methods inherited from class weka.classifiers.DistributionClassifier
calculateEntropy, calculateLabeledInstanceMargin, calculateMargin, classifyInstance
 
Methods inherited from class weka.classifiers.Classifier
forName, makeCopies
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

m_Tree

protected REPTree.Tree m_Tree
The Tree object


m_NumFolds

protected int m_NumFolds
Number of folds for reduced error pruning.


m_Seed

protected int m_Seed
Seed for random data shuffling.
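
How the fold count and the seed might interact can be sketched as below. This is an assumption-laden illustration, not the REPTree implementation: it assumes the data is shuffled with the seed, one fold is held out for pruning, and the remaining folds are used to grow the tree. GrowPruneSplitSketch is a hypothetical name, and details such as stratification and how a remainder is distributed are ignored.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative sketch only; not a weka class.
final class GrowPruneSplitSketch {

    /**
     * Shuffles the instance indices 0..numInstances-1 with the given seed and
     * returns two lists: element 0 is the grow set, element 1 is the prune set.
     * The prune set holds one fold, i.e. numInstances / numFolds indices.
     */
    static List<List<Integer>> split(int numInstances, int numFolds, long seed) {
        if (numFolds < 2) {
            throw new IllegalArgumentException("numFolds must be at least 2");
        }
        List<Integer> indices = new ArrayList<Integer>();
        for (int i = 0; i < numInstances; i++) {
            indices.add(i);
        }
        // Same seed, same order: the split is reproducible.
        Collections.shuffle(indices, new Random(seed));

        int pruneSize = numInstances / numFolds;
        List<Integer> prune = new ArrayList<Integer>(indices.subList(0, pruneSize));
        List<Integer> grow = new ArrayList<Integer>(indices.subList(pruneSize, numInstances));

        List<List<Integer>> result = new ArrayList<List<Integer>>();
        result.add(grow);
        result.add(prune);
        return result;
    }
}
```

With 30 instances and the default of 3 folds, this sketch puts 20 indices in the grow set and 10 in the prune set. Under the same assumption, a larger fold count leaves less data for pruning and more for growing.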


m_NoPruning

protected boolean m_NoPruning
If true, the tree is not pruned.


m_MinNum

protected double m_MinNum
The minimum number of instances per leaf.


m_MinVarianceProp

protected double m_MinVarianceProp
The minimum proportion of the total variance (over all the data) required for split.
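
One way to read this threshold is sketched below: a node with a numeric class is considered for splitting only if the variance of its class values is at least m_MinVarianceProp times the variance over all the training data. VarianceThresholdSketch is a hypothetical helper; the use of population variance and of a non-strict comparison are assumptions of the sketch, not statements about the REPTree source.

```java
// Illustrative sketch only; not a weka class.
final class VarianceThresholdSketch {

    /** Population variance of the given values (0 for an empty array). */
    static double variance(double[] values) {
        if (values.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        double mean = sum / values.length;
        double squares = 0.0;
        for (double v : values) {
            squares += (v - mean) * (v - mean);
        }
        return squares / values.length;
    }

    /** True if the node still has enough variance to be worth splitting. */
    static boolean maySplit(double[] nodeTargets, double[] allTargets, double minVarianceProp) {
        return variance(nodeTargets) >= minVarianceProp * variance(allTargets);
    }
}
```

With the default proportion of 1e-3 and training targets {0, 10} (variance 25), a node must have a class variance of at least 0.025 to be split under this reading; a node whose targets are all equal is left as a leaf.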


m_MaxDepth

protected int m_MaxDepth
Upper bound on the tree depth

Constructor Detail

REPTree

public REPTree()
Method Detail

getNoPruning

public boolean getNoPruning()
Get the value of NoPruning.

Returns:
Value of NoPruning.

setNoPruning

public void setNoPruning(boolean newNoPruning)
Set the value of NoPruning.

Parameters:
newNoPruning - Value to assign to NoPruning.

getMinNum

public double getMinNum()
Get the value of MinNum.

Returns:
Value of MinNum.

setMinNum

public void setMinNum(double newMinNum)
Set the value of MinNum.

Parameters:
newMinNum - Value to assign to MinNum.

getMinVarianceProp

public double getMinVarianceProp()
Get the value of MinVarianceProp.

Returns:
Value of MinVarianceProp.

setMinVarianceProp

public void setMinVarianceProp(double newMinVarianceProp)
Set the value of MinVarianceProp.

Parameters:
newMinVarianceProp - Value to assign to MinVarianceProp.

getSeed

public int getSeed()
Get the value of Seed.

Returns:
Value of Seed.

setSeed

public void setSeed(int newSeed)
Set the value of Seed.

Parameters:
newSeed - Value to assign to Seed.

getNumFolds

public int getNumFolds()
Get the value of NumFolds.

Returns:
Value of NumFolds.

setNumFolds

public void setNumFolds(int newNumFolds)
Set the value of NumFolds.

Parameters:
newNumFolds - Value to assign to NumFolds.

getMaxDepth

public int getMaxDepth()
Get the value of MaxDepth.

Returns:
Value of MaxDepth.

setMaxDepth

public void setMaxDepth(int newMaxDepth)
Set the value of MaxDepth.

Parameters:
newMaxDepth - Value to assign to MaxDepth.

listOptions

public java.util.Enumeration listOptions()
Lists the command-line options for this classifier.

Specified by:
listOptions in interface OptionHandler
Returns:
an enumeration of all available options.

getOptions

public java.lang.String[] getOptions()
Gets options from this classifier.

Specified by:
getOptions in interface OptionHandler
Returns:
the list of current option settings as an array of strings

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options.

Specified by:
setOptions in interface OptionHandler
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported

numNodes

public int numNodes()
Computes size of the tree.


enumerateMeasures

public java.util.Enumeration enumerateMeasures()
Returns an enumeration of the additional measure names.

Specified by:
enumerateMeasures in interface AdditionalMeasureProducer
Returns:
an enumeration of the measure names

getMeasure

public double getMeasure(java.lang.String additionalMeasureName)
Returns the value of the named measure.

Specified by:
getMeasure in interface AdditionalMeasureProducer
Parameters:
additionalMeasureName - the name of the measure to query for its value
Returns:
the value of the named measure
Throws:
java.lang.IllegalArgumentException - if the named measure is not supported

buildClassifier

public void buildClassifier(Instances data)
                     throws java.lang.Exception
Builds classifier.

Specified by:
buildClassifier in class Classifier
Parameters:
data - set of instances serving as training data
Throws:
java.lang.Exception - if the classifier has not been generated successfully

distributionForInstance

public double[] distributionForInstance(Instance instance)
                                 throws java.lang.Exception
Computes class distribution of an instance using the tree.

Specified by:
distributionForInstance in class DistributionClassifier
Parameters:
instance - the instance to be classified
Returns:
an array containing the estimated membership probabilities of the test instance in each class (this should sum to at most 1)
Throws:
java.lang.Exception - if distribution could not be computed successfully
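
For a nominal class, a distribution of this kind is typically turned into a single prediction by taking the index of the largest probability. The sketch below shows that step in isolation; ArgMaxSketch is a hypothetical helper, and whether the inherited classifyInstance behaves exactly like this (for example on ties or on an all-zero distribution) is an assumption.

```java
// Illustrative sketch only; not a weka class.
final class ArgMaxSketch {

    /**
     * Returns the index of the largest entry of the distribution, or -1 if
     * the array is empty or no entry is greater than zero. Ties go to the
     * lowest index.
     */
    static int mostLikelyClass(double[] distribution) {
        int best = -1;
        double bestProb = 0.0;
        for (int i = 0; i < distribution.length; i++) {
            if (distribution[i] > bestProb) {
                bestProb = distribution[i];
                best = i;
            }
        }
        return best;
    }
}
```

For the distribution {0.2, 0.5, 0.3} the sketch returns index 1. An all-zero distribution returns -1, which a caller could treat as "no prediction".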

graph

public java.lang.String graph()
                       throws java.lang.Exception
Outputs the decision tree as a graph

Specified by:
graph in interface Drawable
Returns:
the graph described by a string
Throws:
java.lang.Exception - if the graph can't be computed

toString

public java.lang.String toString()
Outputs the decision tree.


main

public static void main(java.lang.String[] argv)
Main method for this class.