weka.classifiers.meta
Class DEC

java.lang.Object
  extended by weka.classifiers.Classifier
      extended by weka.classifiers.DistributionClassifier
          extended by weka.classifiers.EnsembleClassifier
              extended by weka.classifiers.meta.DEC
All Implemented Interfaces:
AdditionalMeasureProducer, java.lang.Cloneable, OptionHandler, java.io.Serializable

public class DEC
extends EnsembleClassifier
implements OptionHandler

Class for creating Diverse Ensembles of a Classifier. Valid options are:

-W classname
Specify the full class name of a weak classifier as the basis for DEC (required).

-I num
Set the number of DEC iterations (default 50).

-N num
Specify the desired size of the committee (default 15).

-S seed
Random number seed for generating random examples (default random).

-R num
Number of random instances to add at each iteration (default 20).

Options after -- are passed to the designated classifier.

See Also:
Serialized Form

Field Summary
protected  int labeling_method
          Method to use for labeling randomly generated instances.
protected  Classifier m_Classifier
          The model base classifier to use
protected  int m_DataCreationMethod
          Method to use for creation of artificial data
protected  int m_DesiredSize
          The desired size of the committee.
protected  int m_NumIterations
          The number of iterations.
protected  double m_RandomSize
          Number of random instances to add at each iteration.
protected  int m_Seed
          The seed for random number generation.
protected  double m_Threshold
          Confidence threshold above which committee decisions are to be trusted.
protected  int m_UseWeights
          Use weights for committee votes (default: equal weights).
 
Fields inherited from class weka.classifiers.EnsembleClassifier
m_EnsembleWts, m_SumEnsembleWts, m_TrainEnsembleDiversity, m_TrainEnsembleError, m_TrainError
 
Constructor Summary
DEC()
           
 
Method Summary
protected  void addInstances(Instances div_data, Instances random_data)
           
 void buildClassifier(Instances data)
          Builds the ensemble using the DEC method.
protected  double computeAccuracy(Instances data)
          Computes classification accuracy on the given data.
protected  double computeEnsembleWt(Classifier classifier, Instances data)
          Compute ensemble weight.
protected  double computeError(Instances data)
          Computes the error in classification on the given data.
protected  void computeStats(Instances data)
          Find and store mean and std devs for numeric attributes.
 double[] distributionForInstance(Instance instance)
          Calculates the class membership probabilities for the given test instance.
 double[] distributionForInstanceUsingWeights(Instance instance)
          Calculates the class membership probabilities for the given test instance.
protected  Instances generateRandomData(int random_size, int num_attributes, Instances data)
           
 Classifier getClassifier()
          Gets the classifier used as the base classifier.
 int getDataCreationMethod()
          Method to use for creating artificial data
 int getDesiredSize()
          Gets the desired size of the committee.
 double[] getEnsemblePredictions(Instance instance)
          Returns class predictions of each ensemble member
 double getEnsembleSize()
          Returns size of ensemble
 double[] getEnsembleWts()
          Returns vote weights of ensemble members.
 int getNumIterations()
          Gets the number of DEC iterations.
 java.lang.String[] getOptions()
          Gets the current settings of the Classifier.
 double getRandomSize()
          Number of random instances to add at each iteration.
 int getSeed()
          Gets the seed for random number generation.
 double getThreshold()
          Get the value of threshold.
 int getUseWeights()
          Get flag for using weights for committee votes.
protected  int highProbLabel(double[] probs)
          Probabilistically select class label - (high probability).
protected  Instances labelData(Instances random_data, double threshold)
          Labels the randomly generated data.
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options
protected  int lowProbLabel(double[] probs)
          Probabilistically select class label - (low probability).
static void main(java.lang.String[] argv)
          Main method for testing this class.
protected  void removeInstances(Instances div_data, int random_size)
           
protected  double selectNominalValue(double[] cumm)
          Given cumulative probabilities, selects a nominal value index.
protected  double selectThreshold(double error)
          Sets the threshold for relabeling, based on either the user-specified threshold or the error of the current committee.
 void setClassifier(Classifier newClassifier)
          Sets the base classifier for DEC.
 void setDataCreationMethod(int method)
          Sets method to use for creating artificial data
 void setDesiredSize(int new_desired_size)
          Sets the desired size of the committee.
 void setNumIterations(int numIterations)
          Sets the number of DEC iterations.
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 void setRandomSize(double new_random_size)
          Sets number of random instances to add at each iteration.
 void setSeed(int seed)
          Set the seed for random number generation.
 void setThreshold(double v)
          Set the value of threshold.
 void setUseWeights(int newUseWeights)
          Set flag for using weights for committee votes.
 java.lang.String toString()
          Returns a description of the ensemble classifier.
 
Methods inherited from class weka.classifiers.EnsembleClassifier
computeEnsembleMeasures, enumerateMeasures, getMeasure, initMeasures, measureTrainEnsembleDiversity, measureTrainEnsembleError, measureTrainError, updateEnsembleStats
 
Methods inherited from class weka.classifiers.DistributionClassifier
calculateEntropy, calculateLabeledInstanceMargin, calculateMargin, classifyInstance
 
Methods inherited from class weka.classifiers.Classifier
forName, makeCopies
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

m_UseWeights

protected int m_UseWeights
Use weights for committee votes (default: equal weights).


m_Classifier

protected Classifier m_Classifier
The model base classifier to use


m_NumIterations

protected int m_NumIterations
The number of iterations.


m_DesiredSize

protected int m_DesiredSize
The desired size of the committee.


m_Seed

protected int m_Seed
The seed for random number generation.


m_RandomSize

protected double m_RandomSize
Number of random instances to add at each iteration.


m_Threshold

protected double m_Threshold
Confidence threshold above which committee decisions are to be trusted.


m_DataCreationMethod

protected int m_DataCreationMethod
Method to use for creation of artificial data


labeling_method

protected int labeling_method
Method to use for labeling randomly generated instances.

Constructor Detail

DEC

public DEC()
Method Detail

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options

Specified by:
listOptions in interface OptionHandler
Returns:
an enumeration of all the available options

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options. Valid options are:

-W classname
Specify the full class name of a weak classifier as the basis for DEC (required).

-I num
Set the number of DEC iterations (default 50).

-S seed
Random number seed for resampling (default 0).

-N num
Specify the desired size of the committee (default 15).

-R num
Number of random instances to add at each iteration (default 5).

Options after -- are passed to the designated classifier.

Specified by:
setOptions in interface OptionHandler
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported
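
To illustrate the "--" convention described above, here is a minimal sketch that splits an option array into the options meant for the ensemble and those meant for the base classifier. The class OptionSplitSketch and its split method are hypothetical and are not part of DEC; the class name given to -W and the -X flag are placeholder values only.

```java
import java.util.Arrays;

// Hypothetical helper, not part of DEC: splits options at the first "--".
final class OptionSplitSketch {

    // Returns {ensembleOptions, baseClassifierOptions}.
    static String[][] split(String[] options) {
        for (int i = 0; i < options.length; i++) {
            if (options[i].equals("--")) {
                String[] own = Arrays.copyOfRange(options, 0, i);
                String[] base = Arrays.copyOfRange(options, i + 1, options.length);
                return new String[][] {own, base};
            }
        }
        // No "--" present: every option belongs to the ensemble.
        return new String[][] {options.clone(), new String[0]};
    }

    public static void main(String[] args) {
        // Placeholder option values, chosen only for illustration.
        String[] options = {"-W", "some.base.Classifier", "-I", "50", "-N", "15", "--", "-X"};
        String[][] parts = split(options);
        System.out.println("ensemble: " + Arrays.toString(parts[0]));
        System.out.println("base:     " + Arrays.toString(parts[1]));
    }
}
```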

getOptions

public java.lang.String[] getOptions()
Gets the current settings of the Classifier.

Specified by:
getOptions in interface OptionHandler
Returns:
an array of strings suitable for passing to setOptions

setUseWeights

public void setUseWeights(int newUseWeights)
Set flag for using weights for committee votes.

Parameters:
newUseWeights - flag for using weights for committee votes.

getUseWeights

public int getUseWeights()
Get flag for using weights for committee votes.

Returns:
flag for using weights for committee votes

setClassifier

public void setClassifier(Classifier newClassifier)
Sets the base classifier for DEC.

Parameters:
newClassifier - the Classifier to use.

getClassifier

public Classifier getClassifier()
Gets the classifier used as the base classifier.

Returns:
the classifier used as the base classifier

getDataCreationMethod

public int getDataCreationMethod()
Method to use for creating artificial data

Returns:
Method to use for creating artificial data

setDataCreationMethod

public void setDataCreationMethod(int method)
Sets method to use for creating artificial data

Parameters:
method - the method to use for creating artificial data

getRandomSize

public double getRandomSize()
Number of random instances to add at each iteration.

Returns:
Number of random instances to add at each iteration

setRandomSize

public void setRandomSize(double new_random_size)
Sets number of random instances to add at each iteration.

Parameters:
new_random_size - the number of random instances to add at each iteration.

getDesiredSize

public int getDesiredSize()
Gets the desired size of the committee.

Returns:
the desired size of the committee

setDesiredSize

public void setDesiredSize(int new_desired_size)
Sets the desired size of the committee.


setNumIterations

public void setNumIterations(int numIterations)
Sets the number of DEC iterations.


getNumIterations

public int getNumIterations()
Gets the number of DEC iterations.

Returns:
the maximum number of DEC iterations

setSeed

public void setSeed(int seed)
Set the seed for random number generation.

Parameters:
seed - the seed

getSeed

public int getSeed()
Gets the seed for random number generation.

Returns:
the seed for the random number generation

getThreshold

public double getThreshold()
Get the value of threshold.

Returns:
value of threshold.

setThreshold

public void setThreshold(double v)
Set the value of threshold.

Parameters:
v - Value to assign to threshold.

buildClassifier

public void buildClassifier(Instances data)
                     throws java.lang.Exception
Builds the ensemble using the DEC method.

Specified by:
buildClassifier in class Classifier
Parameters:
data - the training data to be used for generating the ensemble.
Throws:
java.lang.Exception - if the classifier could not be built successfully

computeStats

protected void computeStats(Instances data)
Find and store mean and std devs for numeric attributes.

Parameters:
data - training instances
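
As a rough illustration of the statistics this method describes, the sketch below computes a mean and standard deviation per numeric column. It is not DEC's code: the class ColumnStatsSketch, the plain double[][] layout, and the choice of the sample (n - 1) estimator are all assumptions made for the example.

```java
import java.util.Arrays;

// Hypothetical sketch, not DEC's code: per-column mean and standard deviation.
final class ColumnStatsSketch {

    // rows[i][j] is the value of numeric attribute j in instance i.
    // Returns stats[j] = {mean, stdDev}. The sample (n - 1) estimator is an
    // assumption; with fewer than two rows the standard deviation is 0.
    static double[][] meanAndStdDev(double[][] rows, int numAttributes) {
        double[][] stats = new double[numAttributes][2];
        int n = rows.length;
        for (int j = 0; j < numAttributes; j++) {
            double sum = 0.0;
            for (double[] row : rows) {
                sum += row[j];
            }
            double mean = n > 0 ? sum / n : 0.0;
            double squaredDeviations = 0.0;
            for (double[] row : rows) {
                double d = row[j] - mean;
                squaredDeviations += d * d;
            }
            stats[j][0] = mean;
            stats[j][1] = n > 1 ? Math.sqrt(squaredDeviations / (n - 1)) : 0.0;
        }
        return stats;
    }

    public static void main(String[] args) {
        double[][] rows = {{1.0, 10.0}, {2.0, 10.0}, {3.0, 10.0}};
        for (double[] s : meanAndStdDev(rows, 2)) {
            System.out.println(Arrays.toString(s));
        }
    }
}
```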

generateRandomData

protected Instances generateRandomData(int random_size,
                                       int num_attributes,
                                       Instances data)

selectNominalValue

protected double selectNominalValue(double[] cumm)
Given cumulative probabilities, selects a nominal value index.
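
One common way to pick an index from cumulative probabilities is to draw a uniform number and return the first index whose cumulative value exceeds it. The sketch below shows that idea only; the class CumulativeSelectSketch is hypothetical, it returns an int rather than the double declared above, and DEC's actual selection logic may differ.

```java
import java.util.Random;

// Hypothetical sketch, not DEC's code: index selection from cumulative probabilities.
final class CumulativeSelectSketch {

    // cumulative must be non-decreasing and end at (about) 1.0; u is a draw in [0, 1).
    // Returns the first index i with u < cumulative[i].
    static int selectIndex(double[] cumulative, double u) {
        if (cumulative.length == 0) {
            throw new IllegalArgumentException("no values to select from");
        }
        for (int i = 0; i < cumulative.length; i++) {
            if (u < cumulative[i]) {
                return i;
            }
        }
        // Guards against rounding when the last cumulative value is slightly below 1.
        return cumulative.length - 1;
    }

    public static void main(String[] args) {
        // Value probabilities 0.2, 0.3, 0.5 expressed cumulatively.
        double[] cumulative = {0.2, 0.5, 1.0};
        Random random = new Random(1);
        System.out.println(selectIndex(cumulative, random.nextDouble()));
    }
}
```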


selectThreshold

protected double selectThreshold(double error)
Sets the threshold for relabeling, based on either the user-specified threshold or the error of the current committee.

Parameters:
error - Error of current committee
Returns:
the selected threshold

labelData

protected Instances labelData(Instances random_data,
                              double threshold)
                       throws java.lang.Exception
Labels the randomly generated data.

Parameters:
random_data - the randomly generated instances
Returns:
labeled data
Throws:
java.lang.Exception - if instances cannot be labeled successfully

highProbLabel

protected int highProbLabel(double[] probs)
Probabilistically select class label - (high probability).

Parameters:
probs - posterior probability of each class
Returns:
highly likely class label probabilistically selected
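
A plausible reading of "probabilistically select ... (high probability)" is sampling a label in proportion to its posterior probability, so that likely classes are chosen more often. The sketch below illustrates that reading under stated assumptions: the class HighProbLabelSketch is hypothetical, and the uniform draw u is passed in explicitly so the behavior is reproducible.

```java
// Hypothetical sketch, not DEC's code: sample a label in proportion to probs.
final class HighProbLabelSketch {

    // probs holds non-negative class probabilities; u is a uniform draw in [0, 1).
    static int select(double[] probs, double u) {
        double total = 0.0;
        for (double p : probs) {
            total += p;
        }
        if (probs.length == 0 || total <= 0.0) {
            throw new IllegalArgumentException("probabilities must have a positive sum");
        }
        double running = 0.0;
        for (int i = 0; i < probs.length; i++) {
            running += probs[i] / total; // normalise in case probs do not sum to 1
            if (u < running) {
                return i;
            }
        }
        return probs.length - 1; // rounding guard
    }

    public static void main(String[] args) {
        double[] probs = {0.5, 0.25, 0.25};
        System.out.println(select(probs, new java.util.Random(1).nextDouble()));
    }
}
```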

lowProbLabel

protected int lowProbLabel(double[] probs)
                    throws java.lang.Exception
Probabilistically select class label - (low probability).

Parameters:
probs - posterior probability of each class
Returns:
low probability class label probabilistically selected
Throws:
java.lang.Exception - if instances cannot be labeled successfully
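
One way to favor low-probability labels is to sample in proportion to the inverse of each class probability. The page does not state that DEC uses this weighting, so treat the sketch below as an assumption; the class LowProbLabelSketch and the floor value used to avoid division by zero are hypothetical.

```java
// Hypothetical sketch, not DEC's code: sample a label in inverse proportion to probs.
final class LowProbLabelSketch {

    // Assumed floor so that a probability of 0 does not cause division by zero.
    private static final double FLOOR = 1e-9;

    // probs holds class probabilities; u is a uniform draw in [0, 1).
    static int select(double[] probs, double u) {
        if (probs.length == 0) {
            throw new IllegalArgumentException("no classes to select from");
        }
        double[] inverse = new double[probs.length];
        double total = 0.0;
        for (int i = 0; i < probs.length; i++) {
            inverse[i] = 1.0 / Math.max(probs[i], FLOOR);
            total += inverse[i];
        }
        double running = 0.0;
        for (int i = 0; i < inverse.length; i++) {
            running += inverse[i] / total; // normalised inverse weight
            if (u < running) {
                return i;
            }
        }
        return probs.length - 1; // rounding guard
    }

    public static void main(String[] args) {
        double[] probs = {0.5, 0.25, 0.25};
        System.out.println(select(probs, new java.util.Random(1).nextDouble()));
    }
}
```

With probabilities 0.5, 0.25 and 0.25 the inverse weights are 2, 4 and 4, so the two less likely classes are each selected twice as often as the most likely one.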

removeInstances

protected void removeInstances(Instances div_data,
                               int random_size)
Parameters:
div_data - given instances
random_size - number of instances to delete from the end of given instances

addInstances

protected void addInstances(Instances div_data,
                            Instances random_data)
Parameters:
div_data - given instances
random_data - set of instances to add to given instances

computeError

protected double computeError(Instances data)
                       throws java.lang.Exception
Computes the error in classification on the given data.

Parameters:
data - the instances to be classified
Returns:
classification error
Throws:
java.lang.Exception - if error cannot be computed successfully

computeEnsembleWt

protected double computeEnsembleWt(Classifier classifier,
                                   Instances data)
                            throws java.lang.Exception
Compute ensemble weight.

Parameters:
classifier - current classifier
data - instances to compute accuracy on
Returns:
computed vote weight for given classifier
Throws:
java.lang.Exception - if weight cannot be computed successfully

computeAccuracy

protected double computeAccuracy(Instances data)
                          throws java.lang.Exception
Computes classification accuracy on the given data.

Parameters:
data - the instances to be classified
Returns:
classification accuracy
Throws:
java.lang.Exception - if accuracy cannot be computed successfully

distributionForInstance

public double[] distributionForInstance(Instance instance)
                                 throws java.lang.Exception
Calculates the class membership probabilities for the given test instance.

Specified by:
distributionForInstance in class DistributionClassifier
Parameters:
instance - the instance to be classified
Returns:
predicted class probability distribution
Throws:
java.lang.Exception - if distribution can't be computed successfully

distributionForInstanceUsingWeights

public double[] distributionForInstanceUsingWeights(Instance instance)
                                             throws java.lang.Exception
Calculates the class membership probabilities for the given test instance. Incorporates vote weights.

Parameters:
instance - the instance to be classified
Returns:
predicted class probability distribution
Throws:
java.lang.Exception - if distribution can't be computed successfully
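
Incorporating vote weights is typically done by taking a weighted average of the members' class distributions. The sketch below shows that general technique, not DEC's implementation: the class WeightedVoteSketch, the array layout, and the normalisation by the weight sum are assumptions.

```java
import java.util.Arrays;

// Hypothetical sketch, not DEC's code: weighted average of member distributions.
final class WeightedVoteSketch {

    // memberDistributions[m][c] is member m's probability for class c;
    // weights[m] is the vote weight of member m.
    static double[] combine(double[][] memberDistributions, double[] weights) {
        if (memberDistributions.length == 0 || memberDistributions.length != weights.length) {
            throw new IllegalArgumentException("need one weight per ensemble member");
        }
        int numClasses = memberDistributions[0].length;
        double[] combined = new double[numClasses];
        double weightSum = 0.0;
        for (int m = 0; m < memberDistributions.length; m++) {
            weightSum += weights[m];
            for (int c = 0; c < numClasses; c++) {
                combined[c] += weights[m] * memberDistributions[m][c];
            }
        }
        if (weightSum > 0.0) {
            for (int c = 0; c < numClasses; c++) {
                combined[c] /= weightSum; // normalise so the result sums to 1
            }
        }
        return combined;
    }

    public static void main(String[] args) {
        double[][] votes = {{1.0, 0.0}, {0.0, 1.0}, {0.0, 1.0}};
        double[] weights = {2.0, 1.0, 1.0};
        System.out.println(Arrays.toString(combine(votes, weights)));
    }
}
```

Setting every weight to the same value reduces this to a plain average, which corresponds to the equal-weight default mentioned for m_UseWeights.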

getEnsemblePredictions

public double[] getEnsemblePredictions(Instance instance)
                                throws java.lang.Exception
Returns class predictions of each ensemble member

Specified by:
getEnsemblePredictions in class EnsembleClassifier
Throws:
java.lang.Exception

getEnsembleWts

public double[] getEnsembleWts()
Returns vote weights of ensemble members.

Specified by:
getEnsembleWts in class EnsembleClassifier
Returns:
vote weights of ensemble members

getEnsembleSize

public double getEnsembleSize()
Returns size of ensemble

Specified by:
getEnsembleSize in class EnsembleClassifier

toString

public java.lang.String toString()
Returns a description of the ensemble classifier.

Returns:
description of the ensemble classifier as a string

main

public static void main(java.lang.String[] argv)
Main method for testing this class.

Parameters:
argv - the options