weka.classifiers.meta
Class ActiveDecorate

java.lang.Object
  extended by weka.classifiers.Classifier
      extended by weka.classifiers.DistributionClassifier
          extended by weka.classifiers.meta.ActiveDecorate
All Implemented Interfaces:
ActiveLearner, java.lang.Cloneable, OptionHandler, java.io.Serializable

public class ActiveDecorate
extends DistributionClassifier
implements OptionHandler, ActiveLearner

Active-DECORATE is a version of DECORATE that allows for selective sampling of training examples. DECORATE is a meta-learner for building diverse ensembles of classifiers by adding specially constructed artificial training examples. Comprehensive experiments have demonstrated that this technique is consistently more accurate than bagging and more accurate than boosting when training data is limited. For more details, see:

Prem Melville and Raymond J. Mooney. Constructing diverse classifier ensembles using artificial training examples. Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, 2003.

Valid options are:

-D
Turn on debugging output.

-W classname
Specify the full class name of a weak classifier as the basis for Decorate (default weka.classifiers.trees.j48.J48()).

-I num
Specify the desired size of the committee (default 15).

-M iterations
Set the maximum number of Decorate iterations (default 50).

-S seed
Seed for the random number generator (default 0).

-R factor
Factor that determines number of artificial examples to generate.

Options after -- are passed to the designated classifier.

See Also:
Serialized Form

Field Summary
protected  double m_ArtSize
          Amount of artificial/random instances to use - specified as a fraction of the training data size.
protected  java.util.Vector m_AttributeStats
          Attribute statistics - used for generating artificial examples.
protected  Classifier m_Classifier
          The model base classifier to use.
protected  java.util.Vector m_Committee
          Vector of classifiers that make up the committee/ensemble.
protected  boolean m_Debug
          Set to true to get debugging output.
protected  int m_DesiredSize
          The desired ensemble size.
protected  double m_Epsilon
          Smoothing parameter for 0-values in distributions
protected  int m_NumIterations
          The maximum number of Decorate iterations to run.
protected  java.util.Random m_Random
          The random number generator.
protected  int m_Seed
          The seed for random number generation.
protected  DistributionClassifier m_SelectionCommittee
           
protected  int m_SelectionScheme
          The selective sampling scheme to use.
 
Constructor Summary
ActiveDecorate()
           
 
Method Summary
protected  void addInstances(Instances data, Instances newData)
          Add new instances to the given set of instances.
 void buildClassifier(Instances data)
          Build Decorate classifier
protected  double calcEuclideanDis(Instance instance)
          Calculate the disagreement in the ensemble over the label of given examples.
protected  double calcJSDivergence(Instance instance)
          Calculate the disagreement in the ensemble over the label of given examples.
protected  double calcKLdivergence(double[] p1, double[] p2)
          Calculate the KL divergence between two probability distributions.
protected  double calcMajorityDis(Instance instance)
          Calculate the disagreement in the ensemble over the label of given examples.
protected  double calculateDisagreement(Instance instance)
          Calculate the disagreement in the ensemble over the label of given examples depending on the chosen selection scheme.
protected  double computeError(Instances data)
          Computes the error in classification on the given data.
protected  void computeStats(Instances data)
          Compute and store statistics required for generating artificial data.
 double[] distributionForInstance(Instance instance)
          Calculates the class membership probabilities for the given test instance.
protected  Instances generateArtificialData(int artSize, Instances data)
          Generate artificial training examples.
 double getArtificialSize()
          Factor that determines number of artificial examples to generate.
 Classifier getClassifier()
          Get the classifier used as the base classifier
 boolean getDebug()
          Get whether debugging is turned on
 int getDesiredSize()
          Gets the desired size of the committee.
 int getNumIterations()
          Gets the max number of Decorate iterations to run.
 java.lang.String[] getOptions()
          Gets the current settings of the Classifier.
 int getSeed()
          Gets the seed for the random number generator.
 int getSelectionScheme()
          Get the value of m_SelectionScheme.
protected  int inverseLabel(double[] probs)
          Select class label such that the probability of selection is inversely proportional to the ensemble's predictions.
protected  void labelData(Instances artData)
          Labels the artificially generated data.
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options
static void main(java.lang.String[] argv)
          Main method for testing this class.
protected  void removeInstances(Instances data, int numRemove)
          Removes a specified number of instances from the given set of instances.
protected  int selectIndexProbabilistically(double[] cdf)
          Given cumulative probabilities, select a nominal attribute value index.
 int[] selectInstances(Instances unlabeledActivePool, int num)
          Given a set of unlabeled examples, select a specified number of examples to be labeled.
 void setArtificialSize(double newArtSize)
          Sets factor that determines number of artificial examples to generate.
 void setClassifier(Classifier newClassifier)
          Set the base classifier for Decorate.
 void setDebug(boolean debug)
          Set debugging mode
 void setDesiredSize(int newDesiredSize)
          Sets the desired size of the committee.
 void setNumIterations(int numIterations)
          Sets the max number of Decorate iterations to run.
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 void setSeed(int seed)
          Set the seed for random number generator.
 void setSelectionScheme(int v)
          Set the value of m_SelectionScheme.
protected  void smoothDistribution(double[] probs)
           
 java.lang.String toString()
          Returns description of the Decorate classifier.
protected  void trainSelectionCommittee(Instances data)
           
 
Methods inherited from class weka.classifiers.DistributionClassifier
calculateEntropy, calculateLabeledInstanceMargin, calculateMargin, classifyInstance
 
Methods inherited from class weka.classifiers.Classifier
forName, makeCopies
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

m_SelectionCommittee

protected DistributionClassifier m_SelectionCommittee

m_Epsilon

protected double m_Epsilon
Smoothing parameter for 0-values in distributions


m_Debug

protected boolean m_Debug
Set to true to get debugging output.


m_Classifier

protected Classifier m_Classifier
The model base classifier to use.


m_Committee

protected java.util.Vector m_Committee
Vector of classifiers that make up the committee/ensemble.


m_DesiredSize

protected int m_DesiredSize
The desired ensemble size.


m_NumIterations

protected int m_NumIterations
The maximum number of Decorate iterations to run.


m_Seed

protected int m_Seed
The seed for random number generation.


m_ArtSize

protected double m_ArtSize
Amount of artificial/random instances to use - specified as a fraction of the training data size.


m_Random

protected java.util.Random m_Random
The random number generator.


m_AttributeStats

protected java.util.Vector m_AttributeStats
Attribute statistics - used for generating artificial examples.


m_SelectionScheme

protected int m_SelectionScheme
The selective sampling scheme to use.

Constructor Detail

ActiveDecorate

public ActiveDecorate()
Method Detail

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options

Specified by:
listOptions in interface OptionHandler
Returns:
an enumeration of all the available options

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options. Valid options are:

-D
Turn on debugging output.

-W classname
Specify the full class name of a weak classifier as the basis for Decorate (required).

-I num
Specify the desired size of the committee (default 15).

-M iterations
Set the maximum number of Decorate iterations (default 50).

-S seed
Seed for the random number generator (default 0).

-R factor
Factor that determines number of artificial examples to generate.

Options after -- are passed to the designated classifier.

Specified by:
setOptions in interface OptionHandler
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported
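
The option layout described above can be illustrated with a small sketch that assembles an option array and separates the ensemble's own options from those that follow `--`. The class `OptionSplitSketch` and its `split` helper are hypothetical and use no Weka code; the `-C 0.25` pair after `--` is only an example of a base-classifier option.

```java
import java.util.Arrays;

final class OptionSplitSketch {

    /** Returns {ensembleOptions, baseClassifierOptions}, split at the first "--". */
    static String[][] split(String[] options) {
        int sep = options.length;
        for (int i = 0; i < options.length; i++) {
            if (options[i].equals("--")) {
                sep = i;
                break;
            }
        }
        String[] own = Arrays.copyOfRange(options, 0, sep);
        String[] base = sep < options.length
                ? Arrays.copyOfRange(options, sep + 1, options.length)
                : new String[0];
        return new String[][] { own, base };
    }

    public static void main(String[] args) {
        // Example values only: committee size 15, at most 50 iterations, seed 0.
        String[] options = {
            "-W", "weka.classifiers.trees.j48.J48",
            "-I", "15", "-M", "50", "-S", "0",
            "--", "-C", "0.25"
        };
        String[][] parts = split(options);
        System.out.println("ensemble options: " + Arrays.toString(parts[0]));
        System.out.println("base options:     " + Arrays.toString(parts[1]));
    }
}
```

An array of the first kind is what would be handed to setOptions; how the real method consumes it is not shown here.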

getOptions

public java.lang.String[] getOptions()
Gets the current settings of the Classifier.

Specified by:
getOptions in interface OptionHandler
Returns:
an array of strings suitable for passing to setOptions

getSelectionScheme

public int getSelectionScheme()
Get the value of m_SelectionScheme.

Returns:
value of m_SelectionScheme.

setSelectionScheme

public void setSelectionScheme(int v)
Set the value of m_SelectionScheme.

Parameters:
v - Value to assign to m_SelectionScheme.

setDebug

public void setDebug(boolean debug)
Set debugging mode

Parameters:
debug - true if debug output should be printed

getDebug

public boolean getDebug()
Get whether debugging is turned on

Returns:
true if debugging output is on

setClassifier

public void setClassifier(Classifier newClassifier)
Set the base classifier for Decorate.

Parameters:
newClassifier - the Classifier to use.

getClassifier

public Classifier getClassifier()
Get the classifier used as the base classifier

Returns:
the classifier used as the base classifier

getArtificialSize

public double getArtificialSize()
Factor that determines number of artificial examples to generate.

Returns:
factor that determines number of artificial examples to generate

setArtificialSize

public void setArtificialSize(double newArtSize)
Sets factor that determines number of artificial examples to generate.
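
Because the factor is described as a fraction of the training data size, the resulting number of artificial examples can be sketched as a simple product. The helper below is hypothetical; truncating to an integer and generating at least one example are assumptions, not documented behavior.

```java
final class ArtificialSizeSketch {

    /** Number of artificial examples for a given factor and training-set size (assumed rule). */
    static int artificialCount(double factor, int numTraining) {
        int count = (int) (Math.abs(factor) * numTraining); // truncation is an assumption
        return Math.max(count, 1);                          // assumption: never generate zero examples
    }

    public static void main(String[] args) {
        System.out.println(artificialCount(1.0, 200));
        System.out.println(artificialCount(0.5, 201));
    }
}
```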


getDesiredSize

public int getDesiredSize()
Gets the desired size of the committee.

Returns:
the desired size of the committee

setDesiredSize

public void setDesiredSize(int newDesiredSize)
Sets the desired size of the committee.

Parameters:
newDesiredSize - the desired size of the committee

setNumIterations

public void setNumIterations(int numIterations)
Sets the max number of Decorate iterations to run.

Parameters:
numIterations - max number of Decorate iterations to run

getNumIterations

public int getNumIterations()
Gets the max number of Decorate iterations to run.

Returns:
the max number of Decorate iterations to run

setSeed

public void setSeed(int seed)
Set the seed for random number generator.

Parameters:
seed - the random number seed

getSeed

public int getSeed()
Gets the seed for the random number generator.

Returns:
the seed for the random number generator

buildClassifier

public void buildClassifier(Instances data)
                     throws java.lang.Exception
Build Decorate classifier

Specified by:
buildClassifier in class Classifier
Parameters:
data - the training data to be used for generating the classifier
Throws:
java.lang.Exception - if the classifier could not be built successfully

trainSelectionCommittee

protected void trainSelectionCommittee(Instances data)
                                throws java.lang.Exception
Throws:
java.lang.Exception

computeStats

protected void computeStats(Instances data)
                     throws java.lang.Exception
Compute and store statistics required for generating artificial data.

Parameters:
data - training instances
Throws:
java.lang.Exception - if statistics could not be calculated successfully

generateArtificialData

protected Instances generateArtificialData(int artSize,
                                           Instances data)
Generate artificial training examples.

Parameters:
artSize - size of examples set to create
data - training data
Returns:
the set of unlabeled artificial examples
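
As a rough illustration of generating unlabeled examples from stored attribute statistics, the sketch below draws numeric values from a Gaussian and nominal values in proportion to observed counts. Both sampling rules, the two-attribute layout, and every name in `ArtificialDataSketch` are assumptions made for this example; the page does not state which statistics the class actually uses.

```java
import java.util.Random;

final class ArtificialDataSketch {

    /** Draws one numeric value from a Gaussian with the given mean and standard deviation. */
    static double sampleNumeric(double mean, double stdDev, Random random) {
        return mean + stdDev * random.nextGaussian();
    }

    /** Draws one nominal value index with probability proportional to its count. */
    static int sampleNominal(int[] valueCounts, Random random) {
        int total = 0;
        for (int c : valueCounts) {
            total += c;
        }
        double r = random.nextDouble() * total;
        double running = 0.0;
        for (int i = 0; i < valueCounts.length; i++) {
            running += valueCounts[i];
            if (r < running) {
                return i;
            }
        }
        return valueCounts.length - 1; // only reached when all counts are zero
    }

    /** Generates artSize rows, each holding one numeric and one nominal attribute value. */
    static double[][] generate(int artSize, double mean, double stdDev,
                               int[] valueCounts, long seed) {
        Random random = new Random(seed);
        double[][] rows = new double[artSize][2];
        for (int i = 0; i < artSize; i++) {
            rows[i][0] = sampleNumeric(mean, stdDev, random);
            rows[i][1] = sampleNominal(valueCounts, random);
        }
        return rows;
    }

    public static void main(String[] args) {
        double[][] rows = generate(5, 10.0, 2.0, new int[] {3, 0, 7}, 0L);
        for (double[] row : rows) {
            System.out.println(row[0] + ", " + (int) row[1]);
        }
    }
}
```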

labelData

protected void labelData(Instances artData)
                  throws java.lang.Exception
Labels the artificially generated data.

Parameters:
artData - the artificially generated instances
Throws:
java.lang.Exception - if instances cannot be labeled successfully

inverseLabel

protected int inverseLabel(double[] probs)
                    throws java.lang.Exception
Select class label such that the probability of selection is inversely proportional to the ensemble's predictions.

Parameters:
probs - class membership probabilities of instance
Returns:
index of class label selected
Throws:
java.lang.Exception - if instances cannot be labeled successfully
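
One way to obtain selection probabilities that are inversely proportional to the ensemble's predictions is to invert each class probability and renormalize. The sketch below does this; the class `InverseLabelSketch` and the `epsilon` floor used to avoid division by zero are hypothetical choices for illustration.

```java
final class InverseLabelSketch {

    /** Converts class probabilities into a distribution inversely proportional to them. */
    static double[] inverseDistribution(double[] probs, double epsilon) {
        double[] inv = new double[probs.length];
        double sum = 0.0;
        for (int i = 0; i < probs.length; i++) {
            inv[i] = 1.0 / Math.max(probs[i], epsilon); // epsilon avoids division by zero
            sum += inv[i];
        }
        for (int i = 0; i < inv.length; i++) {
            inv[i] /= sum;
        }
        return inv;
    }

    /** Running sums of a distribution, suitable for sampling an index. */
    static double[] cumulative(double[] dist) {
        double[] cdf = new double[dist.length];
        double running = 0.0;
        for (int i = 0; i < dist.length; i++) {
            running += dist[i];
            cdf[i] = running;
        }
        return cdf;
    }

    public static void main(String[] args) {
        double[] inv = inverseDistribution(new double[] {0.8, 0.2}, 1e-6);
        double[] cdf = cumulative(inv);
        System.out.println(inv[0] + " " + inv[1] + " / cdf end " + cdf[1]);
    }
}
```

With predictions of 0.8 and 0.2, the less likely class receives the larger selection probability, so sampled labels tend to differ from the ensemble's current prediction.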

selectIndexProbabilistically

protected int selectIndexProbabilistically(double[] cdf)
Given cumulative probabilities, select a nominal attribute value index.

Parameters:
cdf - array of cumulative probabilities
Returns:
index of attribute selected based on the probability distribution
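
Selecting an index from cumulative probabilities is commonly done by drawing a uniform number and returning the first index whose cumulative value exceeds it. The sketch below follows that approach; `CdfSelectionSketch` is a hypothetical name, and the random draw is passed in explicitly so the behavior is easy to check.

```java
import java.util.Random;

final class CdfSelectionSketch {

    /** Returns the first index whose cumulative probability exceeds the draw r in [0, 1). */
    static int selectIndex(double[] cdf, double r) {
        for (int i = 0; i < cdf.length; i++) {
            if (r < cdf[i]) {
                return i;
            }
        }
        return cdf.length - 1; // guard for a final cdf value that falls slightly below 1
    }

    public static void main(String[] args) {
        double[] cdf = {0.2, 0.5, 1.0};
        Random random = new Random(0);
        System.out.println(selectIndex(cdf, random.nextDouble()));
    }
}
```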

removeInstances

protected void removeInstances(Instances data,
                               int numRemove)
Removes a specified number of instances from the given set of instances.

Parameters:
data - given instances
numRemove - number of instances to delete from the given instances

addInstances

protected void addInstances(Instances data,
                            Instances newData)
Add new instances to the given set of instances.

Parameters:
data - given instances
newData - set of instances to add to given instances

computeError

protected double computeError(Instances data)
                       throws java.lang.Exception
Computes the error in classification on the given data.

Parameters:
data - the instances to be classified
Returns:
classification error
Throws:
java.lang.Exception - if error cannot be computed successfully
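
Classification error is usually reported as the fraction of misclassified instances. The sketch below computes that quantity from arrays of class indices; it assumes an unweighted error rate, and `ErrorRateSketch` is a hypothetical helper rather than the method's actual code.

```java
final class ErrorRateSketch {

    /** Fraction of instances whose predicted class index differs from the true one. */
    static double errorRate(int[] predicted, int[] actual) {
        if (predicted.length != actual.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        if (actual.length == 0) {
            return 0.0; // assumption: an empty set has zero error
        }
        int wrong = 0;
        for (int i = 0; i < actual.length; i++) {
            if (predicted[i] != actual[i]) {
                wrong++;
            }
        }
        return (double) wrong / actual.length;
    }

    public static void main(String[] args) {
        System.out.println(errorRate(new int[] {0, 1, 1, 0}, new int[] {0, 1, 0, 0}));
    }
}
```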

distributionForInstance

public double[] distributionForInstance(Instance instance)
                                 throws java.lang.Exception
Calculates the class membership probabilities for the given test instance.

Specified by:
distributionForInstance in class DistributionClassifier
Parameters:
instance - the instance to be classified
Returns:
predicted class probability distribution
Throws:
java.lang.Exception - if distribution can't be computed successfully
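
A committee can produce class membership probabilities by averaging the distributions predicted by its members. The sketch below shows that averaging step on plain arrays; treating the ensemble output as an unweighted mean is an assumption, and `EnsembleAverageSketch` is a hypothetical name.

```java
final class EnsembleAverageSketch {

    /** Unweighted mean of the members' class probability distributions. */
    static double[] average(double[][] memberDistributions) {
        int numClasses = memberDistributions[0].length;
        double[] mean = new double[numClasses];
        for (double[] dist : memberDistributions) {
            for (int c = 0; c < numClasses; c++) {
                mean[c] += dist[c];
            }
        }
        for (int c = 0; c < numClasses; c++) {
            mean[c] /= memberDistributions.length;
        }
        return mean;
    }

    public static void main(String[] args) {
        double[] mean = average(new double[][] {{0.9, 0.1}, {0.5, 0.5}});
        System.out.println(mean[0] + " " + mean[1]);
    }
}
```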

selectInstances

public int[] selectInstances(Instances unlabeledActivePool,
                             int num)
                      throws java.lang.Exception
Given a set of unlabeled examples, select a specified number of examples to be labeled.

Specified by:
selectInstances in interface ActiveLearner
Parameters:
unlabeledActivePool - pool of unlabeled examples
num - number of examples to select for labeling
Throws:
java.lang.Exception - if selective sampling fails
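
A typical selective sampling step scores every unlabeled example and returns the indices of the highest-scoring ones. The sketch below assumes the scores are disagreement values and that the examples with the most disagreement are chosen; `SelectionSketch` and `selectTop` are hypothetical names.

```java
import java.util.Arrays;

final class SelectionSketch {

    /** Returns the indices of the num highest scores, highest first; ties keep pool order. */
    static int[] selectTop(double[] disagreement, int num) {
        Integer[] order = new Integer[disagreement.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Sort indices by descending score; the object-array sort is stable.
        Arrays.sort(order, (a, b) -> Double.compare(disagreement[b], disagreement[a]));
        int k = Math.min(num, order.length);
        int[] selected = new int[k];
        for (int i = 0; i < k; i++) {
            selected[i] = order[i];
        }
        return selected;
    }

    public static void main(String[] args) {
        int[] picked = selectTop(new double[] {0.1, 0.9, 0.4, 0.9}, 2);
        System.out.println(Arrays.toString(picked));
    }
}
```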

calculateDisagreement

protected double calculateDisagreement(Instance instance)
                                throws java.lang.Exception
Calculate the disagreement in the ensemble over the label of given examples depending on the chosen selection scheme.

Parameters:
instance - unlabeled instance from the current pool
Returns:
normalized measure of disagreement
Throws:
java.lang.Exception - if disagreement could not be calculated properly

calcJSDivergence

protected double calcJSDivergence(Instance instance)
                           throws java.lang.Exception
Calculate the disagreement in the ensemble over the label of given examples. The disagreement is calculated between the posterior probabilities of each member classifier and those of the ensemble.

Parameters:
instance - unlabeled instance from the current pool
Returns:
normalized measure of disagreement
Throws:
java.lang.Exception - if disagreement could not be calculated properly
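
Comparing each member's posterior probabilities with those of the ensemble can be sketched as the mean KL divergence from each member to the averaged distribution, which is one form of Jensen-Shannon divergence. The code below is an illustration under that assumption; it uses natural logarithms, applies no further normalization, and `JsDisagreementSketch` is a hypothetical name.

```java
final class JsDisagreementSketch {

    /** KL divergence of p from q (natural log); terms with p[i] == 0 contribute nothing. */
    static double kl(double[] p, double[] q) {
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            if (p[i] > 0.0) {
                sum += p[i] * Math.log(p[i] / q[i]);
            }
        }
        return sum;
    }

    /** Mean KL divergence from each member's distribution to the ensemble average. */
    static double disagreement(double[][] members) {
        int numClasses = members[0].length;
        double[] ensemble = new double[numClasses];
        for (double[] m : members) {
            for (int c = 0; c < numClasses; c++) {
                ensemble[c] += m[c] / members.length;
            }
        }
        double total = 0.0;
        for (double[] m : members) {
            total += kl(m, ensemble);
        }
        return total / members.length;
    }

    public static void main(String[] args) {
        System.out.println(disagreement(new double[][] {{1.0, 0.0}, {0.0, 1.0}}));
        System.out.println(disagreement(new double[][] {{0.3, 0.7}, {0.3, 0.7}}));
    }
}
```

Members that agree completely yield a value of zero, while members that place all probability on different classes yield the largest value for that committee size.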

smoothDistribution

protected void smoothDistribution(double[] probs)
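
This method carries no description. Given that m_Epsilon is documented as a smoothing parameter for 0-values in distributions, one plausible reading is that zero entries are replaced by a small constant and the array is renormalized in place. The sketch below implements that reading only; it is an assumption, not the documented behavior, and `SmoothingSketch` is a hypothetical name.

```java
final class SmoothingSketch {

    /** Replaces zero probabilities with epsilon, then renormalizes the array in place. */
    static void smooth(double[] probs, double epsilon) {
        double sum = 0.0;
        for (int i = 0; i < probs.length; i++) {
            if (probs[i] == 0.0) {
                probs[i] = epsilon;
            }
            sum += probs[i];
        }
        for (int i = 0; i < probs.length; i++) {
            probs[i] /= sum;
        }
    }

    public static void main(String[] args) {
        double[] probs = {1.0, 0.0};
        smooth(probs, 0.01);
        System.out.println(probs[0] + " " + probs[1]);
    }
}
```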

calcKLdivergence

protected double calcKLdivergence(double[] p1,
                                  double[] p2)
Calculate the KL divergence between two probability distributions.

Parameters:
p1 - first probability distribution
p2 - second probability distribution
Returns:
the KL divergence between p1 and p2
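
For reference, the KL divergence of p1 from p2 is the sum over classes of p1[i] * log(p1[i] / p2[i]). The sketch below computes it on plain arrays; the natural logarithm, the skipping of terms where p1[i] is zero, and the expectation that p2 has already been smoothed to contain no zeros are assumptions of this example, and `KlDivergenceSketch` is a hypothetical name.

```java
final class KlDivergenceSketch {

    /** KL divergence of p1 from p2 using natural logarithms. */
    static double klDivergence(double[] p1, double[] p2) {
        if (p1.length != p2.length) {
            throw new IllegalArgumentException("distributions differ in length");
        }
        double sum = 0.0;
        for (int i = 0; i < p1.length; i++) {
            if (p1[i] > 0.0) { // a zero-probability term contributes nothing
                sum += p1[i] * Math.log(p1[i] / p2[i]);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(klDivergence(new double[] {0.5, 0.5}, new double[] {0.25, 0.75}));
    }
}
```

The measure is not symmetric: swapping p1 and p2 generally gives a different value.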

calcEuclideanDis

protected double calcEuclideanDis(Instance instance)
                           throws java.lang.Exception
Calculate the disagreement in the ensemble over the label of given examples. The disagreement is calculated using the Jensen-Shannon divergence of the posterior probabilities

Parameters:
instance - unlabeled instance from the current pool
Returns:
normalized measure of disagreement
Throws:
java.lang.Exception - if disagreement could not be calculated properly

calcMajorityDis

protected double calcMajorityDis(Instance instance)
                          throws java.lang.Exception
Calculate the disagreement in the ensemble over the label of given examples.

Parameters:
instance - unlabeled instance from the current pool
Returns:
normalized measure of disagreement
Throws:
java.lang.Exception - if disagreement could not be calculated properly

toString

public java.lang.String toString()
Returns description of the Decorate classifier.

Returns:
description of the Decorate classifier as a string

main

public static void main(java.lang.String[] argv)
Main method for testing this class.

Parameters:
argv - the options