weka.classifiers.sparse
Class IBkMetric

java.lang.Object
  extended by weka.classifiers.Classifier
      extended by weka.classifiers.DistributionClassifier
          extended by weka.classifiers.sparse.IBkMetric
All Implemented Interfaces:
java.lang.Cloneable, OptionHandler, java.io.Serializable, UpdateableClassifier, WeightedInstancesHandler

public class IBkMetric
extends DistributionClassifier
implements OptionHandler, UpdateableClassifier, WeightedInstancesHandler

K-nearest neighbour classifier specialized for SparseInstance objects. For more information, see

Aha, D., and D. Kibler (1991) "Instance-based learning algorithms", Machine Learning, vol.6, pp. 37-66.

Valid options are:

-K num
Set the number of nearest neighbors to use in prediction (default 1)

-W num
Set a fixed window size for incremental train/testing. As new training instances are added, oldest instances are removed to maintain the number of training instances at this size. (default no window)

-D
Neighbors will be weighted by the inverse of their distance when voting. (default equal weighting)

-F
Neighbors will be weighted by their similarity when voting. (default equal weighting)

-X
Select the number of neighbors to use by hold-one-out cross-validation, with an upper limit given by the -K option.

-S
When k is selected by cross-validation for numeric class attributes, minimize mean-squared error. (default mean absolute error)

-M metric-name
Specify the distance metric to be used; WeightedDotP by default.
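The default metric named above, WeightedDotP, suggests that closeness is computed from dot products over sparse vectors. The following is a minimal sketch of a sparse dot product, assuming each instance is held as a map from attribute index to non-zero value. `SparseDot`, `vec` and `dot` are hypothetical names, and the actual WeightedDotP metric may apply attribute weights that this sketch omits.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: sparse vectors as maps from attribute index to non-zero value.
final class SparseDot {

    // Builds a sparse vector, dropping explicit zeros so only non-zero entries are stored.
    static Map<Integer, Double> vec(int[] indices, double[] values) {
        if (indices.length != values.length) {
            throw new IllegalArgumentException("indices and values differ in length");
        }
        Map<Integer, Double> v = new HashMap<>();
        for (int i = 0; i < indices.length; i++) {
            if (values[i] != 0.0) {
                v.put(indices[i], values[i]);
            }
        }
        return v;
    }

    // Unweighted dot product: iterate over the smaller vector and look up matching
    // indices in the larger one, so the cost grows with the number of non-zeros,
    // not with the total number of attributes.
    static double dot(Map<Integer, Double> a, Map<Integer, Double> b) {
        Map<Integer, Double> small = a.size() <= b.size() ? a : b;
        Map<Integer, Double> large = (small == a) ? b : a;
        double sum = 0.0;
        for (Map.Entry<Integer, Double> e : small.entrySet()) {
            Double other = large.get(e.getKey());
            if (other != null) {
                sum += e.getValue() * other;
            }
        }
        return sum;
    }
}
```

Only indices present in both vectors contribute, which is why sparse storage pays off when most attribute values are zero.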

See Also:
Serialized Form

Nested Class Summary
protected  class IBkMetric.NeighborList
           
protected  class IBkMetric.NeighborNode
           
 
Field Summary
protected  int m_ClassType
          The class attribute type
protected  boolean m_CrossValidate
          Whether to select k by cross validation
protected  int m_DistanceWeighting
          The distance weighting method (one of WEIGHT_NONE, WEIGHT_INVERSE or WEIGHT_SIMILARITY)
protected  double m_EPSILON
          Small value to be used instead of 0 in converting from distances to similarities
protected  int m_kNN
          The number of neighbours currently used for classification
protected  int m_kNNUpper
          The value of kNN provided by the user.
protected  boolean m_kNNValid
          Whether the value of k selected by cross validation has been invalidated by a change in the training instances
protected  double[] m_Max
          The maximum values for numeric attributes.
protected  boolean m_MeanSquared
          Whether to minimise mean squared error rather than mean absolute error when cross-validating on numeric prediction tasks
protected  Metric m_metric
          The distance metric
protected  java.lang.String m_MetricName
           
protected  double[] m_Min
          The minimum values for numeric attributes.
protected  double m_NumAttributesUsed
          The number of attributes that contribute to a prediction
protected  int m_NumClasses
          The number of class values (or 1 if predicting numeric)
protected  Instances m_Train
          The training instances used for classification.
protected  int m_WindowSize
          The maximum number of training instances allowed.
static Tag[] TAGS_WEIGHTING
           
static int WEIGHT_INVERSE
           
static int WEIGHT_NONE
           
static int WEIGHT_SIMILARITY
           
 
Constructor Summary
IBkMetric()
          IB1 classifier.
IBkMetric(int k)
          IBk classifier.
 
Method Summary
 void buildClassifier(Instances instances)
          Generates the classifier.
static java.lang.String concatStringArray(java.lang.String[] strings)
          A little helper to create a single String from an array of Strings
protected  void crossValidate()
          Select the best value for k by hold-one-out cross-validation.
 double[] distributionForInstance(Instance instance)
          Calculates the class membership probabilities for the given test instance.
protected  IBkMetric.NeighborList findNeighbors(Instance instance)
          Build the list of nearest k neighbors to the given test instance.
 double getAttributeMax(int index)
          Get an attribute's maximum observed value
 double getAttributeMin(int index)
          Get an attribute's minimum observed value
 boolean getCrossValidate()
          Gets whether hold-one-out cross-validation will be used to select the best k value
 boolean getDebug()
          Get the value of Debug.
 SelectedTag getDistanceWeighting()
          Gets the distance weighting method used.
 int getKNN()
          Gets the number of neighbours the learner will use.
 boolean getMeanSquared()
          Gets whether the mean squared error is used rather than mean absolute error when doing cross-validation.
 Metric getMetric()
          Get the distance metric
protected  java.lang.String getMetricSpec()
          Gets the metric specification string, which contains the class name of the distance metric and any options to it
 int getNumTraining()
          Get the number of training instances the classifier is currently using
 java.lang.String[] getOptions()
          Gets the current settings of IBkMetric.
 int getWindowSize()
          Gets the maximum number of instances allowed in the training pool.
protected  void init()
          Initialise scheme variables.
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options.
static void main(java.lang.String[] argv)
          Main method for testing this class.
protected  double[] makeDistribution(IBkMetric.NeighborList neighborlist)
          Turn the list of nearest neighbors into a probability distribution
 java.lang.String metricName()
          Get the name of the distance metric that is used. The 'get' prefix is deliberately avoided so that this property does not show in the dialogs.
 void setCrossValidate(boolean newCrossValidate)
          Sets whether hold-one-out cross-validation will be used to select the best k value
 void setDebug(boolean newDebug)
          Set the value of Debug.
 void setDistanceWeighting(SelectedTag newMethod)
          Sets the distance weighting method used.
 void setKNN(int k)
          Set the number of neighbours the learner is to use.
 void setMeanSquared(boolean newMeanSquared)
          Sets whether the mean squared error is used rather than mean absolute error when doing cross-validation.
 void setMetric(Metric m)
          Set the distance metric
 void setMetricName(java.lang.String metricName)
          Set the distance metric
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 void setWindowSize(int newWindowSize)
          Sets the maximum number of instances allowed in the training pool.
 java.lang.String toString()
          Returns a description of this classifier.
 void updateClassifier(Instance instance)
          Adds the supplied instance to the training set
 
Methods inherited from class weka.classifiers.DistributionClassifier
calculateEntropy, calculateLabeledInstanceMargin, calculateMargin, classifyInstance
 
Methods inherited from class weka.classifiers.Classifier
forName, makeCopies
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

m_Train

protected Instances m_Train
The training instances used for classification.


m_NumClasses

protected int m_NumClasses
The number of class values (or 1 if predicting numeric)


m_ClassType

protected int m_ClassType
The class attribute type


m_Min

protected double[] m_Min
The minimum values for numeric attributes.


m_Max

protected double[] m_Max
The maximum values for numeric attributes.


m_kNN

protected int m_kNN
The number of neighbours currently used for classification


m_kNNUpper

protected int m_kNNUpper
The value of kNN provided by the user. This may differ from m_kNN if cross-validation is being used


m_kNNValid

protected boolean m_kNNValid
Whether the value of k selected by cross validation has been invalidated by a change in the training instances


m_WindowSize

protected int m_WindowSize
The maximum number of training instances allowed. When this limit is reached, old training instances are removed, so the training data is "windowed". Set to 0 for unlimited numbers of instances.


m_DistanceWeighting

protected int m_DistanceWeighting
The distance weighting method (one of WEIGHT_NONE, WEIGHT_INVERSE or WEIGHT_SIMILARITY)


m_metric

protected Metric m_metric
The distance metric


m_MetricName

protected java.lang.String m_MetricName

m_CrossValidate

protected boolean m_CrossValidate
Whether to select k by cross validation


m_MeanSquared

protected boolean m_MeanSquared
Whether to minimise mean squared error rather than mean absolute error when cross-validating on numeric prediction tasks


m_EPSILON

protected double m_EPSILON
Small value to be used instead of 0 in converting from distances to similarities


WEIGHT_NONE

public static final int WEIGHT_NONE
See Also:
Constant Field Values

WEIGHT_INVERSE

public static final int WEIGHT_INVERSE
See Also:
Constant Field Values

WEIGHT_SIMILARITY

public static final int WEIGHT_SIMILARITY
See Also:
Constant Field Values

TAGS_WEIGHTING

public static final Tag[] TAGS_WEIGHTING

m_NumAttributesUsed

protected double m_NumAttributesUsed
The number of attributes that contribute to a prediction

Constructor Detail

IBkMetric

public IBkMetric(int k)
IBk classifier. Simple instance-based learner that predicts the class of each test instance from the classes of its k nearest training instances.

Parameters:
k - the number of nearest neighbors to use for prediction

IBkMetric

public IBkMetric()
IB1 classifier. Instance-based learner that predicts the class of each test instance from the class of its single nearest training instance.

Method Detail

getDebug

public boolean getDebug()
Get the value of Debug.

Returns:
Value of Debug.

setDebug

public void setDebug(boolean newDebug)
Set the value of Debug.

Parameters:
newDebug - Value to assign to Debug.

setKNN

public void setKNN(int k)
Set the number of neighbours the learner is to use.

Parameters:
k - the number of neighbours.

getKNN

public int getKNN()
Gets the number of neighbours the learner will use.

Returns:
the number of neighbours.

getWindowSize

public int getWindowSize()
Gets the maximum number of instances allowed in the training pool. The addition of new instances above this value will result in old instances being removed. A value of 0 signifies no limit to the number of training instances.

Returns:
Value of WindowSize.

setWindowSize

public void setWindowSize(int newWindowSize)
Sets the maximum number of instances allowed in the training pool. The addition of new instances above this value will result in old instances being removed. A value of 0 signifies no limit to the number of training instances.

Parameters:
newWindowSize - Value to assign to WindowSize.
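The windowing behaviour described above amounts to a first-in, first-out pool. A minimal sketch, assuming instances arrive one at a time as they would through updateClassifier; the generic `TrainingWindow` class is hypothetical and is not how IBkMetric actually stores m_Train.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical FIFO training pool illustrating the window-size semantics.
final class TrainingWindow<T> {
    private final Deque<T> pool = new ArrayDeque<>();
    private final int windowSize; // 0 means unlimited, as documented for the window size

    TrainingWindow(int windowSize) {
        if (windowSize < 0) {
            throw new IllegalArgumentException("windowSize must be >= 0");
        }
        this.windowSize = windowSize;
    }

    // Appends the newest instance, then evicts from the front (oldest first)
    // until the pool is back within the window.
    void add(T instance) {
        pool.addLast(instance);
        while (windowSize > 0 && pool.size() > windowSize) {
            pool.removeFirst();
        }
    }

    int size() {
        return pool.size();
    }

    // Returns the oldest retained instance, or null if the pool is empty.
    T oldest() {
        return pool.peekFirst();
    }
}
```

With a window of 3, adding five instances leaves only the last three; with a window of 0, nothing is ever evicted.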

getDistanceWeighting

public SelectedTag getDistanceWeighting()
Gets the distance weighting method used. Will be one of WEIGHT_NONE, WEIGHT_INVERSE, or WEIGHT_SIMILARITY

Returns:
the distance weighting method used.

setDistanceWeighting

public void setDistanceWeighting(SelectedTag newMethod)
Sets the distance weighting method used. Values other than WEIGHT_NONE, WEIGHT_INVERSE, or WEIGHT_SIMILARITY will be ignored.
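The three weighting constants change how much each neighbour's vote counts. The sketch below assumes that WEIGHT_NONE gives every neighbour weight 1, that WEIGHT_INVERSE uses 1/d with a small epsilon substituted for distances below it (the role described for m_EPSILON), and that WEIGHT_SIMILARITY uses a similarity score supplied by the metric as-is. `VoteWeight`, its `Mode` enum and the epsilon value 1e-6 are all hypothetical.

```java
// Hypothetical sketch of the three vote-weighting schemes.
final class VoteWeight {
    enum Mode { NONE, INVERSE, SIMILARITY }

    // Hypothetical stand-in for m_EPSILON.
    static final double EPSILON = 1e-6;

    static double weight(Mode mode, double distance, double similarity) {
        switch (mode) {
            case NONE:
                // Equal weighting: every neighbour casts one vote.
                return 1.0;
            case INVERSE:
                // Inverse distance, guarding against division by zero for exact matches.
                return 1.0 / Math.max(distance, EPSILON);
            case SIMILARITY:
                // Assumed: the metric's similarity is used directly as the weight.
                return similarity;
            default:
                throw new IllegalArgumentException("unknown mode " + mode);
        }
    }
}
```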


getMeanSquared

public boolean getMeanSquared()
Gets whether the mean squared error is used rather than mean absolute error when doing cross-validation.

Returns:
true if so.

setMeanSquared

public void setMeanSquared(boolean newMeanSquared)
Sets whether the mean squared error is used rather than mean absolute error when doing cross-validation.

Parameters:
newMeanSquared - true if so.

getCrossValidate

public boolean getCrossValidate()
Gets whether hold-one-out cross-validation will be used to select the best k value

Returns:
true if cross-validation will be used.

setCrossValidate

public void setCrossValidate(boolean newCrossValidate)
Sets whether hold-one-out cross-validation will be used to select the best k value

Parameters:
newCrossValidate - true if cross-validation should be used.

getNumTraining

public int getNumTraining()
Get the number of training instances the classifier is currently using


getAttributeMin

public double getAttributeMin(int index)
                       throws java.lang.Exception
Get an attribute's minimum observed value

Throws:
java.lang.Exception

getAttributeMax

public double getAttributeMax(int index)
                       throws java.lang.Exception
Get an attribute's maximum observed value

Throws:
java.lang.Exception
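The values reported by getAttributeMin and getAttributeMax correspond to the m_Min and m_Max fields. The sketch below shows how such per-attribute ranges can be maintained incrementally, assuming dense `double[]` rows with NaN for missing values. `AttributeRange` is hypothetical, and its `normalize` helper illustrates the kind of min-max scaling such ranges are typically used for in instance-based learners; this page does not state that IBkMetric scales attributes this way.

```java
import java.util.Arrays;

// Hypothetical running min/max tracker for numeric attributes.
final class AttributeRange {
    private final double[] min;
    private final double[] max;

    AttributeRange(int numAttributes) {
        min = new double[numAttributes];
        max = new double[numAttributes];
        // NaN marks "no value observed yet".
        Arrays.fill(min, Double.NaN);
        Arrays.fill(max, Double.NaN);
    }

    // Widens each attribute's range to include the new row; NaN (missing) values are skipped.
    void update(double[] row) {
        for (int i = 0; i < row.length; i++) {
            double v = row[i];
            if (Double.isNaN(v)) {
                continue;
            }
            if (Double.isNaN(min[i]) || v < min[i]) min[i] = v;
            if (Double.isNaN(max[i]) || v > max[i]) max[i] = v;
        }
    }

    double getMin(int index) {
        return min[index];
    }

    double getMax(int index) {
        return max[index];
    }

    // Min-max scaling to [0, 1]; returns 0 when the range is unknown or degenerate.
    double normalize(int index, double v) {
        double range = max[index] - min[index];
        if (Double.isNaN(range) || range == 0.0) {
            return 0.0;
        }
        return (v - min[index]) / range;
    }
}
```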

buildClassifier

public void buildClassifier(Instances instances)
                     throws java.lang.Exception
Generates the classifier.

Specified by:
buildClassifier in class Classifier
Parameters:
instances - set of instances serving as training data
Throws:
java.lang.Exception - if the classifier has not been generated successfully

updateClassifier

public void updateClassifier(Instance instance)
                      throws java.lang.Exception
Adds the supplied instance to the training set

Specified by:
updateClassifier in interface UpdateableClassifier
Parameters:
instance - the instance to add
Throws:
java.lang.Exception - if instance could not be incorporated successfully

distributionForInstance

public double[] distributionForInstance(Instance instance)
                                 throws java.lang.Exception
Calculates the class membership probabilities for the given test instance.

Specified by:
distributionForInstance in class DistributionClassifier
Parameters:
instance - the instance to be classified
Returns:
predicted class probability distribution
Throws:
java.lang.Exception - if an error occurred during the prediction

setMetric

public void setMetric(Metric m)
Set the distance metric


getMetric

public Metric getMetric()
Get the distance metric


setMetricName

public void setMetricName(java.lang.String metricName)
Set the distance metric

Parameters:
metricName - the name of the distance metric that should be used

metricName

public java.lang.String metricName()
Get the name of the distance metric that is used. The 'get' prefix is deliberately avoided so that this property does not show in the dialogs.


listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface OptionHandler
Returns:
an enumeration of all the available options.

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options. Valid options are:

-K num
Set the number of nearest neighbors to use in prediction (default 1)

-W num
Set a fixed window size for incremental train/testing. As new training instances are added, oldest instances are removed to maintain the number of training instances at this size. (default no window)

-D
Neighbors will be weighted by the inverse of their distance when voting. (default equal weighting)

-F
Neighbors will be weighted by their similarity when voting. (default equal weighting)

-X
Select the number of neighbors to use by hold-one-out cross validation, with an upper limit given by the -K option.

-S
When k is selected by cross-validation for numeric class attributes, minimize mean-squared error. (default mean absolute error)

-M metric-name
Specify the distance metric to be used; WeightedDotP by default.

Specified by:
setOptions in interface OptionHandler
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported
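The flags documented for this class, including -M, can be read from a String[] with a simple loop. This is a hypothetical sketch, not Weka's own parsing code: `KnnOptions` is an invented holder, unknown flags are rejected to mirror the documented exception, and giving -D precedence over -F when both appear is an assumption.

```java
// Hypothetical option holder and parser for the documented IBkMetric flags.
final class KnnOptions {
    int k = 1;                          // -K num (default 1)
    int windowSize = 0;                 // -W num (0 = no window)
    String weighting = "NONE";          // -D -> INVERSE, -F -> SIMILARITY
    boolean crossValidate = false;      // -X
    boolean meanSquared = false;        // -S
    String metricName = "WeightedDotP"; // -M metric-name

    static KnnOptions parse(String[] args) {
        KnnOptions o = new KnnOptions();
        boolean inverse = false;
        boolean similarity = false;
        for (int i = 0; i < args.length; i++) {
            switch (args[i]) {
                case "-K": o.k = Integer.parseInt(value(args, ++i, "-K")); break;
                case "-W": o.windowSize = Integer.parseInt(value(args, ++i, "-W")); break;
                case "-M": o.metricName = value(args, ++i, "-M"); break;
                case "-D": inverse = true; break;
                case "-F": similarity = true; break;
                case "-X": o.crossValidate = true; break;
                case "-S": o.meanSquared = true; break;
                default: throw new IllegalArgumentException("Unsupported option: " + args[i]);
            }
        }
        // Assumption: -D takes precedence if both -D and -F are given.
        o.weighting = inverse ? "INVERSE" : similarity ? "SIMILARITY" : "NONE";
        return o;
    }

    // Returns the argument following a flag, failing clearly if it is missing.
    private static String value(String[] args, int i, String flag) {
        if (i >= args.length) {
            throw new IllegalArgumentException(flag + " requires a value");
        }
        return args[i];
    }
}
```

For example, `{"-K", "5", "-X", "-D"}` yields k = 5 with cross-validation and inverse-distance weighting, while an empty array leaves every default in place.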

getMetricSpec

protected java.lang.String getMetricSpec()
Gets the metric specification string, which contains the class name of the distance metric and any options to it

Returns:
the classifier string.

getOptions

public java.lang.String[] getOptions()
Gets the current settings of IBkMetric.

Specified by:
getOptions in interface OptionHandler
Returns:
an array of strings suitable for passing to setOptions()

toString

public java.lang.String toString()
Returns a description of this classifier.

Returns:
a description of this classifier as a string.

init

protected void init()
Initialise scheme variables.


findNeighbors

protected IBkMetric.NeighborList findNeighbors(Instance instance)
                                        throws java.lang.Exception
Build the list of nearest k neighbors to the given test instance.

Parameters:
instance - the instance to search for neighbours of
Returns:
a list of neighbors
Throws:
java.lang.Exception
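Finding the k nearest neighbours does not require sorting the whole training set. The sketch below uses a bounded max-heap, assuming the distances from the test instance to every training instance have already been computed. `NearestK` and `Neighbor` are hypothetical and simpler than IBkMetric.NeighborList, which may, for example, treat ties at the k-th distance differently. With a similarity such as a dot product, where larger means closer, the comparisons would be reversed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical k-nearest selection over precomputed distances.
final class NearestK {

    // A training-set index paired with its distance to the test instance.
    static final class Neighbor {
        final int index;
        final double distance;

        Neighbor(int index, double distance) {
            this.index = index;
            this.distance = distance;
        }
    }

    // Keeps the k smallest distances in a max-heap: the root is the worst of the
    // current best k, so each candidate is checked against it in O(log k).
    static List<Neighbor> find(double[] distances, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        PriorityQueue<Neighbor> heap =
            new PriorityQueue<>(k, (a, b) -> Double.compare(b.distance, a.distance));
        for (int i = 0; i < distances.length; i++) {
            if (heap.size() < k) {
                heap.add(new Neighbor(i, distances[i]));
            } else if (distances[i] < heap.peek().distance) {
                heap.poll();
                heap.add(new Neighbor(i, distances[i]));
            }
        }
        // Return the neighbours nearest first.
        List<Neighbor> result = new ArrayList<>(heap);
        result.sort((a, b) -> Double.compare(a.distance, b.distance));
        return result;
    }
}
```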

makeDistribution

protected double[] makeDistribution(IBkMetric.NeighborList neighborlist)
                             throws java.lang.Exception
Turn the list of nearest neighbors into a probability distribution

Parameters:
neighborlist - the list of nearest neighboring instances
Returns:
the probability distribution
Throws:
java.lang.Exception
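Once the neighbours are known, their votes are combined into a distribution. A minimal sketch, assuming each neighbour comes with a weight (for example 1, 1/d or a similarity, as in the distance weighting options) and following the m_NumClasses convention that a value of 1 signals a numeric class, in which case the single entry holds the weighted mean target. `NeighborVote` is hypothetical, and the uniform fallback when the total weight is zero is an assumption.

```java
import java.util.Arrays;

// Hypothetical conversion of weighted neighbour votes into a distribution.
final class NeighborVote {

    // classValues[i]: class index of neighbour i (nominal) or its target value (numeric).
    // weights[i]: vote weight of neighbour i.
    // numClasses == 1 signals a numeric class.
    static double[] makeDistribution(double[] classValues, double[] weights, int numClasses) {
        if (classValues.length != weights.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        double[] dist = new double[numClasses];
        double total = 0.0;
        for (int i = 0; i < classValues.length; i++) {
            if (numClasses == 1) {
                dist[0] += weights[i] * classValues[i]; // accumulate weighted target
            } else {
                dist[(int) classValues[i]] += weights[i]; // weighted vote for this class
            }
            total += weights[i];
        }
        if (total <= 0.0) {
            // No usable votes: uniform distribution (nominal) or 0 (numeric).
            if (numClasses > 1) {
                Arrays.fill(dist, 1.0 / numClasses);
            }
            return dist;
        }
        // Normalise votes to probabilities, or turn the weighted sum into a weighted mean.
        for (int c = 0; c < numClasses; c++) {
            dist[c] /= total;
        }
        return dist;
    }
}
```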

crossValidate

protected void crossValidate()
Select the best value for k by hold-one-out cross-validation. If the class attribute is nominal, classification error is minimised. If the class attribute is numeric, mean absolute error is minimised (or mean squared error, if that option is enabled with -S or setMeanSquared).
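Hold-one-out selection of k can reuse a single neighbour ranking per training instance: sort the other instances once, then grow the neighbourhood one neighbour at a time and record, for every k up to the limit, whether the vote is still correct. The sketch below covers the nominal case with equal weighting, assuming a precomputed symmetric distance matrix. `HoldOneOutK` is hypothetical, ties are broken towards the smaller class index and towards the smaller k, and a numeric class would accumulate absolute or squared errors instead of misclassifications.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical hold-one-out selection of k for a nominal class.
final class HoldOneOutK {

    // Returns the k in [1, kMax] with the fewest hold-one-out errors.
    // dist: symmetric n x n distance matrix; labels: class indices in [0, numClasses).
    static int selectK(double[][] dist, int[] labels, int numClasses, int kMax) {
        int n = labels.length;
        int limit = Math.min(kMax, n - 1); // at most n - 1 other instances
        if (limit < 1) {
            throw new IllegalArgumentException("need at least two instances and kMax >= 1");
        }
        int[] errors = new int[limit + 1]; // errors[k] for k = 1..limit
        for (int i = 0; i < n; i++) {
            final int held = i;
            // Rank every other instance by its distance to the held-out one.
            Integer[] order = new Integer[n - 1];
            for (int j = 0, p = 0; j < n; j++) {
                if (j != held) {
                    order[p++] = j;
                }
            }
            Arrays.sort(order, Comparator.comparingDouble(j -> dist[held][j]));
            // Grow the neighbourhood one neighbour at a time, reusing the vote counts.
            int[] votes = new int[numClasses];
            for (int k = 1; k <= limit; k++) {
                votes[labels[order[k - 1]]]++;
                if (argMax(votes) != labels[held]) {
                    errors[k]++;
                }
            }
        }
        // Strict comparison keeps the smallest k among equally good values.
        int bestK = 1;
        for (int k = 2; k <= limit; k++) {
            if (errors[k] < errors[bestK]) {
                bestK = k;
            }
        }
        return bestK;
    }

    // Index of the largest count; ties go to the smallest class index.
    private static int argMax(int[] counts) {
        int best = 0;
        for (int c = 1; c < counts.length; c++) {
            if (counts[c] > counts[best]) {
                best = c;
            }
        }
        return best;
    }
}
```

Sorting once per held-out instance gives roughly O(n² log n) work overall, independent of how many k values are tried.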


concatStringArray

public static java.lang.String concatStringArray(java.lang.String[] strings)
A little helper to create a single String from an array of Strings

Parameters:
strings - an array of strings
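A helper of this kind is handy for turning an option array into a single specification string, such as a metric class name followed by its options. A sketch, assuming entries are separated by single spaces and that empty strings (which Weka getOptions implementations commonly use to pad option arrays) are skipped; the separator and empty-entry handling of the real concatStringArray are not documented here, and `StringArrays.concat` is a hypothetical name.

```java
// Hypothetical join helper; separator and empty-entry handling are assumptions.
final class StringArrays {

    // Joins non-empty entries with single spaces; null and "" entries are skipped.
    static String concat(String[] strings) {
        StringBuilder sb = new StringBuilder();
        for (String s : strings) {
            if (s == null || s.isEmpty()) {
                continue;
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(s);
        }
        return sb.toString();
    }
}
```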

main

public static void main(java.lang.String[] argv)
Main method for testing this class.

Parameters:
argv - should contain command line options (see setOptions)