weka.classifiers.misc
Class PrototypeMetric

java.lang.Object
  extended by weka.classifiers.Classifier
      extended by weka.classifiers.DistributionClassifier
          extended by weka.classifiers.misc.PrototypeMetric
All Implemented Interfaces:
java.lang.Cloneable, OptionHandler, java.io.Serializable

public class PrototypeMetric
extends DistributionClassifier
implements OptionHandler

Prototype learner for purely real-valued instances that uses a general weka.core.metrics.Metric. Computes a mean (prototype) vector for each class. A new example is classified by computing the distance from its feature vector to each class prototype under this Metric and choosing the closest prototype. By default, acts as a Rocchio-style classifier that uses cosine similarity, assuming the text data in the ARFF file is already TFIDF-weighted. For an example, see: Joachims, Thorsten. A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization. Proceedings of the International Conference on Machine Learning (ICML), 1997.
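
The description above can be illustrated with a minimal stand-alone sketch. It does not use WEKA and is not this class's implementation: the class name RocchioSketch, its method names, and the use of plain double[] arrays in place of Instance objects are all assumptions made for illustration.

```java
import java.util.Arrays;

/** Hypothetical stand-alone sketch; not the WEKA implementation. */
final class RocchioSketch {
    /** Cosine similarity between two dense vectors; 0 if either has zero norm. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    /** One mean vector per class; labels[i] is the class index of rows[i]. */
    static double[][] prototypes(double[][] rows, int[] labels, int numClasses) {
        int dim = rows[0].length;
        double[][] protos = new double[numClasses][dim];
        int[] counts = new int[numClasses];
        for (int i = 0; i < rows.length; i++) {
            counts[labels[i]]++;
            for (int j = 0; j < dim; j++) protos[labels[i]][j] += rows[i][j];
        }
        for (int c = 0; c < numClasses; c++)
            if (counts[c] > 0)
                for (int j = 0; j < dim; j++) protos[c][j] /= counts[c];
        return protos;
    }

    /** Index of the prototype most similar to x (ties go to the lower index). */
    static int classify(double[][] protos, double[] x) {
        int best = 0;
        for (int c = 1; c < protos.length; c++)
            if (cosine(protos[c], x) > cosine(protos[best], x)) best = c;
        return best;
    }

    public static void main(String[] args) {
        // Toy, already-weighted feature vectors (assumed data, two classes).
        double[][] rows = {{1, 0, 0}, {0.8, 0.2, 0}, {0, 1, 1}, {0, 0.9, 1.1}};
        int[] labels = {0, 0, 1, 1};
        double[][] protos = prototypes(rows, labels, 2);
        System.out.println(Arrays.toString(protos[0]));
        System.out.println(classify(protos, new double[] {0.1, 1, 0.9}));
    }
}
```

Cosine similarity ignores vector length, which is one reason it suits TFIDF-weighted documents of differing sizes.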

See Also:
Serialized Form

Field Summary
protected  Instances m_Instances
          The instances used for training.
protected  Metric m_Metric
          Metric to be used to compare instances to the prototype instance
protected  Instance[] m_Prototypes
          Prototype instance for each class
 
Constructor Summary
PrototypeMetric()
           
 
Method Summary
 void buildClassifier(Instances instances)
          Generates the classifier.
 Instances[] classPartitionInstances(Instances instances)
          Partition instances into a set for each class
static java.lang.String concatStringArray(java.lang.String[] strings)
          A little helper to create a single String from an array of Strings
 double[] distributionForInstance(Instance instance)
          Calculates the class membership probabilities for the given test instance.
 Metric getMetric()
          Get the distance metric
 java.lang.String[] getOptions()
          Gets the current settings.
 java.lang.String globalInfo()
          Returns a string describing this classifier
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options.
static void main(java.lang.String[] argv)
          Main method for testing this class.
 Instance meanInstance(Instances instances)
          Compute a mean instance for all the instances in a set
protected  double[] meanVectorFull(Instances instances)
          Compute a mean vector for non-sparse instances using the meanOrMode method of Instances
protected  double[] meanVectorSparse(Instances instances)
          Efficiently compute a mean vector for a set of sparse instances
 void setMetric(Metric m)
          Set the distance metric
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 java.lang.String toString()
          Returns a description of the classifier.
 
Methods inherited from class weka.classifiers.DistributionClassifier
calculateEntropy, calculateLabeledInstanceMargin, calculateMargin, classifyInstance
 
Methods inherited from class weka.classifiers.Classifier
forName, makeCopies
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

m_Metric

protected Metric m_Metric
Metric to be used to compare instances to the prototype instance


m_Prototypes

protected Instance[] m_Prototypes
Prototype instance for each class


m_Instances

protected Instances m_Instances
The instances used for training.

Constructor Detail

PrototypeMetric

public PrototypeMetric()

Method Detail

setMetric

public void setMetric(Metric m)
Set the distance metric


getMetric

public Metric getMetric()
Get the distance metric


listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface OptionHandler
Returns:
an enumeration of all the available options.

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options.

Specified by:
setOptions in interface OptionHandler
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported

getOptions

public java.lang.String[] getOptions()
Gets the current settings.

Specified by:
getOptions in interface OptionHandler
Returns:
an array of strings suitable for passing to setOptions()

globalInfo

public java.lang.String globalInfo()
Returns a string describing this classifier

Returns:
a description of the classifier suitable for displaying in the explorer/experimenter gui

buildClassifier

public void buildClassifier(Instances instances)
                     throws java.lang.Exception
Generates the classifier.

Specified by:
buildClassifier in class Classifier
Parameters:
instances - set of instances serving as training data
Throws:
java.lang.Exception - if the classifier has not been generated successfully

classPartitionInstances

public Instances[] classPartitionInstances(Instances instances)
Partition instances into a set for each class
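
As a rough illustration of partitioning by class, the sketch below groups row indices by class label. It is stand-alone and does not use the Instances type; the class name ClassPartitionSketch and the representation of labels as an int[] of class indices are assumptions, not part of this API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-alone sketch; not the WEKA implementation. */
final class ClassPartitionSketch {
    /** Groups row indices by class label; parts.get(c) lists the rows of class c. */
    static List<List<Integer>> partition(int[] labels, int numClasses) {
        List<List<Integer>> parts = new ArrayList<>();
        for (int c = 0; c < numClasses; c++) parts.add(new ArrayList<>());
        for (int i = 0; i < labels.length; i++) parts.get(labels[i]).add(i);
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(partition(new int[] {0, 1, 0, 2, 1}, 3));
        // prints [[0, 2], [1, 4], [3]]
    }
}
```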


meanInstance

public Instance meanInstance(Instances instances)
Compute a mean instance for all the instances in a set


meanVectorFull

protected double[] meanVectorFull(Instances instances)
Compute a mean vector for non-sparse instances using the meanOrMode method of Instances


meanVectorSparse

protected double[] meanVectorSparse(Instances instances)
Efficiently compute a mean vector for a set of sparse instances
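
One way a sparse mean can be computed efficiently is to accumulate only the stored (non-zero) entries and divide by the number of instances once at the end. The sketch below shows that idea under assumed inputs: parallel index and value arrays per row stand in for sparse instances, and the name SparseMeanSketch is hypothetical. Whether this method works the same way is not stated in this documentation.

```java
import java.util.Arrays;

/** Hypothetical stand-alone sketch; not the WEKA implementation. */
final class SparseMeanSketch {
    /**
     * Mean of sparse rows. Row r stores values[r][k] at attribute position
     * indices[r][k]; every other entry of the row is implicitly zero.
     */
    static double[] mean(int[][] indices, double[][] values, int numAttributes) {
        double[] mean = new double[numAttributes];
        if (indices.length == 0) return mean; // no rows: return all zeros
        for (int r = 0; r < indices.length; r++)
            for (int k = 0; k < indices[r].length; k++)
                mean[indices[r][k]] += values[r][k]; // touch stored entries only
        for (int j = 0; j < numAttributes; j++) mean[j] /= indices.length;
        return mean;
    }

    public static void main(String[] args) {
        int[][] idx = {{0, 3}, {3}};
        double[][] val = {{2.0, 4.0}, {6.0}};
        System.out.println(Arrays.toString(mean(idx, val, 5)));
        // prints [1.0, 0.0, 0.0, 5.0, 0.0]
    }
}
```

The accumulation cost is proportional to the number of stored entries rather than to rows times attributes.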


distributionForInstance

public double[] distributionForInstance(Instance instance)
                                 throws java.lang.Exception
Calculates the class membership probabilities for the given test instance.

Specified by:
distributionForInstance in class DistributionClassifier
Parameters:
instance - the instance to be classified
Returns:
predicted class probability distribution
Throws:
java.lang.Exception - if distribution can't be computed
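
This documentation does not say how per-prototype scores are turned into class probabilities. One plausible scheme, shown here purely as an assumption, clamps negative similarities to zero and normalizes the scores so they sum to 1, falling back to a uniform distribution when every score is zero. The name DistributionSketch is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical stand-alone sketch; the normalization scheme is an assumption. */
final class DistributionSketch {
    /** Clamps negative scores to zero and normalizes to sum to 1; uniform if all are zero. */
    static double[] normalize(double[] similarities) {
        double[] dist = new double[similarities.length];
        double sum = 0;
        for (int c = 0; c < dist.length; c++) {
            dist[c] = Math.max(0, similarities[c]);
            sum += dist[c];
        }
        for (int c = 0; c < dist.length; c++)
            dist[c] = (sum > 0) ? dist[c] / sum : 1.0 / dist.length;
        return dist;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(normalize(new double[] {3, 1, 0})));
        // prints [0.75, 0.25, 0.0]
    }
}
```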

concatStringArray

public static java.lang.String concatStringArray(java.lang.String[] strings)
A little helper to create a single String from an array of Strings

Parameters:
strings - an array of strings

toString

public java.lang.String toString()
Returns a description of the classifier.

Returns:
a description of the classifier as a string.

main

public static void main(java.lang.String[] argv)
Main method for testing this class.

Parameters:
argv - the options