weka.classifiers.sparse
Class NaiveBayesSimpleSparse

java.lang.Object
  extended by weka.classifiers.Classifier
      extended by weka.classifiers.DistributionClassifier
          extended by weka.classifiers.sparse.NaiveBayesSimpleSparse
All Implemented Interfaces:
java.lang.Cloneable, OptionHandler, java.io.Serializable, WeightedInstancesHandler
Direct Known Subclasses:
NaiveBayesSimpleSparseSoft

public class NaiveBayesSimpleSparse
extends DistributionClassifier
implements OptionHandler, WeightedInstancesHandler

Class for building and using a simple Naive Bayes classifier adapted to sparse instances (SparseInstance). It assumes that attribute values are counts of the presence of a descriptive token (e.g. the frequency of a word in text categorization) and that examples/documents are generated by a multinomial model. See: T. Mitchell, Machine Learning, McGraw Hill, 1997, sections 6.9 and 6.10, and/or Andrew McCallum and Kamal Nigam, "A Comparison of Event Models for Naive Bayes Text Classification", Papers from the AAAI-98 Workshop on Text Categorization, 1998, pp. 41–48.
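For reference, the multinomial event model in the McCallum and Nigam paper cited above classifies a document with the decision rule below. Here x_t is the count of token t in the document, V is the token vocabulary, and P(w_t | c) is the per-class token probability. The factors for document length and the multinomial coefficient are left out because they do not depend on the class c. Whether this class drops them in exactly this way is an assumption; the rule is shown as the standard textbook form, not as a transcription of this class's code. The second form is the practical point: the sum only runs over tokens with x_t > 0, which is why a sparse representation is enough.

```latex
\hat{c} = \arg\max_{c} \; P(c) \prod_{t=1}^{|V|} P(w_t \mid c)^{x_t}
        = \arg\max_{c} \Big[ \log P(c) + \sum_{t \,:\, x_t > 0} x_t \log P(w_t \mid c) \Big]
```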

See Also:
Serialized Form

Field Summary
protected  int m_classIndex
          Attribute index for class attribute
protected  double[][] m_condProbs
          Conditional probabilities of each attribute given each class
protected  boolean m_debug
          A debug flag
protected  Instances m_instances
          The instances used for training.
protected  double m_m
          The m parameter of the Laplace m-estimate, corresponding to the size of the pseudo-sample.
protected  int m_numAttributes
          The total number of features
protected  int m_numClasses
          The number of classes
protected  double[] m_priors
          The prior probabilities of the classes.
 
Constructor Summary
NaiveBayesSimpleSparse()
           
 
Method Summary
 void buildClassifier(Instances instances)
          Generates the classifier.
 double[] distributionForInstance(Instance instance)
          Calculates the class membership probabilities for the given test instance.
 double getM()
          Gets the Laplace m parameter that controls the amount of smoothing.
 java.lang.String[] getOptions()
          Gets the current settings of NaiveBayesSimpleSparse.
 java.lang.String globalInfo()
          Returns a string describing this classifier.
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options.
static void main(java.lang.String[] argv)
          Main method for testing this class.
 java.lang.String mTipText()
          Returns the tip text for the m property.
 void setM(double m)
          Sets the Laplace m parameter that controls the amount of smoothing.
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 java.lang.String toString()
          Returns a description of the classifier.
 double[] unNormalizedDistributionForInstance(Instance _instance)
          Calculates unnormalized class membership probabilities for the given test instance.
 
Methods inherited from class weka.classifiers.DistributionClassifier
calculateEntropy, calculateLabeledInstanceMargin, calculateMargin, classifyInstance
 
Methods inherited from class weka.classifiers.Classifier
forName, makeCopies
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

m_priors

protected double[] m_priors
The prior probabilities of the classes.


m_condProbs

protected double[][] m_condProbs
Conditional probabilities of each attribute given each class


m_instances

protected Instances m_instances
The instances used for training.


m_numClasses

protected int m_numClasses
The number of classes


m_classIndex

protected int m_classIndex
Attribute index for class attribute


m_numAttributes

protected int m_numAttributes
The total number of features


m_m

protected double m_m
The m parameter of the Laplace m-estimate, corresponding to the size of the pseudo-sample.
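A minimal sketch of how an m-estimate smooths one class's token counts. It follows Mitchell's general form (count + m·p) / (total + m), where p is a prior estimate of the probability. The uniform prior p = 1 / numAttributes is an assumption: this page does not say which prior the class uses. MEstimateSketch, mEstimate and smooth are hypothetical names, not part of Weka.

```java
/**
 * Hypothetical helper (not part of Weka) illustrating the m-estimate
 * (count + m * prior) / (total + m) for P(token | class).
 * Assumes a uniform prior 1 / numAttributes over the tokens.
 */
final class MEstimateSketch {
    private MEstimateSketch() {}

    /** m-estimate of P(token | class) from a token count and the class's total token count. */
    static double mEstimate(double count, double total, double m, double prior) {
        if (m < 0) {
            throw new IllegalArgumentException("m must be non-negative");
        }
        if (total + m <= 0) {
            throw new IllegalArgumentException("total + m must be positive");
        }
        return (count + m * prior) / (total + m);
    }

    /** Smooths one class's token counts using the assumed uniform prior 1 / counts.length. */
    static double[] smooth(double[] counts, double m) {
        double total = 0.0;
        for (double c : counts) {
            total += c;
        }
        double prior = 1.0 / counts.length;
        double[] probs = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            probs[i] = mEstimate(counts[i], total, m, prior);
        }
        return probs;
    }
}
```

For example, smooth(new double[] {3, 1, 0, 0}, 4.0) returns {0.5, 0.25, 0.125, 0.125}. With m equal to the number of attributes and a uniform prior, m·p = 1, so this reduces to add-one (Laplace) smoothing. With m = 0 the result is the unsmoothed ratio {0.75, 0.25, 0, 0}, which assigns zero probability to unseen tokens.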


m_debug

protected boolean m_debug
A debug flag

Constructor Detail

NaiveBayesSimpleSparse

public NaiveBayesSimpleSparse()
Method Detail

buildClassifier

public void buildClassifier(Instances instances)
                     throws java.lang.Exception
Generates the classifier.

Specified by:
buildClassifier in class Classifier
Parameters:
instances - set of instances serving as training data
Throws:
java.lang.Exception - if the classifier has not been generated successfully
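The training step can be sketched as a single pass that accumulates weighted token counts per class from sparse examples, followed by smoothing. This is a hypothetical stand-in, not the class's actual code. Several choices are assumptions: sparse examples are modeled as parallel arrays of non-zero attribute indices and counts (WeightedSparseDoc), instance weights multiply the counts (the class implements WeightedInstancesHandler, but the exact weighting is not documented here), class priors are plain maximum-likelihood estimates, and the m-estimate uses a uniform prior. The condProbs[class][attribute] layout is also assumed; the class attribute is excluded from the attribute indices.

```java
import java.util.List;

/** Hypothetical sparse training example: non-zero attribute indices, their counts, a class label and a weight. */
final class WeightedSparseDoc {
    final int[] indices;
    final double[] counts;
    final int label;
    final double weight;

    WeightedSparseDoc(int[] indices, double[] counts, int label, double weight) {
        if (indices.length != counts.length) {
            throw new IllegalArgumentException("indices and counts must have the same length");
        }
        this.indices = indices;
        this.counts = counts;
        this.label = label;
        this.weight = weight;
    }
}

/** Hypothetical trainer (not Weka's code): weighted multinomial counts plus m-estimate smoothing. */
final class SparseNaiveBayesTrainer {
    final double[] priors;      // priors[c] = P(c), maximum-likelihood estimate
    final double[][] condProbs; // condProbs[c][a] = P(attribute a | class c), assumed layout

    SparseNaiveBayesTrainer(List<WeightedSparseDoc> docs, int numClasses, int numAttributes, double m) {
        double[] classWeight = new double[numClasses];
        double[][] tokenCounts = new double[numClasses][numAttributes];
        double[] tokenTotals = new double[numClasses];
        double totalWeight = 0.0;
        for (WeightedSparseDoc d : docs) {
            classWeight[d.label] += d.weight;
            totalWeight += d.weight;
            // Only the stored (non-zero) entries are visited, so cost scales with the non-zeros.
            for (int k = 0; k < d.indices.length; k++) {
                double w = d.weight * d.counts[k];
                tokenCounts[d.label][d.indices[k]] += w;
                tokenTotals[d.label] += w;
            }
        }
        priors = new double[numClasses];
        condProbs = new double[numClasses][numAttributes];
        double uniform = 1.0 / numAttributes; // assumed uniform prior for the m-estimate
        for (int c = 0; c < numClasses; c++) {
            priors[c] = classWeight[c] / totalWeight;
            for (int a = 0; a < numAttributes; a++) {
                condProbs[c][a] = (tokenCounts[c][a] + m * uniform) / (tokenTotals[c] + m);
            }
        }
    }
}
```

With m > 0, every row of condProbs sums to one and unseen tokens keep a small non-zero probability. With m = 0, a class with no tokens would divide by zero, which is one reason to keep m positive.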

unNormalizedDistributionForInstance

public double[] unNormalizedDistributionForInstance(Instance _instance)
                                             throws java.lang.Exception
Calculates unnormalized class membership probabilities for the given test instance.

Parameters:
_instance - the instance to be classified
Returns:
predicted class probability distribution, not normalized to sum to one
Throws:
java.lang.Exception - if distribution can't be computed or if the instance is not a SparseInstance
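A sketch of how unnormalized class scores can be computed from a sparse instance by visiting only its non-zero entries. It works in log space, log P(c) + Σ x_k log P(a_k | c), to avoid the floating-point underflow that a direct product of many small probabilities would cause. Whether this class returns log scores or plain products is not documented here, so the log-space choice is an assumption. SparseScorerSketch and logScores are hypothetical names.

```java
/** Hypothetical scorer (not Weka's code): unnormalized log scores for one sparse instance. */
final class SparseScorerSketch {
    private SparseScorerSketch() {}

    /**
     * Returns scores[c] = log P(c) + sum_k counts[k] * log P(indices[k] | c),
     * visiting only the stored (non-zero) entries of the instance.
     */
    static double[] logScores(double[] priors, double[][] condProbs, int[] indices, double[] counts) {
        if (indices.length != counts.length) {
            throw new IllegalArgumentException("indices and counts must have the same length");
        }
        double[] scores = new double[priors.length];
        for (int c = 0; c < priors.length; c++) {
            double s = Math.log(priors[c]);
            for (int k = 0; k < indices.length; k++) {
                s += counts[k] * Math.log(condProbs[c][indices[k]]);
            }
            scores[c] = s;
        }
        return scores;
    }
}
```

Because absent tokens contribute nothing to the sum, scoring costs O(numClasses × non-zeros) rather than O(numClasses × numAttributes).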

distributionForInstance

public double[] distributionForInstance(Instance instance)
                                 throws java.lang.Exception
Calculates the class membership probabilities for the given test instance.

Specified by:
distributionForInstance in class DistributionClassifier
Parameters:
instance - the instance to be classified
Returns:
predicted class probability distribution
Throws:
java.lang.Exception - if distribution can't be computed
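If the unnormalized scores are kept in log space (an assumption, not something this page states), turning them into a distribution that sums to one needs care. Exponentiating very negative log scores directly underflows to zero and yields 0/0. The usual fix is the log-sum-exp trick: subtract the maximum score before exponentiating. NormalizeSketch and normalizeLogScores are hypothetical names.

```java
/** Hypothetical helper (not Weka's code): converts log scores into probabilities that sum to one. */
final class NormalizeSketch {
    private NormalizeSketch() {}

    static double[] normalizeLogScores(double[] logScores) {
        double max = Double.NEGATIVE_INFINITY;
        for (double s : logScores) {
            max = Math.max(max, s);
        }
        if (max == Double.NEGATIVE_INFINITY) {
            throw new IllegalArgumentException("all scores are -infinity");
        }
        double sum = 0.0;
        double[] probs = new double[logScores.length];
        for (int i = 0; i < logScores.length; i++) {
            // Shifting by the maximum keeps the largest term at exp(0) = 1, so the sum cannot underflow to 0.
            probs[i] = Math.exp(logScores[i] - max);
            sum += probs[i];
        }
        for (int i = 0; i < probs.length; i++) {
            probs[i] /= sum;
        }
        return probs;
    }
}
```

For log scores {log 1, log 3} this returns approximately {0.25, 0.75}. For {-1000, -1000}, where Math.exp(-1000) underflows to 0.0, it still returns {0.5, 0.5}.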

getM

public double getM()
Gets the Laplace m parameter that controls the amount of smoothing.


setM

public void setM(double m)
Sets the Laplace m parameter that controls the amount of smoothing.


mTipText

public java.lang.String mTipText()
Returns the tip text for the m property.

globalInfo

public java.lang.String globalInfo()
Returns a string describing this classifier.

toString

public java.lang.String toString()
Returns a description of the classifier.

Returns:
a description of the classifier as a string.

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options. Valid options are:

-M num
Sets the amount of Laplace m-estimate smoothing (the size of the pseudo-sample).

Specified by:
setOptions in interface OptionHandler
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported

getOptions

public java.lang.String[] getOptions()
Gets the current settings of NaiveBayesSimpleSparse.

Specified by:
getOptions in interface OptionHandler
Returns:
an array of strings suitable for passing to setOptions()
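The -M option contract described for setOptions and getOptions can be sketched as a round trip: getOptions returns an array that setOptions accepts unchanged. This is a plain-Java stand-in, not Weka's code, and it does not use Weka's option utilities. Several details are hypothetical: the default value of m, rejecting negative m, skipping empty strings, and reporting unsupported options with IllegalArgumentException rather than java.lang.Exception.

```java
/** Hypothetical stand-in (not Weka's code) for the -M option handling. */
final class MOptionSketch {
    private double m = 1.0; // hypothetical default; the real default is not documented here

    double getM() {
        return m;
    }

    void setM(double m) {
        if (m < 0) {
            throw new IllegalArgumentException("m must be non-negative"); // assumed validation
        }
        this.m = m;
    }

    /** Parses "-M num"; empty strings are skipped, anything else is rejected. */
    void setOptions(String[] options) {
        for (int i = 0; i < options.length; i++) {
            String opt = options[i];
            if (opt.isEmpty()) {
                continue;
            }
            if (opt.equals("-M")) {
                if (i + 1 >= options.length) {
                    throw new IllegalArgumentException("-M requires a numeric value");
                }
                setM(Double.parseDouble(options[++i])); // NumberFormatException is an IllegalArgumentException
            } else {
                throw new IllegalArgumentException("Unsupported option: " + opt);
            }
        }
    }

    /** Returns the current settings in a form that setOptions accepts. */
    String[] getOptions() {
        return new String[] {"-M", Double.toString(m)};
    }
}
```

For example, after setOptions(new String[] {"-M", "2.5"}), getOptions() returns {"-M", "2.5"}, and passing that array to a fresh instance restores m = 2.5.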

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface OptionHandler
Returns:
an enumeration of all the available options.

main

public static void main(java.lang.String[] argv)
Main method for testing this class.

Parameters:
argv - the options