ir.classifiers
Class NaiveBayes

java.lang.Object
  |
  +--ir.classifiers.Classifier
        |
        +--ir.classifiers.NaiveBayes

public class NaiveBayes
extends Classifier

Implements a Naive Bayes classifier with optional Laplace smoothing. Probabilities are stored internally as logarithms to prevent floating-point underflow.
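
The following is a minimal sketch of why log storage and smoothing matter, not the class's actual code. The class name LogSmoothingSketch, its method names, and the add-one form of smoothing (with the number of distinct features in the denominator) are assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical sketch; not part of ir.classifiers.
final class LogSmoothingSketch {

    // Add-one (Laplace) estimate of P(feature | category), returned as a natural log.
    // count:       occurrences of the feature in the category
    // total:       occurrences of all features in the category
    // numFeatures: number of distinct features (assumed smoothing term)
    static double laplaceLogProb(double count, double total, int numFeatures) {
        return Math.log((count + 1.0) / (total + numFeatures));
    }

    // Multiplying many small probabilities can underflow to 0.0 in double precision.
    static double product(double[] probs) {
        double p = 1.0;
        for (double x : probs) {
            p *= x;
        }
        return p;
    }

    // Summing the logs of the same probabilities stays finite.
    static double logSum(double[] probs) {
        double s = 0.0;
        for (double x : probs) {
            s += Math.log(x);
        }
        return s;
    }

    public static void main(String[] args) {
        double[] probs = new double[400];
        Arrays.fill(probs, 1e-3);
        System.out.println("product = " + product(probs));
        System.out.println("log sum = " + logSum(probs));
        System.out.println("smoothed log P = " + laplaceLogProb(0, 10, 5));
    }
}
```

With 400 factors of 1e-3 the direct product is far below the smallest representable double, while the log sum is an ordinary negative number that can still be compared across categories.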


Field Summary
protected  java.util.Vector Categories
          Vector of categories (classes) in the data
 boolean debug
          Flag for debug prints
protected  double EPSILON
          Small value to be used instead of 0 in probabilities, if Laplace smoothing is not used
protected  boolean isLaplace
          Flag to set Laplace smoothing when estimating probabilities
static java.lang.String name
          Name of classifier
 int numCategories
          Number of categories
 int numExamples
          Number of training examples, set by train function
 int numFeatures
          Number of features
protected  BayesResult trainResult
          Stores the training result, set by the train function
 
Constructor Summary
NaiveBayes(java.lang.String[] categories, boolean d)
          Creates a Naive Bayes classifier with the given attributes
 
Method Summary
protected  double[] calculatePriors(java.util.Vector trainExamples)
          Calculates the class priors
protected  double[] calculateProbs(Example testExample)
          Calculates the prob of the testExample being generated by each category
protected  java.util.Hashtable conditionalProbs(java.util.Vector trainExamples)
          Calculates the conditional probs of each feature in the different categories
protected  void displayProbs(double[] classPriors, java.util.Hashtable featureHash)
          Displays the probs for each feature in the different categories
 double getEpsilon()
          Returns value of EPSILON
 boolean getIsLaplace()
          Returns value of isLaplace
 java.lang.String getName()
          Returns the name
 BayesResult getTrainResult()
          Returns training result
 void setCategories(java.lang.String[] categories)
          Sets the vector of categories (classes) in the data from the input array of category names
 void setDebug(boolean bool)
          Sets the debug flag
 void setEpsilon(double ep)
          Sets the value of EPSILON (default 1e-6)
 void setInvertedIndex(ir.vsr.InvertedIndex index)
          Does nothing, because NaiveBayes does not use an inverted index
 void setLaplace(boolean bool)
          Sets the Laplace smoothing flag
 boolean test(Example testExample)
          Categorizes the test example using the trained Naive Bayes classifier, returning true if the predicted category is the same as the actual category
 void train(java.util.Vector trainExamples)
          Trains the Naive Bayes classifier: estimates the prior probs and calculates the counts for each feature in the different categories
 boolean usesInvertedIndex()
          Function to indicate that this class does not use an inverted index
 
Methods inherited from class ir.classifiers.Classifier
argMax
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

Categories

protected java.util.Vector Categories
Vector of categories (classes) in the data

isLaplace

protected boolean isLaplace
Flag to set Laplace smoothing when estimating probabilities

EPSILON

protected double EPSILON
Small value to be used instead of 0 in probabilities, if Laplace smoothing is not used
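
As a hedged illustration of the fallback described above, the sketch below substitutes a small epsilon for a zero estimate when smoothing is off. The class EpsilonFallbackSketch, the method estimate, and its parameters are hypothetical; the real class may apply EPSILON at a different point in the computation.

```java
// Hypothetical sketch; not part of ir.classifiers.
final class EpsilonFallbackSketch {

    // Returns an estimate of P(feature | category).
    // With Laplace smoothing the add-one estimate is never zero.
    // Without it, a zero relative frequency is replaced by epsilon,
    // so that taking its log does not yield -Infinity.
    static double estimate(double count, double total, int numFeatures,
                           boolean isLaplace, double epsilon) {
        if (isLaplace) {
            return (count + 1.0) / (total + numFeatures);
        }
        double p = (total > 0.0) ? count / total : 0.0;
        return (p == 0.0) ? epsilon : p;
    }

    public static void main(String[] args) {
        // An unseen feature, with and without smoothing.
        System.out.println(estimate(0, 10, 5, true, 1e-6));
        System.out.println(estimate(0, 10, 5, false, 1e-6));
    }
}
```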

trainResult

protected BayesResult trainResult
Stores the training result, set by the train function

name

public static final java.lang.String name
Name of classifier

numCategories

public int numCategories
Number of categories

numFeatures

public int numFeatures
Number of features

numExamples

public int numExamples
Number of training examples, set by train function

debug

public boolean debug
Flag for debug prints

Constructor Detail

NaiveBayes

public NaiveBayes(java.lang.String[] categories,
                  boolean d)
Creates a Naive Bayes classifier with the given attributes
Parameters:
categories - The array of Strings containing the category names
d - Flag to turn on detailed output

Method Detail

usesInvertedIndex

public boolean usesInvertedIndex()
Function to indicate that this class does not use an inverted index
Overrides:
usesInvertedIndex in class Classifier

setInvertedIndex

public void setInvertedIndex(ir.vsr.InvertedIndex index)
Does nothing, because NaiveBayes does not use an inverted index
Overrides:
setInvertedIndex in class Classifier

setCategories

public void setCategories(java.lang.String[] categories)
Sets the vector of categories (classes) in the data from the input array of category names

setDebug

public void setDebug(boolean bool)
Sets the debug flag

setLaplace

public void setLaplace(boolean bool)
Sets the Laplace smoothing flag

setEpsilon

public void setEpsilon(double ep)
Sets the value of EPSILON (default 1e-6)

getName

public java.lang.String getName()
Returns the name
Overrides:
getName in class Classifier

getEpsilon

public double getEpsilon()
Returns value of EPSILON

getTrainResult

public BayesResult getTrainResult()
Returns training result

getIsLaplace

public boolean getIsLaplace()
Returns value of isLaplace

train

public void train(java.util.Vector trainExamples)
Trains the Naive Bayes classifier: estimates the prior probs and calculates the counts for each feature in the different categories
Overrides:
train in class Classifier
Parameters:
trainExamples - The vector of training examples

test

public boolean test(Example testExample)
Categorizes the test example using the trained Naive Bayes classifier, returning true if the predicted category is the same as the actual category
Overrides:
test in class Classifier
Parameters:
testExample - The test example to be categorized
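
The comparison that test performs can be sketched as picking the highest-scoring category and checking it against the true label. This is an illustration only: PredictSketch, its local argMax, and isCorrect are hypothetical, and the signature of the inherited Classifier.argMax is not documented here, so the sketch does not call it.

```java
// Hypothetical sketch; not part of ir.classifiers.
final class PredictSketch {

    // Index of the largest score; the first index wins on ties.
    static int argMax(double[] scores) {
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return best;
    }

    // True when the predicted category index equals the actual one.
    static boolean isCorrect(double[] logScores, int actualCategory) {
        return argMax(logScores) == actualCategory;
    }

    public static void main(String[] args) {
        double[] logScores = {-3.2, -1.1, -7.0};
        System.out.println("predicted index = " + argMax(logScores));
        System.out.println("correct = " + isCorrect(logScores, 1));
    }
}
```

Because the scores are logs, the largest (least negative) value corresponds to the most probable category.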

calculatePriors

protected double[] calculatePriors(java.util.Vector trainExamples)
Calculates the class priors
Parameters:
trainExamples - The training examples from which class priors will be estimated
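
One plausible way to estimate class priors is by relative frequency, optionally smoothed. The sketch below assumes integer category labels instead of Example objects, and returns raw probabilities; whether the real method returns raw values or logs is not stated in this page. PriorsSketch and priors are hypothetical names.

```java
// Hypothetical sketch; not part of ir.classifiers.
final class PriorsSketch {

    // labels[i] is the category index of training example i, in [0, numCategories).
    // Without smoothing: count(c) / N.
    // With add-one smoothing: (count(c) + 1) / (N + numCategories).
    static double[] priors(int[] labels, int numCategories, boolean isLaplace) {
        if (labels.length == 0 && !isLaplace) {
            throw new IllegalArgumentException("need at least one training example");
        }
        double[] counts = new double[numCategories];
        for (int label : labels) {
            counts[label] += 1.0;
        }
        double[] result = new double[numCategories];
        for (int c = 0; c < numCategories; c++) {
            result[c] = isLaplace
                    ? (counts[c] + 1.0) / (labels.length + numCategories)
                    : counts[c] / labels.length;
        }
        return result;
    }

    public static void main(String[] args) {
        int[] labels = {0, 0, 1, 0};
        System.out.println(java.util.Arrays.toString(priors(labels, 2, false)));
        System.out.println(java.util.Arrays.toString(priors(labels, 2, true)));
    }
}
```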

conditionalProbs

protected java.util.Hashtable conditionalProbs(java.util.Vector trainExamples)
Calculates the conditional probs of each feature in the different categories
Parameters:
trainExamples - The training examples from which counts will be estimated
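
The counting step behind the conditional probabilities might look like the sketch below, which maps each feature to an array of per-category occurrence counts. The token-list input, the Hashtable value type, and the names FeatureCountSketch and countFeatures are assumptions; the layout of the Hashtable the real method returns is not documented here.

```java
import java.util.Arrays;
import java.util.Hashtable;
import java.util.List;

// Hypothetical sketch; not part of ir.classifiers.
final class FeatureCountSketch {

    // docs.get(i) holds the tokens of training example i;
    // labels[i] is that example's category index.
    // Returns feature -> occurrence count in each category.
    static Hashtable<String, double[]> countFeatures(List<List<String>> docs,
                                                     int[] labels,
                                                     int numCategories) {
        Hashtable<String, double[]> counts = new Hashtable<>();
        for (int i = 0; i < docs.size(); i++) {
            for (String token : docs.get(i)) {
                double[] perCategory =
                        counts.computeIfAbsent(token, k -> new double[numCategories]);
                perCategory[labels[i]] += 1.0;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("a", "b", "a"),
                Arrays.asList("b"));
        Hashtable<String, double[]> counts = countFeatures(docs, new int[] {0, 1}, 2);
        System.out.println("a -> " + Arrays.toString(counts.get("a")));
        System.out.println("b -> " + Arrays.toString(counts.get("b")));
    }
}
```

Raw counts like these would then be turned into probabilities, with Laplace smoothing or the EPSILON fallback applied depending on isLaplace.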

calculateProbs

protected double[] calculateProbs(Example testExample)
Calculates the prob of the testExample being generated by each category
Parameters:
testExample - The test example to be categorized
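
A per-category score of this kind is commonly computed as the log prior plus the sum of log conditional probabilities of the features present. The sketch below assumes a multinomial model with array inputs in place of Example and Hashtable; LogScoreSketch and logScores are hypothetical names, and the real method may differ in detail.

```java
// Hypothetical sketch; not part of ir.classifiers.
final class LogScoreSketch {

    // logPriors[c]     = log P(c)
    // logCond[c][f]    = log P(feature f | c)
    // featureCounts[f] = number of times feature f occurs in the test example
    // Returns, for each category c, log P(c) + sum_f featureCounts[f] * log P(f | c).
    static double[] logScores(double[] logPriors, double[][] logCond, int[] featureCounts) {
        double[] scores = new double[logPriors.length];
        for (int c = 0; c < logPriors.length; c++) {
            scores[c] = logPriors[c];
            for (int f = 0; f < featureCounts.length; f++) {
                if (featureCounts[f] > 0) {
                    scores[c] += featureCounts[f] * logCond[c][f];
                }
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        double[] logPriors = {Math.log(0.5), Math.log(0.5)};
        double[][] logCond = {
                {Math.log(0.9), Math.log(0.1)},
                {Math.log(0.2), Math.log(0.8)}
        };
        double[] scores = logScores(logPriors, logCond, new int[] {2, 1});
        System.out.println(java.util.Arrays.toString(scores));
    }
}
```

The scores are unnormalized log probabilities, which is sufficient for choosing the most likely category.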

displayProbs

protected void displayProbs(double[] classPriors,
                            java.util.Hashtable featureHash)
Displays the probs for each feature in the different categories
Parameters:
classPriors - Prior probs
featureHash - Feature hashtable after training