ir.classifiers
Class CVLearningCurve

java.lang.Object
  |
  +--ir.classifiers.CVLearningCurve

public class CVLearningCurve
extends java.lang.Object

Generates learning curves for a classifier using K-fold cross-validation.


Field Summary
static java.lang.String[] CLASSES
          Stores the possible class labels
protected  Classifier classifier
          The classifier for which the K-fold CV learning curve is generated
protected  boolean debug
          Flag for debug display
protected  java.util.Vector[][] foldBins
          foldBins[i][j] stores the examples for class i in fold j.
protected  int numClasses
          Number of classes in the data
protected  int numFolds
          Number of folds of cross validation to run
static double[] POINTS
          Points at which the learning curve is plotted
 double testTime
          Total Testing time
protected  java.util.Vector[] totalExamples
          Stores all the examples for each class
 double trainTime
          Total Training time
 
Method Summary
 void binExamples()
          Sets the fold bins from the total examples; this effectively stores the training-test split
static int findClassID(java.lang.String name)
          Finds the class ID from the name of the document
 Classifier getClassifier()
          Return classifier
 java.util.Vector getCVPredictions()
          Generate a vector of predictions ready for processing, by performing a cross-validation on the supplied dataset.
 java.util.Vector[][] getFoldBins()
          Return the fold Bins
 java.util.Vector getTestCV(int foldnum)
          Creates the testing set for one fold of a cross-validation on the dataset.
 java.util.Vector getTestPrediction(java.util.Vector train, java.util.Vector test)
          Generate a prediction vector by performing an evaluation on the test set after training on the given training set.
 java.util.Vector[] getTotalExamples()
          Return all the examples
 java.util.Vector getTrainCV(int foldnum, double percent)
          Creates the training set for one fold of a cross-validation on the dataset.
 void setClassifier(Classifier c)
          Set the classifier
 void setFoldBins(java.util.Vector[][] bins)
          Set the fold Bins
 void setTotalExamples(java.lang.String dirName)
          Sets totalExamples by reading the files in directory dirName
 void setTotalExamples(java.util.Vector[] data)
          Set all the examples
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

CLASSES

public static final java.lang.String[] CLASSES
Stores the possible class labels

POINTS

public static final double[] POINTS
Points at which the learning curve is plotted

totalExamples

protected java.util.Vector[] totalExamples
Stores all the examples for each class

foldBins

protected java.util.Vector[][] foldBins
foldBins[i][j] stores the examples for class i in fold j. Together, the bins store the training-test splits for all the folds.

classifier

protected Classifier classifier
The classifier for which the K-fold CV learning curve is generated

numClasses

protected int numClasses
Number of classes in the data

numFolds

protected int numFolds
Number of folds of cross validation to run

debug

protected boolean debug
Flag for debug display

trainTime

public double trainTime
Total Training time

testTime

public double testTime
Total Testing time

Method Detail

getClassifier

public Classifier getClassifier()
Return classifier

setClassifier

public void setClassifier(Classifier c)
Set the classifier

getTotalExamples

public java.util.Vector[] getTotalExamples()
Return all the examples

setTotalExamples

public void setTotalExamples(java.util.Vector[] data)
Set all the examples

getFoldBins

public java.util.Vector[][] getFoldBins()
Return the fold Bins

setFoldBins

public void setFoldBins(java.util.Vector[][] bins)
Set the fold Bins

setTotalExamples

public void setTotalExamples(java.lang.String dirName)
Sets totalExamples by reading the files in directory dirName

getCVPredictions

public java.util.Vector getCVPredictions()
Generate a vector of predictions ready for processing, by performing a cross-validation on the supplied dataset.
Returns:
The vector of predictions to be processed for plot generation. Each component of the prediction vector is a vector of K [training set size, accuracy] values, corresponding to the K accuracy measurements at the point "size" on the learning curve.
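
The returned structure can be reduced to one curve point per training set size by averaging the K measurements. The following is a minimal sketch, assuming the predictions have already been unpacked into a double[points][K][2] array of [training set size, accuracy] pairs; the class name CurveAveragingSketch and the array layout are illustrative assumptions, not part of this class.

```java
final class CurveAveragingSketch {

    /**
     * Returns one [mean training set size, mean accuracy] pair per curve point.
     * predictions[p][k] is assumed to hold {size, accuracy} for fold k at point p.
     */
    static double[][] averageCurve(double[][][] predictions) {
        double[][] curve = new double[predictions.length][2];
        for (int p = 0; p < predictions.length; p++) {
            double sizeSum = 0.0;
            double accuracySum = 0.0;
            for (double[] measurement : predictions[p]) {
                sizeSum += measurement[0];
                accuracySum += measurement[1];
            }
            int k = predictions[p].length;
            curve[p][0] = sizeSum / k;
            curve[p][1] = accuracySum / k;
        }
        return curve;
    }

    public static void main(String[] args) {
        // Two curve points, two folds each (made-up numbers for illustration).
        double[][][] predictions = {
            {{10, 0.5}, {10, 0.7}},
            {{20, 0.8}, {20, 0.9}}
        };
        for (double[] point : averageCurve(predictions)) {
            System.out.println(point[0] + " examples: mean accuracy " + point[1]);
        }
    }
}
```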

getTestPrediction

public java.util.Vector getTestPrediction(java.util.Vector train,
                                          java.util.Vector test)
Generate a prediction vector by performing an evaluation on the test set after training on the given training set.
Parameters:
train - The training dataset vector
test - The testing dataset vector
Returns:
The vector of predictions on the test data; each prediction vector contains K [training set size, accuracy] vectors, one for each fold

binExamples

public void binExamples()
Sets the fold bins from the total examples; this effectively stores the training-test split
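
One way to picture the binning is as a per-class (stratified) split, so that every fold receives a share of each class. The sketch below is an assumption about how such a split could be done, not the actual implementation: it represents examples as Strings, uses generic Lists in place of the Vector arrays, and assigns examples round-robin. The name FoldBinningSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FoldBinningSketch {

    /**
     * Distributes each class's examples round-robin over numFolds bins,
     * so that bins.get(i).get(j) holds the examples of class i in fold j.
     */
    static List<List<List<String>>> binExamples(List<List<String>> totalExamples, int numFolds) {
        List<List<List<String>>> bins = new ArrayList<>();
        for (List<String> classExamples : totalExamples) {
            List<List<String>> classBins = new ArrayList<>();
            for (int j = 0; j < numFolds; j++) {
                classBins.add(new ArrayList<>());
            }
            for (int k = 0; k < classExamples.size(); k++) {
                classBins.get(k % numFolds).add(classExamples.get(k));
            }
            bins.add(classBins);
        }
        return bins;
    }

    public static void main(String[] args) {
        List<List<String>> total = Arrays.asList(
            Arrays.asList("a1", "a2", "a3", "a4", "a5"),
            Arrays.asList("b1", "b2", "b3"));
        // prints [[[a1, a3, a5], [a2, a4]], [[b1, b3], [b2]]]
        System.out.println(binExamples(total, 2));
    }
}
```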

getTrainCV

public java.util.Vector getTrainCV(int foldnum,
                                   double percent)
Creates the training set for one fold of a cross-validation on the dataset.
Parameters:
foldnum - The fold for which training set is to be constructed
percent - Percentage of examples to use for training in this fold
Returns:
The training data
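
As an illustration of how a training set could be assembled from the fold bins, the sketch below pools every fold except the held-out one and keeps a leading share of each class. It assumes the share is given as a fraction in [0, 1] (the page does not say whether percent is a 0-1 or 0-100 value), represents examples as Strings, and uses the hypothetical name TrainFoldSketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TrainFoldSketch {

    /**
     * Builds a training set for the given fold: foldBins.get(i).get(j) is assumed
     * to hold the examples of class i in fold j, and fold foldnum is held out.
     */
    static List<String> trainingSet(List<List<List<String>>> foldBins, int foldnum, double fraction) {
        List<String> train = new ArrayList<>();
        for (List<List<String>> classBins : foldBins) {
            // Pool this class's examples from every fold except the held-out one.
            List<String> pool = new ArrayList<>();
            for (int j = 0; j < classBins.size(); j++) {
                if (j != foldnum) {
                    pool.addAll(classBins.get(j));
                }
            }
            // Keep the leading fraction of the pool, rounded and clamped to a valid count.
            int keep = (int) Math.round(fraction * pool.size());
            keep = Math.max(0, Math.min(pool.size(), keep));
            train.addAll(pool.subList(0, keep));
        }
        return train;
    }

    public static void main(String[] args) {
        List<List<List<String>>> bins = Arrays.asList(
            Arrays.asList(Arrays.asList("a1", "a2"), Arrays.asList("a3", "a4"), Arrays.asList("a5", "a6")),
            Arrays.asList(Arrays.asList("b1"), Arrays.asList("b2"), Arrays.asList("b3")));
        // Hold out fold 1 and train on half of the remaining examples of each class.
        System.out.println(trainingSet(bins, 1, 0.5));
    }
}
```

Sampling per class rather than from the pooled data keeps the class proportions roughly constant at every point on the curve; whether the real method does this is not stated here.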

getTestCV

public java.util.Vector getTestCV(int foldnum)
Creates the testing set for one fold of a cross-validation on the dataset.
Parameters:
foldnum - The fold which is to be used as testing data
Returns:
The test data
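
Given the foldBins layout (class i, fold j), the test set for a fold can be pictured as the union of that fold's bin across all classes. This is a minimal sketch under that assumption, with examples represented as Strings and the hypothetical name TestFoldSketch; it is not taken from the class's source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TestFoldSketch {

    /** Collects the examples of every class that were assigned to fold foldnum. */
    static List<String> testSet(List<List<List<String>>> foldBins, int foldnum) {
        List<String> test = new ArrayList<>();
        for (List<List<String>> classBins : foldBins) {
            test.addAll(classBins.get(foldnum));
        }
        return test;
    }

    public static void main(String[] args) {
        List<List<List<String>>> bins = Arrays.asList(
            Arrays.asList(Arrays.asList("a1", "a2"), Arrays.asList("a3", "a4")),
            Arrays.asList(Arrays.asList("b1"), Arrays.asList("b2")));
        // prints [a3, a4, b2]
        System.out.println(testSet(bins, 1));
    }
}
```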

findClassID

public static int findClassID(java.lang.String name)
Finds the class ID from the name of the document
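
The page does not say how the document name encodes the class. Purely as an illustration, the sketch below assumes that the name contains one of the class labels and that the class ID is that label's index in CLASSES. The label values, the -1 result for an unmatched name, and the name ClassIdSketch are all hypothetical.

```java
final class ClassIdSketch {

    // Hypothetical labels; the real contents of CLASSES are not given on this page.
    static final String[] CLASSES = {"bio", "chem", "phys"};

    /**
     * Returns the index of the first label that occurs in the document name,
     * or -1 if no label matches (an assumed convention).
     */
    static int findClassID(String name) {
        for (int i = 0; i < CLASSES.length; i++) {
            if (name.contains(CLASSES[i])) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(findClassID("chem042.html")); // prints 1
        System.out.println(findClassID("unknown.txt"));  // prints -1
    }
}
```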