ir.classifiers
Class CVLearningCurve

java.lang.Object
  extended by ir.classifiers.CVLearningCurve

public class CVLearningCurve
extends java.lang.Object

Gives learning curves with K-fold cross validation for a classifier.


Field Summary
protected  Classifier classifier
          The classifier for which the K-fold CV learning curve is generated
protected  boolean debug
          Flag for debug display
protected static double[] DEFAULT_POINTS
          Default points
protected  java.util.Vector<Example>[][] foldBins
          foldBins[i][j] stores the examples for class i in fold j.
protected  int numClasses
          Number of classes in the data
protected  int numFolds
          Number of folds of cross validation to run
protected  double[] points
          Points on the X axis (percentage of train data) to plot
protected  long randomSeed
          Seed for random number generator
protected  PointResults[] testResults
          Accuracy results for test data, one PointResults for each point on the curve
protected  double testTime
          Total Testing time
protected  int testTimeNum
          Total number of examples tested in test time
protected  java.util.Vector<Example>[] totalExamples
          Stores all the examples for each class
protected  int totalNumTrain
          Total number of training examples per fold
protected  PointResults[] trainResults
          Accuracy results for training data, one PointResults for each point on the curve
protected  double trainTime
          Total Training time
 
Constructor Summary
CVLearningCurve(Classifier c, java.util.List<Example> examples)
          Creates a CVLearningCurve object with 10 folds and default points
CVLearningCurve(int nfolds, Classifier c, java.util.List<Example> examples, double[] points, long randomSeed, boolean debug)
          Creates a CVLearningCurve object
 
Method Summary
 void binExamples()
          Sets the fold bins from the total examples; this effectively stores the training-test split
 Classifier getClassifier()
          Return classifier
 java.util.Vector<Example>[][] getFoldBins()
          Return the fold Bins
 java.util.Vector<Example> getTestCV(int foldnum)
          Creates the testing set for one fold of a cross-validation on the dataset.
 java.util.Vector[] getTotalExamples()
          Return all the examples
 java.util.Vector<Example> getTrainCV(int foldnum, double percent)
          Creates the training set for one fold of a cross-validation on the dataset.
 void run()
          Runs a CV learning curve test, prints the total training and test time, and generates average learning curve output files suitable for gnuplot
 void setClassifier(Classifier c)
          Set the classifier
 void setFoldBins(java.util.Vector<Example>[][] bins)
          Set the fold Bins
 void setTotalExamples(java.util.List<Example> examples)
          Sets the totalExamples by partitioning examples into categories to get a stratified sample
 void setTotalExamples(java.util.Vector<Example>[] data)
          Set all the examples
protected  int sizeOfFold(int foldNum)
          Computes the total number of examples in given fold
 void trainAndTest()
          Run training and test for each point to be plotted, gathering a result for each fold.
 void trainAndTestFold(java.util.Vector<Example> train, java.util.Vector<Example> test, int fold, PointResults testPointResults, PointResults trainPointResults)
          Trains and tests on the given example sets for the given fold.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

totalExamples

protected java.util.Vector<Example>[] totalExamples
Stores all the examples for each class


foldBins

protected java.util.Vector<Example>[][] foldBins
foldBins[i][j] stores the examples for class i in fold j. This stores the training-test splits for all the folds


classifier

protected Classifier classifier
The classifier for which the K-fold CV learning curve is generated


randomSeed

protected long randomSeed
Seed for random number generator


numClasses

protected int numClasses
Number of classes in the data


totalNumTrain

protected int totalNumTrain
Total number of training examples per fold


numFolds

protected int numFolds
Number of folds of cross validation to run


points

protected double[] points
Points on the X axis (percentage of train data) to plot


DEFAULT_POINTS

protected static double[] DEFAULT_POINTS
Default points


debug

protected boolean debug
Flag for debug display


trainTime

protected double trainTime
Total Training time


testTime

protected double testTime
Total Testing time


testTimeNum

protected int testTimeNum
Total number of examples tested in test time


testResults

protected PointResults[] testResults
Accuracy results for test data, one PointResults for each point on the curve


trainResults

protected PointResults[] trainResults
Accuracy results for training data, one PointResults for each point on the curve

Constructor Detail

CVLearningCurve

public CVLearningCurve(int nfolds,
                       Classifier c,
                       java.util.List<Example> examples,
                       double[] points,
                       long randomSeed,
                       boolean debug)
Creates a CVLearningCurve object

Parameters:
nfolds - Number of folds of CV to perform
c - Classifier on which to perform K-fold CV
examples - List of examples.
points - Points (in percentage of full train set) to plot on learning curve
randomSeed - Seed for random number generator
debug - Debugging flag to set verbose trace printing

CVLearningCurve

public CVLearningCurve(Classifier c,
                       java.util.List<Example> examples)
Creates a CVLearningCurve object with 10 folds and default points

Parameters:
c - Classifier on which to perform K-fold CV
examples - List of examples.
Method Detail

getClassifier

public Classifier getClassifier()
Return classifier


setClassifier

public void setClassifier(Classifier c)
Set the classifier


getTotalExamples

public java.util.Vector[] getTotalExamples()
Return all the examples


setTotalExamples

public void setTotalExamples(java.util.Vector<Example>[] data)
Set all the examples


getFoldBins

public java.util.Vector<Example>[][] getFoldBins()
Return the fold Bins


setFoldBins

public void setFoldBins(java.util.Vector<Example>[][] bins)
Set the fold Bins


setTotalExamples

public void setTotalExamples(java.util.List<Example> examples)
Sets the totalExamples by partitioning examples into categories to get a stratified sample
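The partitioning step described above can be pictured with a minimal sketch. It is not the library's implementation: LabeledItem is a hypothetical stand-in for Example that models only a class index, and PartitionSketch and partitionByClass are illustrative names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the library's Example type: only a class index is modeled. */
final class LabeledItem {
    final int category;   // class index in [0, numClasses)
    final String name;    // illustrative payload

    LabeledItem(int category, String name) {
        this.category = category;
        this.name = name;
    }
}

final class PartitionSketch {
    /** Groups items by class index, so result.get(i) holds every item of class i. */
    static List<List<LabeledItem>> partitionByClass(List<LabeledItem> items, int numClasses) {
        List<List<LabeledItem>> byClass = new ArrayList<>();
        for (int i = 0; i < numClasses; i++) {
            byClass.add(new ArrayList<>());
        }
        for (LabeledItem item : items) {
            byClass.get(item.category).add(item);
        }
        return byClass;
    }

    public static void main(String[] args) {
        List<LabeledItem> items = Arrays.asList(
            new LabeledItem(0, "a"), new LabeledItem(1, "b"),
            new LabeledItem(0, "c"), new LabeledItem(1, "d"));
        List<List<LabeledItem>> byClass = partitionByClass(items, 2);
        System.out.println("class 0: " + byClass.get(0).size()
            + ", class 1: " + byClass.get(1).size());
    }
}
```

Keeping one list per class is what makes a stratified split possible later, since each class can then be divided across folds independently.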


run

public void run()
         throws java.lang.Exception
Runs a CV learning curve test, prints the total training and test time, and generates average learning curve output files suitable for gnuplot

Throws:
java.lang.Exception

trainAndTest

public void trainAndTest()
Run training and test for each point to be plotted, gathering a result for each fold.
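The control flow described here (an outer loop over curve points, an inner loop over folds) might look roughly like the sketch below. FoldEvaluator and CurveLoopSketch are hypothetical names; the callback stands in for whatever training and testing the real method performs, and the averaging step is an assumption about how per-fold results are combined into one curve.

```java
/** Hypothetical callback: accuracy obtained on one fold when training at a given point. */
interface FoldEvaluator {
    double accuracy(int fold, double point);
}

final class CurveLoopSketch {
    /** Returns, for each point, the mean accuracy over all folds. */
    static double[] averageCurve(double[] points, int numFolds, FoldEvaluator evaluator) {
        double[] mean = new double[points.length];
        for (int p = 0; p < points.length; p++) {
            double sum = 0.0;
            // One result per fold is gathered for this point.
            for (int fold = 0; fold < numFolds; fold++) {
                sum += evaluator.accuracy(fold, points[p]);
            }
            mean[p] = sum / numFolds;
        }
        return mean;
    }

    public static void main(String[] args) {
        // Toy evaluator: accuracy grows with the amount of training data.
        double[] curve = averageCurve(new double[] {0.1, 0.5, 1.0}, 10,
            (fold, point) -> 0.5 + 0.4 * point);
        for (double v : curve) {
            System.out.println(v);
        }
    }
}
```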


trainAndTestFold

public void trainAndTestFold(java.util.Vector<Example> train,
                             java.util.Vector<Example> test,
                             int fold,
                             PointResults testPointResults,
                             PointResults trainPointResults)
Trains and tests on the given example sets for the given fold.

Parameters:
train - The training dataset vector
test - The testing dataset vector
fold - The current fold number
testPointResults - test accuracy PointResults for this point
trainPointResults - train accuracy PointResults for this point

binExamples

public void binExamples()
Sets the fold bins from the total examples; this effectively stores the training-test split
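One plausible way to build such bins is sketched below: shuffle each class with a seeded generator, then deal its items round-robin into the folds. This is an assumption about the approach, not the documented algorithm. FoldBinSketch and binByFold are hypothetical names, and plain Strings stand in for Example objects.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class FoldBinSketch {
    /**
     * Returns bins where bins.get(i).get(j) holds the items of class i assigned to fold j,
     * mirroring the foldBins[i][j] layout.
     */
    static List<List<List<String>>> binByFold(List<List<String>> byClass, int numFolds, long seed) {
        Random rng = new Random(seed);
        List<List<List<String>>> bins = new ArrayList<>();
        for (List<String> classItems : byClass) {
            // Shuffle a copy so the caller's list is left untouched.
            List<String> shuffled = new ArrayList<>(classItems);
            Collections.shuffle(shuffled, rng);
            List<List<String>> folds = new ArrayList<>();
            for (int j = 0; j < numFolds; j++) {
                folds.add(new ArrayList<>());
            }
            // Deal items round-robin so every fold gets a near-equal share of this class.
            for (int k = 0; k < shuffled.size(); k++) {
                folds.get(k % numFolds).add(shuffled.get(k));
            }
            bins.add(folds);
        }
        return bins;
    }
}
```

Because each class is dealt out separately, every fold keeps roughly the class proportions of the full data set, and a fixed seed makes the split reproducible.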


getTrainCV

public java.util.Vector<Example> getTrainCV(int foldnum,
                                            double percent)
Creates the training set for one fold of a cross-validation on the dataset.

Parameters:
foldnum - The fold for which training set is to be constructed
percent - Percentage of examples to use for training in this fold
Returns:
The training data
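A rough sketch of how a training set could be assembled from the fold bins follows. It rests on several assumptions not confirmed by this page: that percent is a fraction in [0, 1], that the reduced set keeps a per-class prefix of the pooled examples, and that rounding is to the nearest integer. SplitSketch, trainSet and testSet are hypothetical names, and Strings stand in for Example objects.

```java
import java.util.ArrayList;
import java.util.List;

final class SplitSketch {
    /** Test set for a fold: the held-out bin of every class. */
    static List<String> testSet(List<List<List<String>>> bins, int foldnum) {
        List<String> test = new ArrayList<>();
        for (List<List<String>> classFolds : bins) {
            test.addAll(classFolds.get(foldnum));
        }
        return test;
    }

    /** Training set for a fold: a fraction of every class's examples from the other folds. */
    static List<String> trainSet(List<List<List<String>>> bins, int foldnum, double fraction) {
        List<String> train = new ArrayList<>();
        for (List<List<String>> classFolds : bins) {
            // Pool this class's examples from all folds except the held-out one.
            List<String> pool = new ArrayList<>();
            for (int j = 0; j < classFolds.size(); j++) {
                if (j != foldnum) {
                    pool.addAll(classFolds.get(j));
                }
            }
            // Assumption: keep the first round(fraction * size) examples of each class.
            int keep = (int) Math.round(fraction * pool.size());
            train.addAll(pool.subList(0, keep));
        }
        return train;
    }
}
```

Sampling per class rather than from the pooled set keeps the reduced training sets stratified, which matters most at the small-percentage end of the curve.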

sizeOfFold

protected int sizeOfFold(int foldNum)
Computes the total number of examples in given fold


getTestCV

public java.util.Vector<Example> getTestCV(int foldnum)
Creates the testing set for one fold of a cross-validation on the dataset.

Parameters:
foldnum - The fold which is to be used as testing data
Returns:
The test data