Class CVLearningCurve

java.lang.Object
  extended by ir.classifiers.CVLearningCurve

public class CVLearningCurve
extends java.lang.Object

Gives learning curves with K-fold cross validation for a classifier.

Field Summary
protected  Classifier classifier
          The classifier for which the K-fold CV learning curve is generated
protected  boolean debug
          Flag for debug display
protected static double[] DEFAULT_POINTS
          Default points
protected  java.util.Vector<Example>[][] foldBins
          foldBins[i][j] stores the examples for class i in fold j.
protected  int numClasses
          Number of classes in the data
protected  int numFolds
          Number of folds of cross validation to run
protected  double[] points
          Points on the X axis (percentage of train data) to plot
protected  long randomSeed
          Seed for random number generator
protected  PointResults[] testResults
          Accuracy results for test data, one PointResults for each point on the curve
protected  double testTime
          Total Testing time
protected  int testTimeNum
          Total number of examples tested in test time
protected  java.util.Vector<Example>[] totalExamples
          Stores all the examples for each class
protected  int totalNumTrain
          Total number of training examples per fold
protected  PointResults[] trainResults
          Accuracy results for training data, one PointResults for each point on the curve
protected  double trainTime
          Total Training time
Constructor Summary
CVLearningCurve(Classifier c, java.util.List<Example> examples)
          Creates a CVLearningCurve object with 10 folds and default points
CVLearningCurve(int nfolds, Classifier c, java.util.List<Example> examples, double[] points, long randomSeed, boolean debug)
          Creates a CVLearningCurve object
Method Summary
 void binExamples()
          Set the fold Bins from the total Examples -- this effectively stores the training-test split
 Classifier getClassifier()
          Return classifier
 java.util.Vector<Example>[][] getFoldBins()
          Return the fold Bins
 java.util.Vector<Example> getTestCV(int foldnum)
          Creates the testing set for one fold of a cross-validation on the dataset.
 java.util.Vector[] getTotalExamples()
          Return all the examples
 java.util.Vector<Example> getTrainCV(int foldnum, double percent)
          Creates the training set for one fold of a cross-validation on the dataset.
 void run()
          Run a CV learning curve test, print the total training and test times, and generate average learning curve plot output files suitable for gnuplot
 void setClassifier(Classifier c)
          Set the classifier
 void setFoldBins(java.util.Vector<Example>[][] bins)
          Set the fold Bins
 void setTotalExamples(java.util.List<Example> examples)
          Sets the totalExamples by partitioning examples into categories to get a stratified sample
 void setTotalExamples(java.util.Vector<Example>[] data)
          Set all the examples
protected  int sizeOfFold(int foldNum)
          Computes the total number of examples in given fold
 void trainAndTest()
          Run training and test for each point to be plotted, gathering a result for each fold.
 void trainAndTestFold(java.util.Vector<Example> train, java.util.Vector<Example> test, int fold, PointResults testPointResults, PointResults trainPointResults)
          Train and test on the given example sets for the given fold.
Field Detail


protected java.util.Vector<Example>[] totalExamples
Stores all the examples for each class


protected java.util.Vector<Example>[][] foldBins
foldBins[i][j] stores the examples for class i in fold j. This stores the training-test splits for all the folds


protected Classifier classifier
The classifier for which the K-fold CV learning curve is generated


protected long randomSeed
Seed for random number generator


protected int numClasses
Number of classes in the data


protected int totalNumTrain
Total number of training examples per fold


protected int numFolds
Number of folds of cross validation to run


protected double[] points
Points on the X axis (percentage of train data) to plot


protected static double[] DEFAULT_POINTS
Default points


protected boolean debug
Flag for debug display


protected double trainTime
Total Training time


protected double testTime
Total Testing time


protected int testTimeNum
Total number of examples tested in test time


protected PointResults[] testResults
Accuracy results for test data, one PointResults for each point on the curve


protected PointResults[] trainResults
Accuracy results for training data, one PointResults for each point on the curve

Constructor Detail


public CVLearningCurve(int nfolds,
                       Classifier c,
                       java.util.List<Example> examples,
                       double[] points,
                       long randomSeed,
                       boolean debug)
Creates a CVLearningCurve object

nfolds - Number of folds of CV to perform
c - Classifier on which to perform K-fold CV
examples - List of examples.
points - Points (in percentage of full train set) to plot on learning curve
randomSeed - Seed for random number generator
debug - Debugging flag to set verbose trace printing


public CVLearningCurve(Classifier c,
                       java.util.List<Example> examples)
Creates a CVLearningCurve object with 10 folds and default points

c - Classifier on which to perform K-fold CV
examples - List of examples.
Method Detail


public Classifier getClassifier()
Return classifier


public void setClassifier(Classifier c)
Set the classifier


public java.util.Vector[] getTotalExamples()
Return all the examples


public void setTotalExamples(java.util.Vector<Example>[] data)
Set all the examples


public java.util.Vector<Example>[][] getFoldBins()
Return the fold Bins


public void setFoldBins(java.util.Vector<Example>[][] bins)
Set the fold Bins


public void setTotalExamples(java.util.List<Example> examples)
Sets the totalExamples by partitioning examples into categories to get a stratified sample
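
The partitioning step can be illustrated with a minimal, self-contained sketch. It is not the class's actual implementation: the names PartitionSketch and partitionByClass are hypothetical, and an int[] of class indices stands in for reading the category of each Example.

```java
import java.util.ArrayList;
import java.util.List;

class PartitionSketch {
    /**
     * Groups example indices by class label. labels[k] is the class index of
     * example k (a hypothetical stand-in for an Example's category).
     */
    static List<List<Integer>> partitionByClass(int[] labels, int numClasses) {
        List<List<Integer>> byClass = new ArrayList<>();
        for (int i = 0; i < numClasses; i++) {
            byClass.add(new ArrayList<>());
        }
        for (int k = 0; k < labels.length; k++) {
            byClass.get(labels[k]).add(k);
        }
        return byClass;
    }

    public static void main(String[] args) {
        int[] labels = {0, 1, 0, 1, 1};
        // Prints one list of example indices per class.
        System.out.println(partitionByClass(labels, 2));
    }
}
```

Keeping one list per class is what allows a later step to spread each class evenly over the folds, which is the usual meaning of a stratified sample.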


public void run()
         throws java.lang.Exception
Run a CV learning curve test, print the total training and test times, and generate average learning curve plot output files suitable for gnuplot
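
The exact layout of the generated files is not specified here. As a hedged illustration only, gnuplot can plot plain text with one point per line and whitespace-separated columns, so a sketch of producing such data might look like this (GnuplotDataSketch and toGnuplotData are hypothetical names, and the four-decimal format is an assumption):

```java
import java.util.Locale;

class GnuplotDataSketch {
    /** Formats (x, y) pairs as tab-separated columns, one point per line. */
    static String toGnuplotData(double[] xs, double[] ys) {
        if (xs.length != ys.length) {
            throw new IllegalArgumentException("xs and ys must have the same length");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < xs.length; i++) {
            // Locale.ROOT keeps '.' as the decimal separator on every system.
            sb.append(String.format(Locale.ROOT, "%.4f\t%.4f\n", xs[i], ys[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        double[] fractionOfTrainData = {0.1, 0.5, 1.0};
        double[] meanAccuracy = {0.60, 0.72, 0.80}; // made-up values for the demo
        System.out.print(toGnuplotData(fractionOfTrainData, meanAccuracy));
    }
}
```

Text in this shape, saved to a file, can be drawn in gnuplot with a command of the form plot "file" using 1:2 with linespoints.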



public void trainAndTest()
Run training and test for each point to be plotted, gathering a result for each fold.
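
The control flow described above (an outer loop over curve points, an inner loop over folds, one result per fold) can be sketched as follows. This is an illustration under stated assumptions, not the class's code: a majority-label predictor stands in for the Classifier, points are treated as fractions in [0, 1], and only test accuracy is averaged, whereas the class also records training accuracy. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class LearningCurveLoopSketch {
    /** Stand-in classifier: returns the most frequent label in the training labels. */
    static int majorityLabel(List<Integer> trainLabels, int numClasses) {
        int[] counts = new int[numClasses];
        for (int y : trainLabels) counts[y]++;
        int best = 0;
        for (int c = 1; c < numClasses; c++) {
            if (counts[c] > counts[best]) best = c;
        }
        return best;
    }

    /** Fraction of labels equal to the single predicted label. */
    static double accuracy(int predicted, List<Integer> labels) {
        if (labels.isEmpty()) return 0.0;
        int correct = 0;
        for (int y : labels) if (y == predicted) correct++;
        return (double) correct / labels.size();
    }

    /** folds.get(j) holds the labels in fold j; returns mean test accuracy per point. */
    static double[] curve(List<List<Integer>> folds, double[] points, int numClasses) {
        double[] meanAcc = new double[points.length];
        for (int p = 0; p < points.length; p++) {
            double sum = 0.0;
            for (int f = 0; f < folds.size(); f++) {
                // Training pool: every fold except the held-out fold f.
                List<Integer> pool = new ArrayList<>();
                for (int j = 0; j < folds.size(); j++) {
                    if (j != f) pool.addAll(folds.get(j));
                }
                int keep = (int) Math.round(points[p] * pool.size());
                keep = Math.max(0, Math.min(keep, pool.size()));
                int predicted = majorityLabel(pool.subList(0, keep), numClasses);
                sum += accuracy(predicted, folds.get(f));
            }
            meanAcc[p] = sum / folds.size();
        }
        return meanAcc;
    }

    public static void main(String[] args) {
        List<List<Integer>> folds = new ArrayList<>();
        folds.add(List.of(1, 1, 0));
        folds.add(List.of(1, 1, 0));
        for (double a : curve(folds, new double[]{0.0, 1.0}, 2)) {
            System.out.println(a);
        }
    }
}
```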


public void trainAndTestFold(java.util.Vector<Example> train,
                             java.util.Vector<Example> test,
                             int fold,
                             PointResults testPointResults,
                             PointResults trainPointResults)
Train and test on the given example sets for the given fold.

train - The training dataset vector
test - The testing dataset vector
fold - The current fold number
testPointResults - test accuracy PointResults for this point
trainPointResults - train accuracy PointResults for this point


public void binExamples()
Set the fold Bins from the total Examples -- this effectively stores the training-test split
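
One plausible way to fill a structure shaped like foldBins[i][j] is sketched below. The policy shown (shuffle each class with a seeded generator, then deal its examples round-robin into the folds) is an assumption for illustration, not a description of what binExamples actually does. FoldBinSketch and its method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class FoldBinSketch {
    /**
     * Returns bins where bins.get(i).get(j) holds the items of class i
     * assigned to fold j. Each class is shuffled with a seeded generator and
     * dealt round-robin, so fold sizes within a class differ by at most one.
     */
    static <T> List<List<List<T>>> binExamples(List<List<T>> byClass, int numFolds, long seed) {
        Random rng = new Random(seed);
        List<List<List<T>>> bins = new ArrayList<>();
        for (List<T> classItems : byClass) {
            List<T> shuffled = new ArrayList<>(classItems);
            Collections.shuffle(shuffled, rng);
            List<List<T>> folds = new ArrayList<>();
            for (int j = 0; j < numFolds; j++) {
                folds.add(new ArrayList<>());
            }
            for (int k = 0; k < shuffled.size(); k++) {
                folds.get(k % numFolds).add(shuffled.get(k));
            }
            bins.add(folds);
        }
        return bins;
    }

    public static void main(String[] args) {
        List<List<String>> byClass = Arrays.asList(
            Arrays.asList("a1", "a2", "a3", "a4"),
            Arrays.asList("b1", "b2"));
        System.out.println(binExamples(byClass, 2, 42L));
    }
}
```

Because the generator is seeded, the same seed reproduces the same training-test split, which is consistent with the class keeping a randomSeed field.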


public java.util.Vector<Example> getTrainCV(int foldnum,
                                            double percent)
Creates the training set for one fold of a cross-validation on the dataset.

foldnum - The fold for which training set is to be constructed
percent - Percentage of examples to use for training in this fold
Returns the training data.


protected int sizeOfFold(int foldNum)
Computes the total number of examples in given fold


public java.util.Vector<Example> getTestCV(int foldnum)
Creates the testing set for one fold of a cross-validation on the dataset.

foldnum - The fold which is to be used as testing data
Returns the test data.
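
The two set-construction methods, getTrainCV and getTestCV, can be pictured with the following sketch over a bins structure indexed as [class][fold]. It rests on assumptions the documentation does not confirm: percent is treated as a fraction in [0, 1], and the fraction is applied per class so the reduced training set stays stratified. TrainTestSplitSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TrainTestSplitSketch {
    /** Test set for a fold: every class's bin for that fold, concatenated. */
    static <T> List<T> testSet(List<List<List<T>>> bins, int fold) {
        List<T> test = new ArrayList<>();
        for (List<List<T>> classFolds : bins) {
            test.addAll(classFolds.get(fold));
        }
        return test;
    }

    /** Training set: per class, the leading fraction of the examples in all other folds. */
    static <T> List<T> trainSet(List<List<List<T>>> bins, int fold, double fraction) {
        List<T> train = new ArrayList<>();
        for (List<List<T>> classFolds : bins) {
            List<T> pool = new ArrayList<>();
            for (int j = 0; j < classFolds.size(); j++) {
                if (j != fold) pool.addAll(classFolds.get(j));
            }
            int keep = (int) Math.round(fraction * pool.size());
            keep = Math.max(0, Math.min(keep, pool.size()));
            train.addAll(pool.subList(0, keep));
        }
        return train;
    }

    public static void main(String[] args) {
        // Two classes, three folds; strings stand in for examples.
        List<List<List<String>>> bins = Arrays.asList(
            Arrays.asList(Arrays.asList("a1", "a2"), Arrays.asList("a3", "a4"), Arrays.asList("a5", "a6")),
            Arrays.asList(Arrays.asList("b1"), Arrays.asList("b2"), Arrays.asList("b3")));
        System.out.println("test:  " + testSet(bins, 1));
        System.out.println("train: " + trainSet(bins, 1, 0.5));
    }
}
```

The held-out fold never contributes to the training pool, so the test examples for a fold stay unseen at every point on the curve.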