java.lang.Object
  weka.experiment.DedupingPRCurveCVResultProducerSplit
N-fold cross-validation learning curve for deduping applications
Field Summary | |
static java.lang.String |
DATASET_FIELD_NAME
|
static int |
FOLD_CREATION_MODE_RANDOM
|
static int |
FOLD_CREATION_MODE_STRATIFIED
|
static java.lang.String |
FOLD_FIELD_NAME
|
protected java.lang.String[] |
m_additionalMeasures
The names of any additional measures to look for in SplitEvaluators |
protected boolean |
m_debugOutput
Save raw output of split evaluators --- for debugging purposes |
protected int |
m_foldCreationMode
|
protected Instances |
m_instances
The dataset of interest |
protected int |
m_numFolds
The number of folds in the cross-validation |
protected java.io.File |
m_outputFile
The destination output file/directory for raw output |
protected double[] |
m_plotPoints
The specific points to plot, either integers representing specific numbers of training examples, or decimal fractions representing percentages of the full training set |
protected ResultListener |
m_resultListener
The ResultListener to send results to |
protected java.lang.String |
m_separateTrainingFile
The separate training file if desired |
protected SplitEvaluator |
m_splitEvaluator
The SplitEvaluator used to generate results |
protected OutputZipper |
m_zipDest
The output zipper to use for saving raw splitEvaluator output |
static java.lang.String |
RECALL_FIELD_NAME
|
static java.lang.String |
RUN_FIELD_NAME
|
static Tag[] |
TAGS_FOLD_CREATION_MODE
|
static java.lang.String |
TIMESTAMP_FIELD_NAME
|
Constructor Summary | |
DedupingPRCurveCVResultProducerSplit()
|
Method Summary | |
void |
doRun(int run)
Gets the results for a specified run number. |
void |
doRunKeys(int run)
Gets the keys for a specified run number. |
java.util.Enumeration |
enumerateMeasures()
Returns an enumeration of any additional measure names that might be in the SplitEvaluator |
java.lang.String |
getCompatibilityState()
Gets a description of the internal settings of the result producer, sufficient for distinguishing a ResultProducer instance from another with different settings (ignoring those settings set through this interface). |
SelectedTag |
getFoldCreationMode()
Returns the fold creation mode |
java.lang.String[] |
getKeyNames()
Gets the names of each of the columns produced for a single run. |
java.lang.Object[] |
getKeyTypes()
Gets the data types of each of the columns produced for a single run. |
double |
getMeasure(java.lang.String additionalMeasureName)
Returns the value of the named measure |
int |
getNumFolds()
Get the value of NumFolds. |
java.lang.String[] |
getOptions()
Gets the current settings of the result producer. |
java.io.File |
getOutputFile()
Get the value of OutputFile. |
java.lang.String |
getPlotPoints()
Get the value of PlotPoints. |
boolean |
getRawOutput()
Gets whether raw split evaluator output is to be saved |
java.lang.String[] |
getResultNames()
Gets the names of each of the columns produced for a single run. |
java.lang.Object[] |
getResultTypes()
Gets the data types of each of the columns produced for a single run. |
java.lang.String |
getSeparateTrainingFile()
Get the value of separate training file |
SplitEvaluator |
getSplitEvaluator()
Get the SplitEvaluator. |
static java.lang.Double |
getTimestamp()
Gets a Double representing the current date and time. |
protected Instances |
getTrainingFold(java.util.ArrayList foldList,
int testFoldIdx)
Given a list of folds, merge together all but the test fold with the specified index and return the resulting training fold |
java.lang.String |
globalInfo()
Returns a string describing this result producer |
java.util.Enumeration |
listOptions()
Returns an enumeration describing the available options. |
static void |
main(java.lang.String[] args)
|
java.lang.String |
numFoldsTipText()
Returns the tip text for this property |
java.lang.String |
outputFileTipText()
Returns the tip text for this property |
protected double[] |
parsePlotPoints(java.lang.String plotPoints)
Parse a string of doubles separated by commas or spaces into a sorted array of doubles |
java.lang.String |
plotPointsTipText()
Returns the tip text for this property |
void |
postProcess()
Perform any postprocessing. |
void |
preProcess()
Prepare to generate results. |
protected java.lang.Object[] |
processResults(java.lang.Object[] prResults,
double recallLevel)
Given an array containing the overall results of a deduping experiment, produce an array containing results for a specific recall level |
java.lang.String |
rawOutputTipText()
Returns the tip text for this property |
void |
setAdditionalMeasures(java.lang.String[] additionalMeasures)
Set a list of method names for additional measures to look for in SplitEvaluators. |
void |
setFoldCreationMode(SelectedTag mode)
Set the mode of creating folds |
void |
setInstances(Instances instances)
Sets the dataset that results will be obtained for. |
void |
setNumFolds(int newNumFolds)
Set the value of NumFolds. |
void |
setOptions(java.lang.String[] options)
Parses a given list of options. |
void |
setOutputFile(java.io.File newOutputFile)
Set the value of OutputFile. |
void |
setPlotPoints(java.lang.String plotPoints)
Set the value of PlotPoints. |
void |
setRawOutput(boolean d)
Set to true if raw split evaluator output is to be saved |
void |
setResultListener(ResultListener listener)
Sets the object to send results of each run to. |
void |
setSeparateTrainingFile(java.lang.String separateTrainingFile)
Set the value of separate training file |
void |
setSplitEvaluator(SplitEvaluator newSplitEvaluator)
Set the SplitEvaluator. |
java.lang.String |
splitEvaluatorTipText()
Returns the tip text for this property |
java.lang.String |
toString()
Gets a text description of the result producer. |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
protected Instances m_instances
public static final int FOLD_CREATION_MODE_STRATIFIED
public static final int FOLD_CREATION_MODE_RANDOM
public static final Tag[] TAGS_FOLD_CREATION_MODE
protected int m_foldCreationMode
protected ResultListener m_resultListener
protected int m_numFolds
protected boolean m_debugOutput
protected OutputZipper m_zipDest
protected java.io.File m_outputFile
protected java.lang.String m_separateTrainingFile
protected SplitEvaluator m_splitEvaluator
protected java.lang.String[] m_additionalMeasures
protected double[] m_plotPoints
public static java.lang.String DATASET_FIELD_NAME
public static java.lang.String RUN_FIELD_NAME
public static java.lang.String FOLD_FIELD_NAME
public static java.lang.String TIMESTAMP_FIELD_NAME
public static java.lang.String RECALL_FIELD_NAME
Constructor Detail |
public DedupingPRCurveCVResultProducerSplit()
Method Detail |
public java.lang.String globalInfo()
public void setInstances(Instances instances)
setInstances
in interface ResultProducer
instances
- a value of type 'Instances'.
public void setResultListener(ResultListener listener)
setResultListener
in interface ResultProducer
listener
- a value of type 'ResultListener'
public void setAdditionalMeasures(java.lang.String[] additionalMeasures)
setAdditionalMeasures
in interface ResultProducer
additionalMeasures
- an array of measure names, null if none
public java.util.Enumeration enumerateMeasures()
enumerateMeasures
in interface AdditionalMeasureProducer
public double getMeasure(java.lang.String additionalMeasureName)
getMeasure
in interface AdditionalMeasureProducer
additionalMeasureName
- the name of the measure to query for its value
java.lang.IllegalArgumentException
- if the named measure is not supported
public static java.lang.Double getTimestamp()
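The summary describes getTimestamp() as returning a Double that represents the current date and time, but this page does not state how the value is encoded. The following is a minimal sketch, assuming a yyyyMMdd.HHmm layout packed into a single double in UTC. The class name TimestampSketch and the helper encode are hypothetical, and the encoding this class actually uses may differ.

```java
import java.util.Calendar;
import java.util.TimeZone;

public class TimestampSketch {

    // Hypothetical helper: packs a calendar into yyyyMMdd.HHmm form
    // (assumed layout, not confirmed by this page).
    public static double encode(Calendar c) {
        return c.get(Calendar.YEAR) * 10000
            + (c.get(Calendar.MONTH) + 1) * 100   // Calendar months are 0-based
            + c.get(Calendar.DAY_OF_MONTH)
            + c.get(Calendar.HOUR_OF_DAY) / 100.0
            + c.get(Calendar.MINUTE) / 10000.0;
    }

    // Encodes the current moment, read in UTC.
    public static Double getTimestamp() {
        Calendar now = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
        return Double.valueOf(encode(now));
    }

    public static void main(String[] args) {
        System.out.println(getTimestamp());
    }
}
```

A numeric encoding of this kind keeps timestamps sortable and lets them sit in the same result column type as other numeric values.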
public void preProcess() throws java.lang.Exception
preProcess
in interface ResultProducer
java.lang.Exception
- if an error occurs during preprocessing.
public void postProcess() throws java.lang.Exception
postProcess
in interface ResultProducer
java.lang.Exception
- if an error occurs
public void doRunKeys(int run) throws java.lang.Exception
doRunKeys
in interface ResultProducer
run
- the run number to get keys for.
java.lang.Exception
- if a problem occurs while getting the keys
public void doRun(int run) throws java.lang.Exception
doRun
in interface ResultProducer
run
- the run number to get results for.
java.lang.Exception
- if a problem occurs while getting the results
protected Instances getTrainingFold(java.util.ArrayList foldList, int testFoldIdx)
foldList
- a list containing folds
testFoldIdx
- the index of the fold that will be used for testing
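The merge that getTrainingFold is described as performing can be sketched without Weka types. The example below assumes each fold is a plain java.util.List of records, whereas the real method works on Instances. The names TrainingFoldSketch and mergeTrainingFolds are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TrainingFoldSketch {

    // Concatenates every fold except the one at testFoldIdx.
    public static <T> List<T> mergeTrainingFolds(List<List<T>> folds, int testFoldIdx) {
        if (testFoldIdx < 0 || testFoldIdx >= folds.size()) {
            throw new IllegalArgumentException("No fold with index " + testFoldIdx);
        }
        List<T> training = new ArrayList<>();
        for (int i = 0; i < folds.size(); i++) {
            if (i != testFoldIdx) {
                training.addAll(folds.get(i));
            }
        }
        return training;
    }

    public static void main(String[] args) {
        List<List<String>> folds = new ArrayList<>();
        folds.add(Arrays.asList("a", "b"));
        folds.add(Arrays.asList("c"));
        folds.add(Arrays.asList("d", "e"));
        // Fold 1 is held out for testing; the rest become training data.
        System.out.println(mergeTrainingFolds(folds, 1)); // prints [a, b, d, e]
    }
}
```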
protected java.lang.Object[] processResults(java.lang.Object[] prResults, double recallLevel)
public java.lang.String[] getKeyNames()
getKeyNames
in interface ResultProducer
public java.lang.Object[] getKeyTypes()
getKeyTypes
in interface ResultProducer
public java.lang.String[] getResultNames()
getResultNames
in interface ResultProducer
public java.lang.Object[] getResultTypes()
getResultTypes
in interface ResultProducer
public java.lang.String getCompatibilityState()
getCompatibilityState
in interface ResultProducer
public java.lang.String outputFileTipText()
public java.io.File getOutputFile()
public void setOutputFile(java.io.File newOutputFile)
newOutputFile
- Value to assign to OutputFile.
public java.lang.String getSeparateTrainingFile()
public void setSeparateTrainingFile(java.lang.String separateTrainingFile)
separateTrainingFile
- Value to assign to separate training file
public java.lang.String numFoldsTipText()
public int getNumFolds()
public void setNumFolds(int newNumFolds)
newNumFolds
- Value to assign to NumFolds.
public java.lang.String plotPointsTipText()
public java.lang.String getPlotPoints()
public void setPlotPoints(java.lang.String plotPoints)
plotPoints
- Value to assign to PlotPoints.
protected double[] parsePlotPoints(java.lang.String plotPoints)
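The summary for parsePlotPoints describes parsing a string of doubles separated by commas or spaces into a sorted array, and the m_plotPoints field notes that a point is either an integer number of training examples or a decimal fraction of the full training set. The sketch below illustrates both steps. The class PlotPointsSketch and the method toTrainingSize are hypothetical, and the rule that any whole number is treated as a count is an assumption, not something this page confirms.

```java
import java.util.Arrays;

public class PlotPointsSketch {

    // Splits on commas and/or whitespace, drops empty tokens,
    // parses each token as a double, and sorts ascending.
    public static double[] parsePlotPoints(String plotPoints) {
        return Arrays.stream(plotPoints.split("[,\\s]+"))
            .filter(token -> !token.isEmpty())
            .mapToDouble(Double::parseDouble)
            .sorted()
            .toArray();
    }

    // Assumption: whole numbers are example counts; anything else is
    // a fraction of the full training set.
    public static int toTrainingSize(double point, int totalTraining) {
        if (point == Math.rint(point)) {
            return (int) point;
        }
        return (int) Math.round(point * totalTraining);
    }

    public static void main(String[] args) {
        double[] points = parsePlotPoints("0.5, 10 0.1");
        System.out.println(Arrays.toString(points)); // prints [0.1, 0.5, 10.0]
        for (double p : points) {
            System.out.println(p + " -> " + toTrainingSize(p, 200) + " training examples");
        }
    }
}
```

Double.parseDouble throws NumberFormatException on a malformed token, so a caller would see bad input immediately rather than silently losing a plot point.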
public java.lang.String rawOutputTipText()
public boolean getRawOutput()
public void setRawOutput(boolean d)
d
- true if output is to be saved
public void setFoldCreationMode(SelectedTag mode)
mode
- stratified or random
public SelectedTag getFoldCreationMode()
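The two fold creation modes, random and stratified, can be contrasted with a small self-contained sketch. This page does not say what the stratified mode stratifies on in a deduping setting, so the example assumes a simple integer class label per record. FoldCreationSketch, randomFolds and stratifiedFolds are hypothetical names, not part of this class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

public class FoldCreationSketch {

    // Random mode: shuffle all record indices, then deal them out round-robin.
    // Returns the fold index assigned to each record.
    public static int[] randomFolds(int numRecords, int numFolds, Random rng) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < numRecords; i++) {
            order.add(i);
        }
        Collections.shuffle(order, rng);
        int[] fold = new int[numRecords];
        for (int pos = 0; pos < numRecords; pos++) {
            fold[order.get(pos)] = pos % numFolds;
        }
        return fold;
    }

    // Stratified mode (assumed to mean per-label balancing): shuffle within
    // each label group, then deal round-robin so every fold receives a
    // near-equal share of each label.
    public static int[] stratifiedFolds(int[] labels, int numFolds, Random rng) {
        Map<Integer, List<Integer>> byLabel = new TreeMap<>();
        for (int i = 0; i < labels.length; i++) {
            byLabel.computeIfAbsent(labels[i], k -> new ArrayList<>()).add(i);
        }
        int[] fold = new int[labels.length];
        int next = 0;
        for (List<Integer> members : byLabel.values()) {
            Collections.shuffle(members, rng);
            for (int recordIdx : members) {
                fold[recordIdx] = next % numFolds;
                next++;
            }
        }
        return fold;
    }

    public static void main(String[] args) {
        int[] labels = {0, 0, 0, 0, 0, 0, 1, 1, 1};
        int[] fold = stratifiedFolds(labels, 3, new Random(1));
        for (int i = 0; i < labels.length; i++) {
            System.out.println("record " + i + " (label " + labels[i] + ") -> fold " + fold[i]);
        }
    }
}
```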
public java.lang.String splitEvaluatorTipText()
public SplitEvaluator getSplitEvaluator()
public void setSplitEvaluator(SplitEvaluator newSplitEvaluator)
newSplitEvaluator
- new SplitEvaluator to use.
public java.util.Enumeration listOptions()
listOptions
in interface OptionHandler
public void setOptions(java.lang.String[] options) throws java.lang.Exception
-X num_folds
The number of folds to use for the cross-validation.
-D
Specify that raw split evaluator output is to be saved.
-O file/directory name
Specify the file or directory to which raw split evaluator output
is to be saved. If a directory is specified, then each output string
is saved as an individual gzip file. If a file is specified, then
each output string is saved as an entry in a zip file.
-W classname
Specify the full class name of the split evaluator.
All options after -- will be passed to the split evaluator.
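The option layout above can be illustrated with a small stand-alone sketch that builds an option array and separates the producer's own options from those that follow --. It does not call Weka. OptionsSketch and its helper methods are hypothetical, and the file name, evaluator class name and evaluator option in main are placeholders.

```java
import java.util.Arrays;

public class OptionsSketch {

    // Index of the "--" separator, or options.length if there is none.
    static int separatorIndex(String[] options) {
        for (int i = 0; i < options.length; i++) {
            if ("--".equals(options[i])) {
                return i;
            }
        }
        return options.length;
    }

    // True if a value-less flag such as -D appears before the separator.
    public static boolean hasFlag(String flag, String[] options) {
        int end = separatorIndex(options);
        for (int i = 0; i < end; i++) {
            if (flag.equals(options[i])) {
                return true;
            }
        }
        return false;
    }

    // Value following a flag such as -X, looked up before the separator only.
    public static String optionValue(String flag, String[] options) {
        int end = separatorIndex(options);
        for (int i = 0; i + 1 < end; i++) {
            if (flag.equals(options[i])) {
                return options[i + 1];
            }
        }
        return null;
    }

    // Everything after "--", which is destined for the split evaluator.
    public static String[] evaluatorOptions(String[] options) {
        int sep = separatorIndex(options);
        if (sep >= options.length) {
            return new String[0];
        }
        return Arrays.copyOfRange(options, sep + 1, options.length);
    }

    public static void main(String[] args) {
        String[] options = {
            "-X", "10",                          // ten folds
            "-D",                                // save raw split evaluator output
            "-O", "rawOutput.zip",               // placeholder output file
            "-W", "some.pkg.MySplitEvaluator",   // placeholder class name
            "--", "-C", "1"                      // placeholder evaluator options
        };
        System.out.println("folds: " + optionValue("-X", options));
        System.out.println("raw output: " + hasFlag("-D", options));
        System.out.println("evaluator options: " + Arrays.toString(evaluatorOptions(options)));
    }
}
```

With Weka on the classpath, an array of this shape would be handed to setOptions(java.lang.String[]), which is declared to throw java.lang.Exception if an option is not supported.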
setOptions
in interface OptionHandler
options
- the list of options as an array of strings
java.lang.Exception
- if an option is not supported
public java.lang.String[] getOptions()
getOptions
in interface OptionHandler
public java.lang.String toString()
public static void main(java.lang.String[] args)