ir.eval
Class ExperimentRelFeedback

java.lang.Object
  |
  +--ir.eval.Experiment
        |
        +--ir.eval.ExperimentRelFeedback

public class ExperimentRelFeedback
extends Experiment

A specialization of Experiment for evaluating relevance feedback. This requires first determining the top-ranked retrievals, then simulating relevance feedback using known relevant docs from the test data. Performance must then be evaluated only on documents for which explicit relevance information was NOT provided.
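For example, such an experiment might be constructed and run programmatically roughly as follows. This is a minimal sketch: the file names are hypothetical, the docType value is assumed, and the no-argument call to the inherited makeRpCurve is an assumption not confirmed by this page.

 import java.io.File;
 import java.io.IOException;
 import ir.eval.ExperimentRelFeedback;

 public class RunFeedbackExperiment {
     public static void main(String[] args) throws IOException {
         // Hypothetical corpus, query, and output files
         File corpusDir = new File("corpus");
         File queryFile = new File("queries.txt");
         File outFile = new File("rp-feedback");
         short docType = 0;        // assumed code for plain-text documents
         boolean stem = true;      // stem tokens with the Porter stemmer
         int numFeedback = 10;     // simulate feedback on the top 10 retrievals
         boolean feedback = true;  // false would correspond to a control run
         int numSkip = 10;         // skip the feedback docs when scoring

         ExperimentRelFeedback exper =
             new ExperimentRelFeedback(corpusDir, queryFile, outFile, docType,
                                       stem, numFeedback, feedback, numSkip);
         exper.makeRpCurve();      // assumed signature of the inherited method
     }
 }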


Fields inherited from class ir.eval.Experiment
corpusDir, outFile, queryFile, RECALL_LEVELS
 
Constructor Summary
ExperimentRelFeedback(java.io.File corpusDir, java.io.File queryFile, java.io.File outFile, short docType, boolean stem, int numFeedback, boolean feedback, int numSkip)
          Create an experiment for relevance feedback with the given parameters.
 
Method Summary
static void main(java.lang.String[] args)
          Evaluate retrieval performance on a given query test corpus and generate a recall/precision graph.
 
Methods inherited from class ir.eval.Experiment
makeRpCurve
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ExperimentRelFeedback

public ExperimentRelFeedback(java.io.File corpusDir,
                             java.io.File queryFile,
                             java.io.File outFile,
                             short docType,
                             boolean stem,
                             int numFeedback,
                             boolean feedback,
                             int numSkip)
                      throws java.io.IOException
Create an experiment for relevance feedback with the given parameters.
Method Detail

main

public static void main(java.lang.String[] args)
                 throws java.io.IOException
Evaluate retrieval performance on a given query test corpus and generate a recall/precision graph.

Command format: "Experiment [OPTION]* [DIR] [QUERIES] [OUTFILE] [FEEDBACKNUM]" where:
DIR is the name of the directory whose files should be indexed.
QUERIES is a file of queries paired with relevant docs (see queryFile).
OUTFILE is the name of the file in which to put the output. The plot data for the recall/precision curve is stored in this file, and a gnuplot file for the graph has the same name with a ".gplot" extension.
FEEDBACKNUM is the number of top retrievals for which to simulate feedback.

OPTIONs can be:
"-html" to specify HTML files whose HTML tags should be removed.
"-stem" to specify that tokens should be stemmed with the Porter stemmer.
"-control" to not actually perform feedback but evaluate performance without feedback on the same test docs (eliminating the feedback examples).
"-skip N", where N is an int, to skip the top N ranked docs when evaluating final performance. By default, the number skipped is just FEEDBACKNUM, but N can be greater than FEEDBACKNUM if you want to evaluate different levels of feedback on the same test docs, using the maximum FEEDBACKNUM as the N for -skip (numSkip) in runs with different FEEDBACKNUMs.

Note that "-skip N" with FEEDBACKNUM = 0 is the same as "-control" with FEEDBACKNUM = N.
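As an illustration (hypothetical directory and file names), a feedback run and a matched control run on the same test docs might look like:

 java ir.eval.ExperimentRelFeedback -html -stem corpus queries.txt rp-feedback 10
 java ir.eval.ExperimentRelFeedback -html -stem -control corpus queries.txt rp-control 10

Both commands evaluate performance only on documents outside the top 10 retrievals; the first applies the simulated feedback while the second does not, so the two recall/precision curves are directly comparable.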