ir.eval
Class ExperimentRelFeedback
java.lang.Object
  |
  +--ir.eval.Experiment
        |
        +--ir.eval.ExperimentRelFeedback
public class ExperimentRelFeedback
extends Experiment
A specialization of Experiment for evaluating relevance feedback.
This requires first determining the top-ranked retrievals and then simulating
relevance feedback using the known relevant docs from the test data. However,
performance must then be evaluated only on documents for which explicit
relevance information was NOT provided.
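The procedure above (often called residual-collection evaluation) can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the ir.eval implementation: documents are plain String IDs, the class and method names (ResidualEvalSketch, simulateFeedback, skippedDocs, residualRanking, residualRelevant) are hypothetical, and the step that actually reranks using the feedback is left out.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch (not part of ir.eval): simulate relevance feedback on the
 * top-ranked documents, then evaluate only on the residual collection.
 */
final class ResidualEvalSketch {

    /** Simulated user judgments on the top-ranked documents. */
    static final class Feedback {
        final List<String> relevant = new ArrayList<>();
        final List<String> nonRelevant = new ArrayList<>();
    }

    /** Judge the top numFeedback initial retrievals using the known relevant set. */
    static Feedback simulateFeedback(List<String> initialRanking, Set<String> relevant, int numFeedback) {
        Feedback fb = new Feedback();
        int n = Math.max(0, Math.min(numFeedback, initialRanking.size()));
        for (String doc : initialRanking.subList(0, n)) {
            if (relevant.contains(doc)) {
                fb.relevant.add(doc);
            } else {
                fb.nonRelevant.add(doc);
            }
        }
        return fb;
    }

    /** The top numSkip documents of the INITIAL ranking; these are excluded from evaluation. */
    static Set<String> skippedDocs(List<String> initialRanking, int numSkip) {
        int n = Math.max(0, Math.min(numSkip, initialRanking.size()));
        return new HashSet<>(initialRanking.subList(0, n));
    }

    /** Remove the skipped documents from the ranking produced after feedback. */
    static List<String> residualRanking(List<String> revisedRanking, Set<String> skipped) {
        List<String> out = new ArrayList<>();
        for (String doc : revisedRanking) {
            if (!skipped.contains(doc)) {
                out.add(doc);
            }
        }
        return out;
    }

    /** Relevant documents that still count when measuring recall and precision. */
    static Set<String> residualRelevant(Set<String> relevant, Set<String> skipped) {
        Set<String> out = new HashSet<>(relevant);
        out.removeAll(skipped);
        return out;
    }
}
```

Because the skipped set is taken from the initial ranking rather than from the revised one, a documents the simulated user already judged cannot inflate the post-feedback scores.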
Constructor Summary
ExperimentRelFeedback(java.io.File corpusDir,
java.io.File queryFile,
java.io.File outFile,
short docType,
boolean stem,
int numFeedback,
boolean feedback,
int numSkip)
Create an experiment for relevance feedback with the given parameters.
Method Summary
static void main(java.lang.String[] args)
Evaluate retrieval performance on a given query test corpus and
generate a recall/precision graph.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
ExperimentRelFeedback
public ExperimentRelFeedback(java.io.File corpusDir,
java.io.File queryFile,
java.io.File outFile,
short docType,
boolean stem,
int numFeedback,
boolean feedback,
int numSkip)
throws java.io.IOException
Create an experiment for relevance feedback with the given parameters.
main
public static void main(java.lang.String[] args)
throws java.io.IOException
Evaluate retrieval performance on a given query test corpus and
generate a recall/precision graph.
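The recall/precision plot data can be illustrated with the standard textbook definitions: recall and precision are recorded at each rank where a relevant document appears, then interpolated at the 11 standard recall levels 0.0, 0.1, ..., 1.0. This is an assumption for illustration only; this page does not say exactly which points ir.eval writes to OUTFILE, and RecallPrecisionSketch is a hypothetical name. In a feedback experiment, the ranking and relevant set passed in would be the residual ones.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of recall/precision curve data (not the ir.eval code). */
final class RecallPrecisionSketch {

    /** {recall, precision} at each rank where a relevant document is retrieved. */
    static List<double[]> points(List<String> ranking, Set<String> relevant) {
        List<double[]> pts = new ArrayList<>();
        if (relevant.isEmpty()) {
            return pts; // recall is undefined when there are no relevant docs
        }
        int found = 0;
        for (int rank = 1; rank <= ranking.size(); rank++) {
            if (relevant.contains(ranking.get(rank - 1))) {
                found++;
                double recall = (double) found / relevant.size();
                double precision = (double) found / rank;
                pts.add(new double[] {recall, precision});
            }
        }
        return pts;
    }

    /** Interpolated precision at recall 0.0, 0.1, ..., 1.0: max precision at any recall >= level. */
    static double[] interpolated11(List<double[]> pts) {
        double[] result = new double[11];
        for (int i = 0; i <= 10; i++) {
            double level = i / 10.0;
            double best = 0.0;
            for (double[] p : pts) {
                if (p[0] >= level - 1e-9) { // small tolerance for floating-point recall values
                    best = Math.max(best, p[1]);
                }
            }
            result[i] = best;
        }
        return result;
    }
}
```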
Command format: "Experiment [OPTION]* [DIR] [QUERIES] [OUTFILE] [FEEDBACKNUM]" where:
DIR is the name of the directory whose files should be indexed.
QUERIES is a file of queries paired with relevant docs (see queryFile).
OUTFILE is the name of the file to write the output to. The plot
data for the recall/precision curve is stored in this file, and the
gnuplot file for the graph has the same name with a ".gplot" extension.
FEEDBACKNUM is the number of top retrievals to simulate feedback for.
OPTIONs can be:
"-html" to specify HTML files whose HTML tags should be removed;
"-stem" to specify that tokens should be stemmed with the Porter stemmer;
"-control" to skip the actual feedback and instead evaluate performance
without feedback on the same test docs (excluding the feedback examples);
"-skip N", where N is an int, to skip the top N ranked docs when
evaluating final performance. By default, the number skipped is just
FEEDBACKNUM, but N can be greater than FEEDBACKNUM if you want to
evaluate the performance of different levels of feedback on the
same test docs: use the maximum FEEDBACKNUM as the N for -skip (numSkip)
in every run with a different FEEDBACKNUM.
Note that "-skip N" with FEEDBACKNUM = 0 is the same as
"-control" with FEEDBACKNUM = N.
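The command format and option defaults above can be summarized as a small argument parser. This is a hypothetical sketch (RelFeedbackArgs and its fields are not part of ir.eval) that encodes only the rules stated here, in particular that numSkip defaults to FEEDBACKNUM when -skip is absent. How -html maps onto the constructor's short docType parameter is not documented on this page, so the sketch keeps it as a boolean.

```java
/**
 * Hypothetical parser for the documented command line:
 *   [OPTION]* DIR QUERIES OUTFILE FEEDBACKNUM
 * (a sketch only; not the ir.eval implementation).
 */
final class RelFeedbackArgs {
    boolean html = false;     // "-html": strip HTML tags (docType mapping not documented here)
    boolean stem = false;     // "-stem": Porter-stem tokens
    boolean feedback = true;  // "-control": evaluate without applying feedback
    int numSkip;              // "-skip N"; defaults to FEEDBACKNUM
    int feedbackNum;
    String dir, queries, outFile;

    static RelFeedbackArgs parse(String[] args) {
        RelFeedbackArgs a = new RelFeedbackArgs();
        Integer skipOption = null; // null means -skip was not given
        int i = 0;
        while (i < args.length && args[i].startsWith("-")) {
            switch (args[i]) {
                case "-html": a.html = true; break;
                case "-stem": a.stem = true; break;
                case "-control": a.feedback = false; break;
                case "-skip":
                    if (i + 1 >= args.length) {
                        throw new IllegalArgumentException("-skip requires an integer N");
                    }
                    skipOption = Integer.parseInt(args[++i]);
                    break;
                default:
                    throw new IllegalArgumentException("Unknown option: " + args[i]);
            }
            i++;
        }
        if (args.length - i != 4) {
            throw new IllegalArgumentException("Expected DIR QUERIES OUTFILE FEEDBACKNUM");
        }
        a.dir = args[i];
        a.queries = args[i + 1];
        a.outFile = args[i + 2];
        a.feedbackNum = Integer.parseInt(args[i + 3]);
        // Documented default: skip exactly FEEDBACKNUM docs unless -skip N is given.
        a.numSkip = (skipOption != null) ? skipOption : a.feedbackNum;
        return a;
    }
}
```

Under these rules, `-control DIR QUERIES OUTFILE 5` and `-skip 5 DIR QUERIES OUTFILE 0` both give numSkip = 5 with no feedback actually applied, which matches the equivalence noted above.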