/u/mooney/ir-code/ir/eval/. See the Javadoc for this system. Use the main method of ExperimentRated to index a set of documents, then process queries, evaluate the results compared to the known relevant documents, and finally generate a recall-precision curve and an NDCG plot.
You can use the documents in the Cystic-Fibrosis (CF) corpus (/u/mooney/ir-code/corpora/cf/) as a set of test documents. This corpus contains 1,239 "documents" (actually just medical article titles and abstracts). A set of 100 queries, together with the documents determined to be relevant to each, is in /u/mooney/ir-code/queries/cf/queries.
ExperimentRated can be used to produce recall-precision curves and NDCG results for this document/query corpus. The NDCG results are based on continuous relevance ratings. Our CF data actually comes with ratings on a 3-level scale (0: not relevant, 1: marginally relevant, 2: very relevant) from 4 judges. To produce a single relevance rating, I averaged the scores of the 4 judges and scaled the result to a real-valued rating between 0 and 1. The rated query file in /u/mooney/ir-code/queries/cf/queries-rated has the results: for each query, each relevant document is followed by a 0-1 relevance rating.
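For example, a plain run of ExperimentRated on the CF data (with no feedback flags) could be launched with a command along these lines; the argument order follows the sample command given later in this handout, and the output prefix here is just a placeholder you would replace with your own path:

java ir.eval.ExperimentRated /u/mooney/ir-code/corpora/cf/ /u/mooney/ir-code/queries/cf/queries-rated ~/ir-results/cf-baseline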
Here is a trace of running such an experiment. The program also generates as output a ".gplot" file and a ".ndcg.gplot" file that gnuplot can use to generate a recall-precision graph (plot) such as this and an NDCG graph (plot) such as this. To create a pdf plot file, execute the following command:

gnuplot filename.gplot | ps2pdf - filename.pdf

The gnuplot command creates a postscript (*.ps) file, and this output is piped directly into the ps2pdf command (note the "-"), which then produces a pdf file filename.pdf.
A set of sample results files that I generated for the CF data is in /u/mooney/ir-code/results/cf/.
You can also edit the ".gplot" files yourself to create graphs combining the results of multiple runs of ExperimentRated (such as with this ".gplot" file and resulting pdf plot file) in order to compare different methods.
Code for performing relevance feedback is included in the VSR system. See the ir.vsr.Feedback class and its Javadoc documentation. It is invoked by using the "-feedback" flag when ir.vsr.InvertedIndex is run. After viewing a retrieved document, the user is asked to rate it as either relevant or irrelevant to the query. Then, by using the "r" (redo) command, this feedback will be used to revise the query vector (using the Ide_Regular method), which is then used to produce a new set of retrievals.
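For reference, the Ide_Regular update adds the vectors of documents rated relevant to the query vector and subtracts the vectors of documents rated irrelevant, weighted by the ALPHA, BETA, and GAMMA parameters described below. The following is only a minimal sketch of that arithmetic using plain HashMaps; it is not the actual ir.vsr vector or Feedback API.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified Ide Regular feedback: newQuery = ALPHA*query + BETA*sum(relevant) - GAMMA*sum(irrelevant).
// Vectors are plain term->weight maps here; the real ir.vsr code uses its own vector classes.
public class IdeRegularSketch {

    public static Map<String, Double> ideRegular(Map<String, Double> query,
                                                 List<Map<String, Double>> relevant,
                                                 List<Map<String, Double>> irrelevant,
                                                 double alpha, double beta, double gamma) {
        Map<String, Double> revised = new HashMap<>();
        addScaled(revised, query, alpha);          // keep the original query, weighted by ALPHA
        for (Map<String, Double> doc : relevant)
            addScaled(revised, doc, beta);         // pull the query toward documents rated relevant
        for (Map<String, Double> doc : irrelevant)
            addScaled(revised, doc, -gamma);       // push it away from documents rated irrelevant
        return revised;
    }

    // Adds scale * vector into accumulator, term by term.
    private static void addScaled(Map<String, Double> accumulator, Map<String, Double> vector, double scale) {
        for (Map.Entry<String, Double> entry : vector.entrySet())
            accumulator.merge(entry.getKey(), scale * entry.getValue(), Double::sum);
    }
}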
One problem with relevance feedback is that it comes at a cost to the user, who must spend time rating the initial retrieval results. A possible solution to this problem is pseudo relevance feedback: simply assume that the top m retrieved documents are relevant, and use them to reformulate the query.
An important question that can be addressed experimentally is: what is the effect of pseudo relevance feedback on retrieval results?
Your assignment is to modify the ir.vsr.Feedback class to allow pseudo relevance feedback. The current interactive feedback capability should be preserved; the pseudo capability can be achieved by adding the new functionality to ir.vsr.Feedback along with a way to distinguish between the interactive and pseudo feedback modes.
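One possible shape for the pseudo mode, building on the Ide_Regular sketch above (class and method names here are illustrative, not the existing ir.vsr API): after the initial retrieval, the top m documents are simply treated as relevant and passed through the same query-revision step, and the revised query is then used to produce the ranking that gets evaluated.

import java.util.Collections;
import java.util.List;
import java.util.Map;

public class PseudoFeedbackSketch {

    // Revise a query by assuming the top m documents of the initial ranking are relevant.
    // Illustrative names only; the real ir.vsr.Feedback and retrieval classes differ.
    public static Map<String, Double> reviseQuery(Map<String, Double> query,
                                                  List<Map<String, Double>> initialRanking,
                                                  int m, double alpha, double beta, double gamma) {
        List<Map<String, Double>> assumedRelevant =
                initialRanking.subList(0, Math.min(m, initialRanking.size()));
        // Pseudo feedback supplies no irrelevant documents, only the assumed-relevant ones.
        return IdeRegularSketch.ideRegular(query, assumedRelevant,
                Collections.emptyList(), alpha, beta, gamma);
        // The caller would then re-run retrieval with the revised query vector.
    }
}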
A new flag "-pseudofeedback" should be implemented in ir.vsr.InvertedIndex and
ir.eval.ExperimentRated classes, which would be followed by the integer value of m,
e.g.
"java ir.vsr.InvertedIndex -html -pseudofeedback 8
/u/mooney/ir-code/corpora/yahoo-science/".
You will also need to add necessary
parameters and make corresponding changes in the constructors and the code that
handles feedback in these classes.
A flag "-feedbackparams" should also be implemented in these classes, which
would be followed by three floating-point values for ALPHA, BETA and GAMMA
feedback parameters (default is 1.0 for each of these), e.g.
java ir.eval.ExperimentRated -pseudofeedback 5
-feedbackparams 1.0 0.5 1.0 /u/mooney/ir-code/corpora/cf/
/u/mooney/ir-code/queries/cf/queries-rated /u/mooney/ir-results/5beta05
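One way the handling of these two flags might be wired into the main methods is sketched below; the variable names and surrounding structure are placeholders rather than the existing ir.vsr/ir.eval code, and the parsed values would be passed on to the constructors so the feedback code can use them.

public class FlagParsingSketch {
    public static void main(String[] args) {
        int pseudoFeedbackDocs = 0;                 // m; 0 means no pseudo feedback
        double alpha = 1.0, beta = 1.0, gamma = 1.0;
        int i = 0;
        while (i < args.length && args[i].startsWith("-")) {
            switch (args[i]) {
                case "-pseudofeedback":
                    pseudoFeedbackDocs = Integer.parseInt(args[++i]);   // number of top documents assumed relevant
                    break;
                case "-feedbackparams":
                    alpha = Double.parseDouble(args[++i]);
                    beta = Double.parseDouble(args[++i]);
                    gamma = Double.parseDouble(args[++i]);
                    break;
                default:
                    break;   // existing flags such as -html would be handled here
            }
            i++;
        }
        // The remaining positional arguments (e.g. corpus directory, query file,
        // output prefix) would be processed as before.
        System.out.println("m=" + pseudoFeedbackDocs + " alpha=" + alpha
                + " beta=" + beta + " gamma=" + gamma);
    }
}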
You will need to change the classes ir.vsr.Feedback, ir.vsr.InvertedIndex,
ir.eval.ExperimentRated, and ir.eval.Experiment.
You will then use this code to produce recall-precision and NDCG curves that evaluate the effect of different amounts of pseudo relevance feedback on retrieval performance on the CF corpus.
Try the following amounts of pseudo relevance feedback (values of m): {0, 1, 2, 5, 10, 15, 30} (using BETA=0.1 and the other feedback parameters as default values). This should generate 7 different recall-precision plots. You should manually combine these into one gnuplot file and final performance graph that compares recall-precision performance for all these different amounts of feedback in one graph. Put this graph in a file called "[PREFIX]_amount_results_rp.pdf". You should do the same for NDCG results which should go in a file called "[PREFIX]_amount_results_ndcg.pdf".
Using the top 2 retrieved documents for relevance feedback (m=2), try the following values for BETA: {0.1, 0.5, 1.0}. ALPHA and GAMMA should remain 1.0 (explain why varying these parameters is of no interest to us). This should generate 3 different recall-precision plots. You should manually combine these plots into one gnuplot file, along with the recall-precision curve for the original system without pseudo relevance feedback (m=0). Put this graph in a file called "[PREFIX]_beta_results_rp.pdf". You should do the same for NDCG results, which should go in a file called "[PREFIX]_beta_results_ndcg.pdf".
In submitting your solution, follow the general course instructions on submitting projects on the course homepage. Generate the zip file in a way that maintains the directory structure required.
Along with that, follow these specific instructions for Project 2. The following files should be submitted separately on Canvas.
Ensure that you can copy these files directly into a fresh copy of the ir project, and see your changes take effect on a CS Linux box. If it won't compile or your changes don't show up, you probably need to include something else. It might seem like a hassle to create so many directories, but it makes things much easier on the grader.
Please make sure that your code compiles and runs on the UTCS lab machines.