Project 0 for CS 371R: Information Retrieval and Web Search
Due: September 11, 2024 at 11:59 p.m.
Project 0 will NOT be graded and is entirely optional. It is just
an exercise to walk you through the trace-collection and submission
procedure. Its sole purpose is to smooth out any glitches
with setting up the Java environment, collecting traces, or submitting. If you
submit Project 0 and run into any problems along the way, the TA can help you with them.
The TA will help with problems regarding Project 0 until the
submission deadline. After the submission deadline for Project 0, it will be
your responsibility to make sure that you know how to:
- Set up the Java environment correctly
- Collect traces
- Submit the required files
Project 0
As discussed in class, a basic system for vector-space retrieval (VSR) is
available in /u/mooney/ir-code/ir/vsr/. See the Javadoc for this system. Use the main
method for InvertedIndex to index a set of documents and then process queries.
You can use the web pages in /u/mooney/ir-code/corpora/curlie-science/ as a set of test
documents. This corpus contains 900 pages, 300 random samples each from the
Curlie indices for biology, physics, and chemistry.
See the sample trace of using the system.
Open a Firefox browser before you run the code in order to have
selected documents displayed in the browser.
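To make "index a set of documents and then process queries" concrete, here is a toy sketch of the idea behind a vector-space index. It is NOT the course's ir.vsr code: the class and method names (ToyVsr, add, search) are hypothetical, and the weighting (raw term frequency times natural-log IDF, with cosine normalization on the document side) is only an assumption about what a basic VSR system does. The query norm is left out because it is the same for every document and does not change the ranking.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy class for illustration; not part of the ir.vsr package.
public class ToyVsr {
    private final List<String> docs = new ArrayList<>();
    // term -> (document id -> term frequency in that document)
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();

    /** Lowercases the text and splits it on anything that is not a letter or digit. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** Adds one document; its id is its position in insertion order (0, 1, 2, ...). */
    public void add(String text) {
        int id = docs.size();
        docs.add(text);
        for (String token : tokenize(text)) {
            postings.computeIfAbsent(token, k -> new HashMap<>()).merge(id, 1, Integer::sum);
        }
    }

    /** Assumed IDF: ln(N / df); 0 for terms that never occur. */
    private double idf(String term) {
        Map<Integer, Integer> p = postings.get(term);
        return p == null ? 0.0 : Math.log((double) docs.size() / p.size());
    }

    /** Returns ids of matching documents, best match first. */
    public List<Integer> search(String query) {
        // Squared length of every document vector under TF-IDF weights.
        double[] normSq = new double[docs.size()];
        for (Map.Entry<String, Map<Integer, Integer>> e : postings.entrySet()) {
            double idf = idf(e.getKey());
            for (Map.Entry<Integer, Integer> p : e.getValue().entrySet()) {
                double w = p.getValue() * idf;
                normSq[p.getKey()] += w * w;
            }
        }
        // Dot product of the query vector with each document that shares a term.
        Map<String, Integer> queryTf = new HashMap<>();
        for (String t : tokenize(query)) queryTf.merge(t, 1, Integer::sum);
        double[] dot = new double[docs.size()];
        for (Map.Entry<String, Integer> q : queryTf.entrySet()) {
            Map<Integer, Integer> p = postings.get(q.getKey());
            if (p == null) continue;
            double idf = idf(q.getKey());
            for (Map.Entry<Integer, Integer> d : p.entrySet()) {
                dot[d.getKey()] += (q.getValue() * idf) * (d.getValue() * idf);
            }
        }
        // Keep documents with a positive score and sort by cosine score, descending.
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < dot.length; i++) {
            if (dot[i] > 0) hits.add(i);
        }
        hits.sort((a, b) -> Double.compare(
                dot[b] / Math.sqrt(normSq[b]), dot[a] / Math.sqrt(normSq[a])));
        return hits;
    }

    public static void main(String[] args) {
        ToyVsr index = new ToyVsr();
        index.add("cold fusion in a jar");
        index.add("quantum mechanics of particles");
        index.add("cold weather and quantum weather");
        System.out.println(index.search("cold fusion"));
        System.out.println(index.search("quantum mechanics"));
    }
}
```

The real InvertedIndex reads files from a directory and handles details this sketch ignores; the Javadoc describes its actual behavior.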
Your task
- Set up your Java environment (see info at http://www.cs.utexas.edu/users/mooney/ir-course/java-info.html)
- Collect the trace using the "script" Unix utility (see Project Submission Info).
- Index the document collection at /u/mooney/ir-code/corpora/curlie-science/ and
run the queries "cold fusion" and "quantum mechanics".
Submission Instructions
Follow the general instructions for submitting files using Gradescope
as described in Project Submission Info.
For this assignment, you need to submit the following files:
- InvertedIndex.java and InvertedIndex.class (the *.java and *.class files)
- trace/curlie.txt: the trace file under the 'trace' folder. If you change this name, the trace match test will fail.
- report.pdf: a PDF report file containing only the text "Hello World!".
Zipping Guidelines
- Usually, you will zip all the submission files directly, including the code, the PDF, and the trace directory.
- NOTE: The zip file should NOT have an upper-level directory. For example, the zip file
should NOT contain extra folders, as in jd1234_myproj0/InvertedIndex.java; instead,
the files should sit directly in the root folder: InvertedIndex.java, trace/curlie.txt, and so on.
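One way to double-check the layout before uploading is to list the entries of the zip file and confirm that each required file sits at the root. The sketch below does this with the standard java.util.zip classes. The class name ZipLayoutCheck and its helper methods are hypothetical and are not part of the course code or of the Gradescope autograder; the list of required names is taken from the file list for Project 0 above.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical helper for illustration; not provided by the course.
public class ZipLayoutCheck {
    // Entry names listed for Project 0, relative to the zip root.
    static final List<String> REQUIRED = Arrays.asList(
            "InvertedIndex.java", "InvertedIndex.class",
            "trace/curlie.txt", "report.pdf");

    /** Returns the required entries that are NOT present at the zip root. */
    static List<String> missingEntries(File zipFile) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(zipFile)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        List<String> missing = new ArrayList<>();
        for (String required : REQUIRED) {
            if (!names.contains(required)) {
                missing.add(required);
            }
        }
        return missing;
    }

    /** Demo helper: writes a temporary zip holding empty entries with the given names. */
    static File writeDemoZip(List<String> entryNames) throws IOException {
        File zipFile = File.createTempFile("layout-demo", ".zip");
        zipFile.deleteOnExit();
        try (ZipOutputStream out = new ZipOutputStream(new FileOutputStream(zipFile))) {
            for (String name : entryNames) {
                out.putNextEntry(new ZipEntry(name));
                out.closeEntry();
            }
        }
        return zipFile;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ZipLayoutCheck <submission.zip>");
            return;
        }
        List<String> missing = missingEntries(new File(args[0]));
        System.out.println(missing.isEmpty()
                ? "Layout looks OK"
                : "Missing at zip root: " + missing);
    }
}
```

A zip that wraps everything in a folder such as jd1234_myproj0/ would report all four files as missing, because its entry names carry the folder prefix. Passing this check does not guarantee that the autograder will accept the submission.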
For example, for project 0, this is how Gradescope should look after submission:
Autograder
- After submitting the code on Gradescope, you will be able to run the autograder on it and verify your submission.
You should immediately be able to see whether your code compiled correctly and whether certain basic checks, such as trace generation
and output format, pass. You can also see the results of the autograder run on your code for sample test cases.
Note that many test cases will be hidden, and you will receive the score only after grading.
- Here is what the results after autograding might look like:
If you prefer to work in Eclipse, there is also a brief guide on creating an Eclipse project from the class code.