Package ir.vsr

Provides a basic vector-space information retrieval system.


Class Summary
Document An abstract class that provides tokenization of a document, with stop-word removal, and an iterator-like interface similar to StringTokenizer.
DocumentIterator An object for iterating over a set of documents in a directory.
DocumentReference A simple data structure for storing a reference to a document file that includes information on the length of its document vector.
Feedback Gets and stores relevance feedback from the user and computes an updated query from the original query and the retrieved documents that were rated relevant or irrelevant.
FileDocument A Document stored as a file.
HashMapVector A data structure for the term vector of a document, stored as a HashMap that maps each token to a Weight holding the weight of that token in the document.
HTMLFileDocument An HTML file document where HTML commands are removed from the token stream.
InvertedIndex An inverted index for vector-space information retrieval.
Retrieval A lightweight object for storing information about a retrieved Document.
TextFileDocument A normal ASCII text file Document.
TextStringDocument A simple document represented by a String.
TokenInfo A lightweight object for storing information about a token (a.k.a. word, term) in an inverted index.
TokenOccurrence A lightweight object for storing information about an occurrence of a token (a.k.a. word, term) in a Document.
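
The inverted-index classes listed above (InvertedIndex, TokenInfo, TokenOccurrence) follow a common pattern: each token maps to a list of records saying which documents contain it and how often. The following is a minimal sketch of that pattern only, not this package's code. The names InvertedIndexSketch, Posting, add and postings are hypothetical, and whitespace tokenization without stop-word removal is an assumption made to keep the example short.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not the ir.vsr.InvertedIndex API.
public class InvertedIndexSketch {

    /** One occurrence record: a document id and the token's count in that document. */
    static final class Posting {
        final String docId;
        final int count;

        Posting(String docId, int count) {
            this.docId = docId;
            this.count = count;
        }
    }

    /** Maps each token to the list of documents in which it occurs. */
    private final Map<String, List<Posting>> index = new HashMap<>();

    /** Tokenizes the text on whitespace (an assumption) and indexes the per-token counts. */
    void add(String docId, String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            index.computeIfAbsent(entry.getKey(), k -> new ArrayList<>())
                 .add(new Posting(docId, entry.getValue()));
        }
    }

    /** Returns the postings for a token, or an empty list if the token was never indexed. */
    List<Posting> postings(String token) {
        return index.getOrDefault(token.toLowerCase(), new ArrayList<>());
    }

    public static void main(String[] args) {
        InvertedIndexSketch sketch = new InvertedIndexSketch();
        sketch.add("d1", "space space vector");
        sketch.add("d2", "vector model");
        for (Posting p : sketch.postings("vector")) {
            System.out.println(p.docId + " " + p.count);
        }
    }
}
```

Looking up a query token then touches only the documents that actually contain it, which is the reason for indexing by token rather than scanning every document.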
 

Package ir.vsr Description

Provides a basic vector-space information retrieval system.
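
As a rough illustration of the vector-space idea, the sketch below represents a query and two documents as token-to-weight maps and ranks them by cosine similarity. It is a minimal sketch under stated assumptions, not this package's implementation: the class VectorSpaceSketch and its methods termVector, length and cosine are hypothetical names, tokens are split on whitespace, and weights are raw term frequencies, which may differ from the weighting this package uses.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; not part of the ir.vsr package.
public class VectorSpaceSketch {

    /** Builds a raw term-frequency vector from lowercased, whitespace-separated tokens. */
    static Map<String, Double> termVector(String text) {
        Map<String, Double> vector = new HashMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                vector.merge(token, 1.0, Double::sum);
            }
        }
        return vector;
    }

    /** Euclidean length of a term vector. */
    static double length(Map<String, Double> vector) {
        double sumOfSquares = 0.0;
        for (double weight : vector.values()) {
            sumOfSquares += weight * weight;
        }
        return Math.sqrt(sumOfSquares);
    }

    /** Cosine similarity of two term vectors; 0 if either vector is empty. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> entry : a.entrySet()) {
            Double other = b.get(entry.getKey());
            if (other != null) {
                dot += entry.getValue() * other;
            }
        }
        double denominator = length(a) * length(b);
        return denominator == 0.0 ? 0.0 : dot / denominator;
    }

    public static void main(String[] args) {
        Map<String, Double> query = termVector("vector space retrieval");
        Map<String, Double> related = termVector("vector space model for information retrieval");
        Map<String, Double> unrelated = termVector("cooking pasta at home");
        System.out.println("related:   " + cosine(query, related));
        System.out.println("unrelated: " + cosine(query, unrelated));
    }
}
```

The document that shares tokens with the query scores above zero, while the one with no shared tokens scores exactly zero, so sorting by this score yields a ranked retrieval list.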

For command-line interfaces, see the main methods of the following classes: