Package ir.vsr

Class Summary
Document Document is an abstract class that provides tokenization of a document with stop-word removal and an iterator-like interface similar to StringTokenizer.
DocumentIterator An object for iterating over a set of documents in a directory.
DocumentReference A simple data structure for storing a reference to a document file that includes information on the length of its document vector.
Feedback Gets and stores relevance feedback from the user and computes an updated query based on the original query and the retrieved documents that are rated relevant or irrelevant.
FileDocument A Document stored as a file.
HashMapPosVector A data structure for a "positional" term vector for a document, stored as a HashMap that maps tokens to ArrayLists of Integers giving the positions of the token in the document.
HashMapVector A data structure for a term vector for a document, stored as a HashMap that maps tokens to Weights that store the weight of that token in the document.
HTMLFileDocument An HTML file document where HTML commands are removed from the token stream.
InvertedIndex An inverted index for vector-space information retrieval.
InvertedPosIndex An inverted index for vector-space information retrieval.
Retrieval A lightweight object for storing information about a retrieved Document.
RetrievalPosInfo A lightweight object for storing information about a retrieved Document for a positional inverted index that includes vector-space and proximity
TextFileDocument A normal ASCII text file Document.
TextStringDocument A simple document represented by a String.
TokenInfo A lightweight object for storing information about a token (a.k.a. word, term) in an inverted index.
TokenOccurrence A lightweight object for storing information about an occurrence of a token (a.k.a. word, term) in a Document.
TokenPositionInfo A lightweight object for storing information about positions of a token (a.k.a. word, term) in some document.
TokenPosOccurrence A lightweight object for storing information about an occurrence of a token (a.k.a. word, term) in a Document, including an array of the positions at which it occurs.
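
Example sketch
The two vector representations described above (a HashMap from tokens to weights, and a HashMap from tokens to lists of positions) can be illustrated with plain java.util collections. This is a minimal sketch, not the ir.vsr implementation: the class PosVectorSketch and its methods positions and counts are hypothetical names, raw counts stand in for the package's Weight objects, and the input is assumed to be already tokenized with stop words removed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; it does not use or mirror the ir.vsr classes.
public class PosVectorSketch {

    /** Positional term vector: maps each token to the 0-based positions where it occurs. */
    static Map<String, List<Integer>> positions(String[] tokens) {
        Map<String, List<Integer>> vector = new HashMap<>();
        for (int i = 0; i < tokens.length; i++) {
            // Create the position list on first sight of a token, then append.
            vector.computeIfAbsent(tokens[i], k -> new ArrayList<>()).add(i);
        }
        return vector;
    }

    /** Plain term vector: maps each token to a weight (here simply its raw count). */
    static Map<String, Double> counts(String[] tokens) {
        Map<String, Double> vector = new HashMap<>();
        for (String token : tokens) {
            vector.merge(token, 1.0, Double::sum);
        }
        return vector;
    }

    public static void main(String[] args) {
        String[] tokens = {"to", "be", "or", "not", "to", "be"};
        System.out.println(positions(tokens).get("to")); // prints [0, 4]
        System.out.println(counts(tokens).get("be"));    // prints 2.0
    }
}
```

The positional form keeps everything the plain form has (the size of each position list is the token's count) at the cost of extra memory, which is presumably why the package offers both.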