Class Summary

| Class | Description |
|---|---|
| AffineMetric | A measure of distance between two strings based on affine-gap edit distance. |
| AffineProbMetric | The AffineProbMetric class implements a probabilistic model of string edit distance with affine-cost gaps. |
| ClassifierInstanceMetric | The ClassifierInstanceMetric class employs a classifier that uses the values returned by various StringMetrics on individual fields as features and outputs a confidence value that corresponds to the similarity between records. |
| HashMapVector | A data structure for a document's term vector, stored as a HashMap that maps tokens to Weights that store the weight of each token in the document. |
| InstanceMetric | Abstract InstanceMetric class for writing metrics that calculate the distance between instances describing database records. |
| JaccardMetric | This class calculates the similarity between two strings using the Jaccard metric. Some code borrowed from the ir.vsr package by Raymond J. |
| KernelVSMetric | This class defines a basic string kernel based on the vector space model. Some code borrowed from the ir.vsr package by Raymond J. |
| NGramTokenizer | This class defines a tokenizer that turns strings into HashMapVectors of n-grams. |
| Porter | The Porter stemmer for reducing words to their base stem form. |
| StringMetric | An abstract class that returns a measure of similarity between strings. |
| StringReference | A simple data structure for storing a reference to a document file that includes information on the length of its document vector. |
| SumInstanceMetric | The SumInstanceMetric class simply adds the values returned by StringMetrics on individual fields. |
| TokenInfo | A lightweight object for storing information about a token (a.k.a. word, term) in an inverted index. |
| Tokenizer | This abstract class defines a tokenizer that turns strings into HashMapVectors. |
| TokenOccurrence | A lightweight object for storing information about an occurrence of a token (a.k.a. word, term) in a Document. |
| TokenString | |
| VectorSpaceMetric | This class uses a vector space to calculate the similarity between two strings. Some code borrowed from the ir.vsr package by Raymond J. |
| Weight | A simple wrapper data structure for storing a double weight as an Object that can be put into lists, maps, etc. |
| WordTokenizer | This class defines a tokenizer that turns strings into HashMapVectors using the native Java StringTokenizer. |
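The AffineMetric entry in the class summary refers to an affine-gap distance, where opening a gap costs more than extending one. The following is a minimal sketch of that idea using Gotoh-style dynamic programming with three matrices. The class name `AffineGapSketch` and the cost values (match 0, mismatch 1, a gap of length k costing `GAP_OPEN + (k - 1) * GAP_EXTEND`) are assumptions for illustration, not the package's actual API or parameters.

```java
import java.util.Arrays;

/** Hypothetical affine-gap edit distance sketch (Gotoh-style DP); costs are assumed values. */
final class AffineGapSketch {
    static final double MISMATCH = 1.0;   // assumed substitution cost
    static final double GAP_OPEN = 2.0;   // assumed cost of the first character in a gap
    static final double GAP_EXTEND = 1.0; // assumed cost of each additional gap character
    private static final double INF = Double.POSITIVE_INFINITY;

    static double distance(String a, String b) {
        int n = a.length(), m = b.length();
        double[][] sub = new double[n + 1][m + 1]; // alignment ends with a[i-1] paired to b[j-1]
        double[][] del = new double[n + 1][m + 1]; // alignment ends with a[i-1] against a gap
        double[][] ins = new double[n + 1][m + 1]; // alignment ends with b[j-1] against a gap
        for (int i = 0; i <= n; i++) {
            Arrays.fill(sub[i], INF);
            Arrays.fill(del[i], INF);
            Arrays.fill(ins[i], INF);
        }
        sub[0][0] = 0.0;
        // A leading run of deletions or insertions is a single gap.
        for (int i = 1; i <= n; i++) del[i][0] = GAP_OPEN + (i - 1) * GAP_EXTEND;
        for (int j = 1; j <= m; j++) ins[0][j] = GAP_OPEN + (j - 1) * GAP_EXTEND;

        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0.0 : MISMATCH;
                sub[i][j] = cost + min3(sub[i - 1][j - 1], del[i - 1][j - 1], ins[i - 1][j - 1]);
                // Staying in the same gap matrix extends the gap; coming from another opens one.
                del[i][j] = min3(sub[i - 1][j] + GAP_OPEN, del[i - 1][j] + GAP_EXTEND, ins[i - 1][j] + GAP_OPEN);
                ins[i][j] = min3(sub[i][j - 1] + GAP_OPEN, ins[i][j - 1] + GAP_EXTEND, del[i][j - 1] + GAP_OPEN);
            }
        }
        return min3(sub[n][m], del[n][m], ins[n][m]);
    }

    private static double min3(double x, double y, double z) {
        return Math.min(x, Math.min(y, z));
    }
}
```

With these assumed costs, deleting "bc" from "abcd" to reach "ad" is one gap of length 2 and costs 3, whereas two separate single-character gaps would cost 4; that preference for contiguous gaps is the point of the affine model.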
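The HashMapVector and Weight entries describe a term vector stored as a map from tokens to mutable weight objects. A minimal sketch of that pairing follows; `WeightSketch` and `TermVectorSketch`, along with their method names, are hypothetical stand-ins rather than the real classes' signatures.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for Weight: a mutable double wrapped in an Object. */
final class WeightSketch {
    private double value;

    WeightSketch(double value) { this.value = value; }

    double getValue() { return value; }

    void increment(double delta) { value += delta; }
}

/** Hypothetical stand-in for HashMapVector: maps each token to its weight in a document. */
final class TermVectorSketch {
    final Map<String, WeightSketch> weights = new HashMap<>();

    /** Adds delta to the token's weight, creating the entry on first sight. */
    void increment(String token, double delta) {
        WeightSketch w = weights.get(token);
        if (w == null) {
            weights.put(token, new WeightSketch(delta));
        } else {
            w.increment(delta); // mutated in place, so no second map write is needed
        }
    }

    /** Returns the token's weight, or 0.0 if the token is absent from the document. */
    double getWeight(String token) {
        WeightSketch w = weights.get(token);
        return w == null ? 0.0 : w.getValue();
    }
}
```

Wrapping the double in a mutable object lets repeated increments update the stored value directly instead of boxing a new `Double` and writing it back into the map each time.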
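The WordTokenizer entry mentions the native Java `StringTokenizer`. A minimal sketch of word tokenization with that class follows, counting tokens into a plain `Map<String, Integer>` in place of a HashMapVector. The class name `WordTokenizerSketch` and the lowercasing step are assumptions; `StringTokenizer(String)`, `hasMoreTokens()`, and `nextToken()` are standard `java.util` methods.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.StringTokenizer;

/** Hypothetical word tokenizer sketch built on java.util.StringTokenizer. */
final class WordTokenizerSketch {
    static Map<String, Integer> tokenize(String text) {
        Map<String, Integer> counts = new HashMap<>();
        // The one-argument constructor splits on space, tab, newline, carriage return, and form feed.
        StringTokenizer st = new StringTokenizer(text);
        while (st.hasMoreTokens()) {
            String token = st.nextToken().toLowerCase(Locale.ROOT); // assumed case folding
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }
}
```

Because the default delimiters are only whitespace, punctuation stays attached to words (for example "cat," and "cat" are different tokens); a real tokenizer may pass a custom delimiter string to the two-argument constructor.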
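The NGramTokenizer entry turns strings into vectors of n-grams. A minimal sketch of character n-gram counting follows, using a `Map<String, Double>` in place of a HashMapVector. The name `NGramSketch` is hypothetical, and returning an empty map for strings shorter than `n` is an assumed policy.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical character n-gram tokenizer sketch. */
final class NGramSketch {
    static Map<String, Double> tokenize(String s, int n) {
        if (n <= 0) throw new IllegalArgumentException("n must be positive");
        Map<String, Double> counts = new HashMap<>();
        // Slide a window of width n across the string, counting each substring.
        for (int i = 0; i + n <= s.length(); i++) {
            counts.merge(s.substring(i, i + n), 1.0, Double::sum);
        }
        return counts; // empty when s is shorter than n (assumed policy)
    }
}
```

For example, the bigrams of "abcab" are "ab" (twice), "bc", and "ca". N-gram vectors tolerate typos better than word vectors, since a single changed character only affects the n-grams that overlap it.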
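The JaccardMetric entry computes similarity with the Jaccard metric, the size of the intersection of two sets divided by the size of their union. A minimal sketch over whitespace-separated, lowercased word sets follows. The class name `JaccardSketch`, the tokenization, and returning 0.0 when both strings are empty are assumptions; the actual class may tokenize differently.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical Jaccard similarity sketch over word sets. */
final class JaccardSketch {
    static Set<String> tokens(String s) {
        Set<String> out = new HashSet<>();
        for (String t : s.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) out.add(t); // split can yield a leading empty string
        }
        return out;
    }

    /** Returns |A ∩ B| / |A ∪ B|, or 0.0 if both token sets are empty (assumed). */
    static double similarity(String a, String b) {
        Set<String> ta = tokens(a), tb = tokens(b);
        Set<String> union = new HashSet<>(ta);
        union.addAll(tb);
        if (union.isEmpty()) return 0.0;
        Set<String> inter = new HashSet<>(ta);
        inter.retainAll(tb);
        return (double) inter.size() / union.size();
    }
}
```

For "the quick fox" and "the lazy fox", the shared tokens are {the, fox} and the union has four tokens, so the similarity is 2/4 = 0.5. Token frequency is ignored, which distinguishes this measure from the vector space approach.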
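The VectorSpaceMetric entry uses a vector space to compare strings. One common way to do this is the cosine of the angle between term vectors. The sketch below assumes raw term frequencies over lowercased whitespace tokens and omits the IDF weighting that the real class may apply. `CosineSketch` and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical cosine-similarity sketch over raw term-frequency vectors. */
final class CosineSketch {
    static Map<String, Double> termFrequencies(String s) {
        Map<String, Double> tf = new HashMap<>();
        for (String t : s.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) tf.merge(t, 1.0, Double::sum);
        }
        return tf;
    }

    /** Returns (u . v) / (|u| |v|), or 0.0 if either vector is empty. */
    static double cosine(Map<String, Double> u, Map<String, Double> v) {
        double dot = 0.0, nu = 0.0, nv = 0.0;
        for (Map.Entry<String, Double> e : u.entrySet()) {
            nu += e.getValue() * e.getValue();
            Double w = v.get(e.getKey());
            if (w != null) dot += e.getValue() * w; // only shared terms contribute
        }
        for (double w : v.values()) nv += w * w;
        if (nu == 0.0 || nv == 0.0) return 0.0;
        return dot / Math.sqrt(nu * nv);
    }

    static double similarity(String a, String b) {
        return cosine(termFrequencies(a), termFrequencies(b));
    }
}
```

For "a a b" against "a", the vectors are (2, 1) and (1, 0), so the similarity is 2 / √5 ≈ 0.894. Unlike the Jaccard sketch, repeated terms raise their weight in the comparison.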
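The StringMetric, InstanceMetric, and SumInstanceMetric entries describe a two-level design: string metrics compare individual field values, and an instance metric combines them into a record-level score. A minimal sketch of the summing variant follows. `FieldMetric` and `SumRecordMetric` are hypothetical names, and representing a record as a `String[]` with one value per field is an assumption.

```java
import java.util.List;

/** Hypothetical field-level metric, standing in for StringMetric. */
interface FieldMetric {
    double similarity(String a, String b);
}

/** Hypothetical analogue of SumInstanceMetric: adds the per-field metric values. */
final class SumRecordMetric {
    private final List<FieldMetric> fieldMetrics; // one metric per field, in field order

    SumRecordMetric(List<FieldMetric> fieldMetrics) {
        this.fieldMetrics = fieldMetrics;
    }

    double similarity(String[] r1, String[] r2) {
        if (r1.length != fieldMetrics.size() || r2.length != fieldMetrics.size()) {
            throw new IllegalArgumentException("records must have one value per field metric");
        }
        double sum = 0.0;
        for (int i = 0; i < r1.length; i++) {
            sum += fieldMetrics.get(i).similarity(r1[i], r2[i]);
        }
        return sum;
    }
}
```

For example, with an exact-match metric on both fields, the records {"a", "b"} and {"a", "c"} score 1.0. The ClassifierInstanceMetric entry replaces this fixed sum with a trained classifier that takes the same per-field values as features.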
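The TokenInfo, TokenOccurrence, and StringReference entries are the parts of an inverted index: per-token information, per-document occurrences of a token, and a document reference that records its vector length. A minimal sketch of how those pieces might fit together follows. All class names are hypothetical, and the TF-IDF weighting with IDF = ln(N / df) is an assumed scheme, not necessarily the one the package uses.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical analogue of StringReference: a document id plus its vector length. */
final class DocRef {
    final String id;
    double length; // Euclidean norm of the document's TF-IDF vector, set by finish()

    DocRef(String id) { this.id = id; }
}

/** Hypothetical analogue of TokenOccurrence: one document and the token's count in it. */
final class Occurrence {
    final DocRef doc;
    final int count;

    Occurrence(DocRef doc, int count) { this.doc = doc; this.count = count; }
}

/** Hypothetical analogue of TokenInfo: a token's IDF and its posting list. */
final class TokenEntry {
    double idf;
    final List<Occurrence> occurrences = new ArrayList<>();
}

final class InvertedIndexSketch {
    final Map<String, TokenEntry> index = new HashMap<>();
    final List<DocRef> docs = new ArrayList<>();

    /** Adds one document given its token counts. */
    void add(String id, Map<String, Integer> termCounts) {
        DocRef ref = new DocRef(id);
        docs.add(ref);
        for (Map.Entry<String, Integer> e : termCounts.entrySet()) {
            index.computeIfAbsent(e.getKey(), k -> new TokenEntry())
                 .occurrences.add(new Occurrence(ref, e.getValue()));
        }
    }

    /** Computes IDF = ln(N / df) per token, then each document's TF-IDF vector length. */
    void finish() {
        int n = docs.size();
        Map<DocRef, Double> sumSquares = new HashMap<>(); // keyed by identity
        for (TokenEntry entry : index.values()) {
            entry.idf = Math.log((double) n / entry.occurrences.size());
            for (Occurrence o : entry.occurrences) {
                double w = o.count * entry.idf;
                sumSquares.merge(o.doc, w * w, Double::sum);
            }
        }
        for (DocRef d : docs) d.length = Math.sqrt(sumSquares.getOrDefault(d, 0.0));
    }
}
```

Storing each document's vector length once lets a query be scored by walking only the posting lists of its own tokens and then dividing by the precomputed lengths, instead of rebuilding every document vector.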