weka.deduping.metrics
Class HashMapVector

java.lang.Object
  extended by weka.deduping.metrics.HashMapVector

public class HashMapVector
extends java.lang.Object

A data structure for a document's term vector, stored as a HashMap that maps each token to a Weight object holding that token's weight in the document. It provides an efficient, indexed representation of sparse document vectors.
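
The token-to-weight mapping described above can be sketched roughly as follows. This is a minimal illustration, not the Weka source: the class name TermVectorSketch and its nested Weight holder are hypothetical, and two behaviours are assumptions (that increment returns the updated weight, and that a token never seen has weight 0).

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a sparse term vector; not the Weka implementation. */
final class TermVectorSketch {
    /** Mutable holder, so an increment updates the stored value in place. */
    static final class Weight {
        double value;
    }

    /** Only tokens with an explicit weight are stored, which keeps the vector sparse. */
    final Map<String, Weight> weights = new HashMap<>();

    /** Adds amount to the token's weight, creating the entry on first use.
     *  Assumption: the updated weight is returned. */
    double increment(String token, double amount) {
        Weight w = weights.computeIfAbsent(token, k -> new Weight());
        w.value += amount;
        return w.value;
    }

    /** Assumption: a token that was never incremented has weight 0. */
    double getWeight(String token) {
        Weight w = weights.get(token);
        return w == null ? 0.0 : w.value;
    }

    /** Number of distinct tokens stored. */
    int size() {
        return weights.size();
    }
}
```

Storing a mutable holder rather than a boxed Double means repeated increments of the same token modify one object instead of replacing the map entry each time.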


Field Summary
 java.util.HashMap hashMap
          The HashMap that stores the mapping of tokens to Weight objects
protected  double m_length
          Store the length of a vector for efficiency
 
Constructor Summary
HashMapVector()
           
 
Method Summary
 void add(HashMapVector vector)
          Destructively add the given vector to the current vector
 void addScaled(HashMapVector vector, double scalingFactor)
          Destructively add a scaled version of the given vector to the current vector
 void clear()
          Clears the vector back to all zeros
 HashMapVector copy()
          Produce a copy of this HashMapVector with a new HashMap and new Weight objects
 double cosineTo(HashMapVector otherVector)
          Computes cosine of angle to otherVector.
 double cosineTo(HashMapVector otherVector, double length)
          Computes cosine of angle to otherVector when also given otherVector's Euclidean length (allows saving computation if the length is already known).
 double getWeight(java.lang.String token)
          Return the weight of the given token in the vector
 double increment(java.lang.String token)
          Increment the weight for the given token in the vector by 1.
 double increment(java.lang.String token, double amount)
          Increment the weight for the given token in the vector by the given amount.
 double increment(java.lang.String token, int amount)
          Increment the weight for the given token in the vector by the given int amount.
 void initLength()
           
 java.util.Iterator iterator()
          Returns an iterator over the Map.Entry objects in the hashMap
 double length()
          Compute Euclidean length (sqrt of sum of squares) of vector
 double maxWeight()
          Returns the maximum weight of any token in the vector.
 void multiply(double factor)
          Destructively multiply the vector by a constant
 void print()
          Print out the vector showing the tokens and their weights
 int size()
          Returns the number of tokens in the vector.
 void subtract(HashMapVector vector)
          Destructively subtract the given vector from the current vector
 java.lang.String toString()
          Return String of the vector showing the tokens and their weights
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

hashMap

public java.util.HashMap hashMap
The HashMap that stores the mapping of tokens to Weight objects


m_length

protected double m_length
Store the length of a vector for efficiency

Constructor Detail

HashMapVector

public HashMapVector()

Method Detail

iterator

public java.util.Iterator iterator()
Returns an iterator over the Map.Entry objects in the hashMap


size

public int size()
Returns the number of tokens in the vector.


clear

public void clear()
Clears the vector back to all zeros


increment

public double increment(java.lang.String token,
                        double amount)
Increment the weight for the given token in the vector by the given amount.


getWeight

public double getWeight(java.lang.String token)
Return the weight of the given token in the vector


increment

public double increment(java.lang.String token)
Increment the weight for the given token in the vector by 1.


increment

public double increment(java.lang.String token,
                        int amount)
Increment the weight for the given token in the vector by the given int amount.


add

public void add(HashMapVector vector)
Destructively add the given vector to the current vector


addScaled

public void addScaled(HashMapVector vector,
                      double scalingFactor)
Destructively add a scaled version of the given vector to the current vector
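
As a rough illustration of what "destructively" means here, the sketch below adds a scaled sparse vector into a target map in place. It uses plain Map<String, Double> values instead of the Weight class, and the class name AddScaledSketch is hypothetical. Treating add as a scaling factor of 1 and subtract as a factor of -1 is a mathematical equivalence, not a statement about how this class implements them.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of an in-place scaled vector addition over sparse maps. */
final class AddScaledSketch {
    /** Adds factor * source into target. The target is modified; the source is only read. */
    static void addScaled(Map<String, Double> target,
                          Map<String, Double> source,
                          double factor) {
        for (Map.Entry<String, Double> e : source.entrySet()) {
            // merge() inserts the scaled value if the token is absent from target,
            // otherwise sums it with the weight already stored there.
            target.merge(e.getKey(), factor * e.getValue(), Double::sum);
        }
    }
}
```

Because the receiver is overwritten, a caller that still needs the original vector would take a copy before calling such a method.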


subtract

public void subtract(HashMapVector vector)
Destructively subtract the given vector from the current vector


multiply

public void multiply(double factor)
Destructively multiply the vector by a constant


copy

public HashMapVector copy()
Produce a copy of this HashMapVector with a new HashMap and new Weight objects
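
The reason for creating new weight objects as well as a new map can be shown with a small sketch. The names CopySketch and its nested Weight are hypothetical stand-ins, not the Weka classes. If only the map were duplicated, both vectors would share the same mutable weight objects, and a destructive operation on one would silently change the other.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a deep copy of a token-to-weight map. */
final class CopySketch {
    /** Mutable weight holder, standing in for the real Weight class. */
    static final class Weight {
        double value;

        Weight(double value) {
            this.value = value;
        }
    }

    /** Builds a new map containing a fresh Weight for every token. */
    static Map<String, Weight> deepCopy(Map<String, Weight> original) {
        Map<String, Weight> copy = new HashMap<>();
        for (Map.Entry<String, Weight> e : original.entrySet()) {
            copy.put(e.getKey(), new Weight(e.getValue().value));
        }
        return copy;
    }
}
```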


maxWeight

public double maxWeight()
Returns the maximum weight of any token in the vector.


print

public void print()
Print out the vector showing the tokens and their weights


toString

public java.lang.String toString()
Return String of the vector showing the tokens and their weights


cosineTo

public double cosineTo(HashMapVector otherVector)
Computes cosine of angle to otherVector.


cosineTo

public double cosineTo(HashMapVector otherVector,
                       double length)
Computes the cosine of the angle to otherVector when also given otherVector's Euclidean length. This saves computation if the length is already known, and is more efficient when the current vector is shorter than otherVector.
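
A minimal sketch of cosine similarity over sparse maps may help explain the efficiency note. It is not the Weka implementation: the class name CosineSketch and the use of Map<String, Double> are assumptions, as is the choice to return 0 when either vector has zero length. The loop walks only the first vector's tokens and looks each one up in the other map, so the cost grows with the size of the first vector.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of cosine similarity between two sparse vectors. */
final class CosineSketch {
    /** Euclidean length: square root of the sum of squared weights. */
    static double length(Map<String, Double> v) {
        double sumSquares = 0.0;
        for (double w : v.values()) {
            sumSquares += w * w;
        }
        return Math.sqrt(sumSquares);
    }

    /** Cosine of the angle between a and b, given b's precomputed length. */
    static double cosine(Map<String, Double> a, Map<String, Double> b, double bLength) {
        double aLength = length(a);
        if (aLength == 0.0 || bLength == 0.0) {
            return 0.0; // assumption: an all-zero vector is similar to nothing
        }
        double dot = 0.0;
        // Only tokens present in both vectors contribute to the dot product,
        // so iterating over a and probing b is enough.
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        return dot / (aLength * bLength);
    }
}
```

When one vector is compared against many others, computing its length once and passing it in avoids repeating the sum of squares on every call.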


length

public double length()
Compute Euclidean length (sqrt of sum of squares) of vector


initLength

public void initLength()