Comparison of sparse vector formats

hashMapVector
- Does not store position information, which may be needed by future applications
- Would require extensive modifications to Weka

SparseInstance

+ Efficient storage of string-value indices and positions

+ Contains position information of tokens

+ Will not require any modification to Weka

- Uses binary search to insert a new element into the vector
- Would need filters for TF, IDF, token counts, etc.
- Will require a workaround for a soft bug that shows up during repeated read-writes
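
To illustrate the binary-search insertion point, here is a minimal sketch of a sparse vector that keeps its indices sorted. It is not Weka's SparseInstance code: the class SparseVectorSketch and its methods are hypothetical names made up for this example, and the parallel-array layout is an assumption about how such a vector might be stored.

```java
import java.util.Arrays;

// Hypothetical sketch, not Weka's SparseInstance: a sparse vector stored as
// two parallel arrays, with the indices array kept in ascending order.
class SparseVectorSketch {
    private int[] indices = new int[0];
    private double[] values = new double[0];

    // Sets the value at 'index'; inserts a new entry if none is stored yet.
    void set(int index, double value) {
        int pos = Arrays.binarySearch(indices, index);
        if (pos >= 0) {
            values[pos] = value; // index already stored: overwrite in place
            return;
        }
        // binarySearch returns -(insertionPoint) - 1 when the key is absent
        int insertAt = -(pos + 1);
        int[] newIndices = new int[indices.length + 1];
        double[] newValues = new double[values.length + 1];
        // Copy the entries before the insertion point
        System.arraycopy(indices, 0, newIndices, 0, insertAt);
        System.arraycopy(values, 0, newValues, 0, insertAt);
        newIndices[insertAt] = index;
        newValues[insertAt] = value;
        // Copy the remaining entries, shifted one slot to the right
        System.arraycopy(indices, insertAt, newIndices, insertAt + 1, indices.length - insertAt);
        System.arraycopy(values, insertAt, newValues, insertAt + 1, values.length - insertAt);
        indices = newIndices;
        values = newValues;
    }

    // Returns the stored value, or 0.0 for an index that is not stored.
    double get(int index) {
        int pos = Arrays.binarySearch(indices, index);
        return pos >= 0 ? values[pos] : 0.0;
    }

    // Number of explicitly stored entries.
    int numStored() {
        return indices.length;
    }

    public static void main(String[] args) {
        SparseVectorSketch v = new SparseVectorSketch();
        v.set(42, 1.0);
        v.set(7, 3.0);
        v.set(19, 2.0);
        System.out.println(Arrays.toString(v.indices)); // prints [7, 19, 42]
    }
}
```

In this sketch the lookup is O(log n), but each insertion of a new index still copies the arrays, so building a vector element by element costs O(n) per insert. A hash map avoids that copy, at the price of losing the sorted order.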
