Comparison of sparse vector formats
hashMapVector
+ Compact hashMap representation
+ Amortized constant-time access
- Does not store position information, which may be necessary for future applications
- Would require substantial modification to Weka
+ Efficient storage of string-value indices and positions
+ Contains position information of tokens
+ Will not require any modification to Weka
- Uses binary search to insert a new element into the vector
- Would need filters for TF, IDF, token counts, etc.
- Will require a workaround to bypass a soft bug during multiple read-writes
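
The access-cost trade-off listed above can be sketched in Java as follows. This is only an illustration under stated assumptions, not the code of either format: the class names SparseVectorSketch, HashMapVector and SortedArrayVector are hypothetical, values are assumed to be doubles keyed by integer index, and position information is left out. The first class shows amortized constant-time access through a hash map; the second keeps indices in a sorted array, so each insert needs a binary search plus an array copy.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the two storage strategies; not taken from Weka. */
public class SparseVectorSketch {

    /** Hash-map-backed sparse vector: index -> value, absent entries read as 0. */
    static class HashMapVector {
        private final Map<Integer, Double> values = new HashMap<>();

        /** Amortized O(1) write; storing 0 removes the entry to stay sparse. */
        void set(int index, double value) {
            if (value == 0.0) {
                values.remove(index);
            } else {
                values.put(index, value);
            }
        }

        /** Amortized O(1) read. */
        double get(int index) {
            return values.getOrDefault(index, 0.0);
        }

        int storedCount() {
            return values.size();
        }
    }

    /** Sparse vector kept as parallel arrays sorted by index. */
    static class SortedArrayVector {
        private int[] indices = new int[0];
        private double[] values = new double[0];

        /** O(log n) binary search, then an O(n) copy when the index is new. */
        void set(int index, double value) {
            int pos = Arrays.binarySearch(indices, index);
            if (pos >= 0) {
                values[pos] = value; // index already stored: overwrite in place
                return;
            }
            int insertAt = -(pos + 1); // binarySearch encodes the insertion point
            int[] newIndices = new int[indices.length + 1];
            double[] newValues = new double[values.length + 1];
            System.arraycopy(indices, 0, newIndices, 0, insertAt);
            System.arraycopy(values, 0, newValues, 0, insertAt);
            newIndices[insertAt] = index;
            newValues[insertAt] = value;
            System.arraycopy(indices, insertAt, newIndices, insertAt + 1,
                    indices.length - insertAt);
            System.arraycopy(values, insertAt, newValues, insertAt + 1,
                    values.length - insertAt);
            indices = newIndices;
            values = newValues;
        }

        /** O(log n) read via binary search. */
        double get(int index) {
            int pos = Arrays.binarySearch(indices, index);
            return pos >= 0 ? values[pos] : 0.0;
        }

        /** Copy of the stored indices, always in ascending order. */
        int[] storedIndices() {
            return indices.clone();
        }
    }
}
```

In this sketch, setting indices 7, 3 and 5 on a SortedArrayVector leaves them stored in ascending order, at the cost of one array copy per new index. The HashMapVector avoids that copy but keeps no ordering.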