Making Weka Text-friendly

Preprocess text by making wrapper calls to:
- Mooney’s IR package: Tokenize, Porter Stemming, TFIDF
- McCallum’s BOW package: Tokenize, Stem, TFIDF, Information-theoretic pruning, N-gram tokens, different smoothing algorithms
- Fan’s MC toolkit: Tokenize, TFIDF, pruning, output in CCS (compressed column storage) format
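
To make the common steps in the list above concrete, here is a minimal sketch of tokenizing and TF-IDF weighting in plain Java. It does not use the API of any of the three packages or of Weka; the class `TfIdfSketch`, its method names, the letters-only tokenizer, and the `tf * ln(N / df)` weighting are all assumptions chosen for illustration. Stemming and pruning are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of Weka or the packages above.
final class TfIdfSketch {

    // Lowercase the text and split on runs of non-letters (no stemming here).
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // One sparse term -> weight map per document, with weight = tf * ln(N / df).
    static List<Map<String, Double>> tfidf(List<String> docs) {
        List<Map<String, Integer>> counts = new ArrayList<>();
        Map<String, Integer> df = new HashMap<>();
        for (String doc : docs) {
            Map<String, Integer> tf = new HashMap<>();
            for (String tok : tokenize(doc)) tf.merge(tok, 1, Integer::sum);
            // Each distinct term in this document adds one to its document frequency.
            for (String term : tf.keySet()) df.merge(term, 1, Integer::sum);
            counts.add(tf);
        }
        int n = docs.size();
        List<Map<String, Double>> vectors = new ArrayList<>();
        for (Map<String, Integer> tf : counts) {
            Map<String, Double> vec = new HashMap<>();
            for (Map.Entry<String, Integer> e : tf.entrySet()) {
                double idf = Math.log((double) n / df.get(e.getKey()));
                vec.put(e.getKey(), e.getValue() * idf);
            }
            vectors.add(vec);
        }
        return vectors;
    }
}
```

With this weighting, a term that occurs in every document gets weight 0, which is one simple reason pruning such terms loses nothing.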

No inverted index in Weka: OK if not doing IR, but KNN is inefficient without one (every query is compared against every training document)
- May want to integrate the VSR package from Mooney’s IR package with Weka
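
A rough sketch of why an inverted index helps KNN on text: only documents that share at least one term with the query are ever scored. This is not the VSR package's code or a Weka class; `InvertedIndexSketch` and its methods are hypothetical names, and similarity here is a plain dot product, which equals cosine similarity only if the vectors were length-normalized beforehand.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not the VSR package and not a Weka class.
final class InvertedIndexSketch {

    // One entry in a term's postings list: which document, and with what weight.
    private static final class Posting {
        final int docId;
        final double weight;
        Posting(int docId, double weight) {
            this.docId = docId;
            this.weight = weight;
        }
    }

    private final Map<String, List<Posting>> postings = new HashMap<>();
    private int numDocs = 0;

    // Adds a sparse document vector (term -> weight) and returns its id.
    int add(Map<String, Double> vector) {
        int id = numDocs++;
        for (Map.Entry<String, Double> e : vector.entrySet()) {
            postings.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                    .add(new Posting(id, e.getValue()));
        }
        return id;
    }

    // Dot product of the query with every document sharing at least one term.
    Map<Integer, Double> scores(Map<String, Double> query) {
        Map<Integer, Double> acc = new HashMap<>();
        for (Map.Entry<String, Double> q : query.entrySet()) {
            List<Posting> list = postings.get(q.getKey());
            if (list == null) continue; // term never seen in any document
            for (Posting p : list) {
                acc.merge(p.docId, p.weight * q.getValue(), Double::sum);
            }
        }
        return acc;
    }

    // Ids of the (at most) k highest-scoring documents, best first.
    List<Integer> nearest(Map<String, Double> query, int k) {
        List<Map.Entry<Integer, Double>> ranked = new ArrayList<>(scores(query).entrySet());
        ranked.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < Math.min(k, ranked.size()); i++) {
            ids.add(ranked.get(i).getKey());
        }
        return ids;
    }
}
```

The cost of a query is proportional to the total length of the postings lists for its terms, not to the number of training documents times the vocabulary size.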

Probabilities currently underflow: calculations have to be done with logs
- NaiveBayes, KNN, etc.: can have 2 versions of each (sparse, dense)
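
The underflow point can be illustrated with a small sketch: multiplying many small per-word probabilities collapses to 0.0 in a `double`, while summing their logs stays finite. This is not Weka's NaiveBayes code; `LogSpaceSketch` and its methods are hypothetical, and the log-sum-exp normalization shown is one standard way, assumed here, to turn per-class log scores back into posteriors.

```java
// Hypothetical illustration of log-space arithmetic; not Weka's NaiveBayes.
final class LogSpaceSketch {

    // Naive product of probabilities; underflows to 0.0 for long documents.
    static double product(double[] probs) {
        double p = 1.0;
        for (double x : probs) p *= x;
        return p;
    }

    // Same quantity in log space: the sum of the individual logs.
    static double logProduct(double[] probs) {
        double s = 0.0;
        for (double x : probs) s += Math.log(x);
        return s;
    }

    // Turns unnormalized per-class log scores into posteriors using log-sum-exp.
    // Assumes at least one score is finite.
    static double[] normalize(double[] logScores) {
        double max = Double.NEGATIVE_INFINITY;
        for (double s : logScores) max = Math.max(max, s);
        double sum = 0.0;
        // Subtracting the max keeps every exponent <= 0, so exp() cannot overflow
        // and the largest term is exactly 1.
        for (double s : logScores) sum += Math.exp(s - max);
        double logZ = max + Math.log(sum);
        double[] posterior = new double[logScores.length];
        for (int i = 0; i < logScores.length; i++) {
            posterior[i] = Math.exp(logScores[i] - logZ);
        }
        return posterior;
    }
}
```

For example, 1000 word probabilities of 0.01 each have a true product of 1e-2000, far below the smallest positive `double`, so `product` returns 0.0 for every class and the classes become indistinguishable; `logProduct` returns about -4605.17 instead.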
