This exercise deals with file compression using keyword encoding.
Several files associated with this exercise are in the same directory.
wordList.cpp    A file containing a C++ program that produces a list of the unique words in a file and the number of times each appears.
words.dat       The output from program WordList, with the words sorted by number of occurrences.
history.in      A data file containing 3436 non-blank characters, which was the input to the program.
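As a rough illustration of what a word-listing program of this kind does, the sketch below counts the distinct words in a piece of text and orders them by number of occurrences. It is not the code in wordList.cpp: the names countWords and sortByCount are hypothetical, a "word" is assumed to be a maximal run of alphabetic characters, and the minimum word length is passed in as a parameter.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, int> WordCount;

// Hypothetical helper: count each distinct word in text.
// Assumption: a word is a maximal run of alphabetic characters.
// Words shorter than minLength are skipped. Counting is case sensitive.
std::map<std::string, int> countWords(const std::string& text,
                                      std::size_t minLength)
{
    std::map<std::string, int> counts;
    std::string word;
    for (char ch : text) {
        if (std::isalpha(static_cast<unsigned char>(ch))) {
            word += ch;                       // still inside a word
        } else {
            if (word.size() >= minLength) ++counts[word];
            word.clear();                     // a non-letter ends the word
        }
    }
    if (word.size() >= minLength) ++counts[word];  // word at end of text
    return counts;
}

// Hypothetical helper: list the words with the most frequent first.
// Ties keep the alphabetical order they have in the map.
std::vector<WordCount> sortByCount(const std::map<std::string, int>& counts)
{
    std::vector<WordCount> sorted(counts.begin(), counts.end());
    std::stable_sort(sorted.begin(), sorted.end(),
                     [](const WordCount& a, const WordCount& b) {
                         return a.second > b.second;
                     });
    return sorted;
}
```

Calling countWords on the contents of a file with a minimum length of 3, and then sortByCount on the result, gives a list in the same spirit as words.dat.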
Program WordList is case
sensitive; words beginning with an uppercase letter are considered different
from the same word beginning with a lowercase letter.
letters[count] = tolower(letter);
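The assignment above stores each letter in lowercase, so that words differing only in case are counted as the same word. A minimal sketch of the same idea applied to a whole word follows; the function name toLowerWord is hypothetical and does not come from wordList.cpp.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: return a lowercase copy of word.
std::string toLowerWord(const std::string& word)
{
    std::string result;
    result.reserve(word.size());
    for (char letter : word) {
        // Cast to unsigned char first: passing a negative char value
        // to tolower is undefined behavior.
        result += static_cast<char>(
            std::tolower(static_cast<unsigned char>(letter)));
    }
    return result;
}
```

With this conversion applied before a word is counted, "History" and "history" fall under a single entry.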
Program WordList ignores words of fewer than three characters. Would it be
better to ignore words of fewer than four characters? Recalculate the
compression ratio when words of fewer than four characters are not encoded.
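The recalculation can be sketched in code under some stated assumptions, none of which come from the exercise itself: every encoded word is replaced by a code of codeLength characters (one by default), the compression ratio is taken to be compressed size divided by original size, and the space needed to store the keyword table is ignored. The function name compressionRatio is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: estimate the compression ratio
// (compressed size / original size) for keyword encoding.
// Assumptions: each encoded word becomes a code of codeLength characters,
// and the cost of storing the keyword table is not counted.
double compressionRatio(const std::map<std::string, int>& counts,
                        std::size_t originalSize,
                        std::size_t minLength,
                        std::size_t codeLength = 1)
{
    assert(originalSize > 0);
    std::size_t saved = 0;
    for (const auto& entry : counts) {
        const std::size_t length = entry.first.size();
        // Encode only words that are long enough and actually shrink.
        if (length >= minLength && length > codeLength) {
            saved += static_cast<std::size_t>(entry.second)
                     * (length - codeLength);
        }
    }
    assert(saved <= originalSize);  // counts must describe this file
    return static_cast<double>(originalSize - saved)
           / static_cast<double>(originalSize);
}
```

Calling the function twice with the counts from words.dat and the size of history.in, once with a minimum length of 3 and once with 4, gives the two ratios to compare. Under these assumptions a smaller ratio means better compression.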