CS 1723 Section 1T, Summer 1998 Assignment 7: Huffman Coding
Write a program to decode a file encoded with the program
huffman.c.
huffman.c reads a text file
named on the command line and counts all occurences of each of the
255 possible byte combinations in the file in an array called freqs[256].
For example, if there are 14 letter 'A's in the file, then freqs['A'] == 14.
The program also counts the number of characters in the file, calling this
quantity n. It then builds a Huffman tree from the frequency
data. An output file is opened with extension .huf. Before coding the file, the program writes the freq array of
256 integers (1024 bytes on the Sun) and the integer n
(four bytes on the Sun) to the output file, using C statements equivalent to:
fwrite (freqs, 256, sizeof (int), outputfile);
fwrite (&n, 1, sizeof (int), outputfile);
Then the input file is read again and Huffman coded, and the resulting bits
are written to the output file.
Your program should read a Huffman coded input file named on the command line
(just like huffman.c), then decode the file, writing it
to standard output (i.e. not another file opened with fopen()).
You may borrow any code you wish from huffman.c.
Your program should do the following:
- Open the input file for reading with fopen().
- Read the freqs[] array and n using fread().
- Build a Huffman tree for the frequencies, just like in huffman.c.
- Start at the root of the tree, and begin reading bits.
If the bit is 1, go
right, otherwise go left. When you get to a leaf, write the character
contained in that leaf to standard output and begin at the root again.
- Close the input file.
- Your program should not print anything to standard output except
for the decompressed file. You may print diagnostic output (such as
an error message explaining that a file can't be opened) to standard
error.
Test your program by compressing files with huffman.c and
decompressing them with your program.
To turn in your program:
- Make sure it compiles and runs on runner.
- There are three files on runner in ~djimenez/cs1723/huffman:
test1.huf, test2.huf and test3.huf.
For each file, decompress it with your program and describe the
contents in the comments of the first few lines of your programs.
test3.huf is a compressed text file. For this file, write
in your comments the last proper noun in the file.
- Email your program code (containing the above comments) to the
instructor. If your code spans multiple files, provide a Makefile and
send all files as uuencoded tar file in a single email message.
Note: huffman.c and your program work with data in
binary format. This means that you may have to do something system
dependent to get them to work on a non-Unix operating system (e.g.,
use fopen (..., "rb") instead of "r"). Also,
note that the SPARC is a "big-endian" system; integers are stored
with the most significant byte in the first position. You can still
use huffman.c on a "little-endian" system, where integers
are stored least significant byte first, but compressed files will not
be portable to big-endian systems. So you should do your final testing
and decompression of the test files on a big-endian machine like runner.
The Motorola 68000 series and PowerPC processors (i.e., Macintosh computers)
and MIPS-based Silicon Graphics computers are big-endian
and thus compatible with runner. Intel processors (e.g. Pentium)
are little-endian and thus not compatible.
This program is due by midnight, Monday, August 3, 1998.