SI433 Class 20: more on Huffman Codes

Reading
None.

Homework
To Appear! Look at the program hw.cc, which reads in descriptions of Huffman trees according to the conventions (see below) we devised in class and prints them out in a sort of "infix notation", as per our discussion of Huffman Trees & Expression Trees. Here are some examples:
valiant[188] [~/]> g++ -o hwprog hw.cc
valiant[189] [~/]> hwprog
Enter tree description  : 000000110111000011
In "infix" that would be: ([111]+[000])
valiant[190] [~/]> hwprog
Enter tree description  : 0000001101110000010110010111
In "infix" that would be: ([111]+(([000]+[101])+[010]))
valiant[191] [~/]> 
Modify the program so that it reads in a string of 0's and 1's representing a compressed file and writes out the decompressed file, like this:
valiant[200] [~/]> g++ -o sol sol.cc
valiant[201] [~/]> sol 
valiant[201] [~/]> sol
Enter tree description  : 00000011011100001100100100
In "infix" that would be: ([111]+[000])
Decompressed that would be:

111 111 000 111 111 000 111 111 

valiant[202] [~/]> sol
Enter tree description  : 000000110111000001011001011111001001100010001111001
In "infix" that would be: ([111]+(([000]+[101])+[010]))
Decompressed that would be:

010 111 111 000 010 111 111 111 000 111 010 010 111 111 

valiant[203] [~/]>
Turn In a printout of your source code and a screen capture showing your program running on the following input:
000000110000010010010101111110000111011101001011101010001111
	

Lecture
Homework Review
We went over the homework from the previous lecture.

Implementation issues and algorithm analysis
If we hope to have other people decode a file that we've encoded with Huffman encoding, then we have to find a way to pass along a description of the code we used to encode the file. We make it as easy as possible for the decoder by placing a description of the tree they'll use for decoding at the beginning of the encoded file. But how will we represent the tree?

Th problem is that we need to represent this "complicated" 2-dimensional binary tree structure as a simple 1-dimensional string of characters. Where have we seen this problem before? Expression trees! Remember, each expression tree can be uniquely represented as a postfix expression, and conversely each postfix expression represents a unique expression tree. This gives us a nice ready-made solution for representing binary trees as 1-dimensional strings of characters.

Our Implementation
Our "implementation" of Huffman encoding used the following conventions:
  • The first 8 bits of the file are the block size N in binary.
  • The tree description follows the first 8 bits of the file. It's given as a form of "postfix", where blocks are encoded as a 0 followed by N bits, and 1 signals that the top two subtrees are to be merged. The top of the stack is the right child, value below the top is the left child. Left edges in the tree are labeled 0, right edges are labeled 1.
  • The tree description ends with a 1 that cannot be interpreted as a merge operation, because there is only one operand on the stack.
Here is a completely encoded file:
000000110000010010010101111110000111011101001011101010001111

Asst Prof Christopher Brown
Last modified: Wed Mar 27 16:10:11 EST 2002