Skip to content

SebaRenner/huffmann-algorithm

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

29 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Huffman Text Compression Algorithm

Huffman coding is a lossless text compression algorithm. It substitutes each character in a text with a binary code. The higher the frequency of a character in the text, the lower the code.

It's mathematically proven to be the optimal algorithm of its kind*

* Symbol-by-symbol coding with a known input probability distribution

Algorithm

Encoding

  1. Count the frequency of each character in the input text.
  2. Combine the characters with the smallest frequency and sum up their frequency. Repeat until you only have 1 character left.
  3. For each character traverse the tree down from the root. If the character is contained in the left child add a "0", if he's contained in the right child add a "1". Traverse until you've reached the original character node. The path from the root represents the huffman code of this character.
  4. Substitute each character in the input text with its huffman code representation.

Decoding

A huffman encoded text can be decoded in two different ways. Either way requires the substitution table/huffman tree that was used for encoding the text.

Decoding using Substitution Table

  1. Start from the beginning of the encoded text with an empty partial code
  2. Add bits to the partial code until there is a matching key in the substitution table.
  3. Save the keys character value and remove the partial code from the encoded text.
  4. Repeat until no bits are left in the encoded text.

Decoding using Tree

  1. Traverse the tree bit by bit until you land on a leaf node.
  2. Save the leaves character value and remove the traversed path from the encoded text.
  3. Repeat until no bits are left in the encoded text.

About

C# implementation of the Huffman text compression and decompression algorithm

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages