Huffman Compression / Decompression in OCaml

I worked lately on a simple application in OCaml that compresses files using the Huffman compression algorithms. The application is a showcase of how to use the Huffman’s Static compression, decompression algorithms compared to the dynamic (adaptive) algorithms (Vitter algorithm is used in the application).

The source code is available on github here Although the code is not as elegant as it should be, we have used excessively the imperative features of OCaml for optimization reasons, especially for the adaptive algorithm where the tree changes with every introduction of a new symbol to code. So it is not really the best place for OCaml beginners who should be baptized with the functional beauty of the language before they see its evil imperative side :).

Although, It might be a great place too to see how the imperative and functional programming could be done on the same land, I don’t promise anything though, since we did the project in the hurry with just few days in our hands to prepare for exams and … you know the rest of the story.

So the project uses only the primitive OCaml modules, there is no libraries to install or anything, there is also an associated Makefile to compile and test the executable.

The Makefile has a make test directive to test the executable on a file named ”input” in the working directory (that you should include yourself obviously), basically it stars a static compression followed by a decompression then compares the original and the decompressed files to ensure the program works correctly. It does the same with the dynamic method too.

On a side note, we were surprised that while compiling the code with ocamlopt, we get sometimes programs that run 15x times faster than the same program compiled with ocamlc.

For a file about 5.8 MB we got some executing time around

   1: ***@****-Studio-1558:~/workspace/huffman$ make test
   3: Testing... static huffman
   4: Compress:
   5: 1.18 user 
   6: 0.00 system
   7: 0:01.19 elapsed 
   8: 99% CPU (0avgtext+0avgdata 5504maxresident)
  10: Uncompress
  11: 2.26 user 
  12: 0.00 system 
  13: 0:02.27 elapsed 
  14: 99% CPU 
  15: <<<<<<<<<<<<<<<<Compressed = Uncompress ??>>>>>>>>>>>>>>>>
  17: Testing... dynamic huffman
  18: Compress
  19: 8.11 user 
  20: 0.01 system 
  21: 0:08.13 elapsed 
  22: 99%CPU 
  24: Uncompress
  25: 4.04 user 
  26: 0.00 system 
  27: 0:04.05 elapsed 
  28: 99%CPU 
  29: <<<<<<<<<<<<<<<Compressed = Uncompress ??>>>>>>>>>>>>>>>>>

As you can notice, the static version of the program is faster than the dynamic one, this is due to the fact that we optimized the the static version to use actual bits on compression / decompression, but we used lists to represent the bits (0’s and 1’s as actual 32 bit integers!!), so that was expected.

Link for the source code :

Posted in , , . Bookmark the permalink. RSS feed for this post.

comments powered by Disqus

Swedish Greys - a WordPress theme from Nordic Themepark. Converted by