I worked lately on a simple application in OCaml that compresses files using the Huffman compression algorithms. The application is a showcase of how to use the Huffman’s Static compression, decompression algorithms compared to the dynamic (adaptive) algorithms (Vitter algorithm is used in the application).
The source code is available on github here https://github.com/martani/Huffman-compression--OCaml-. Although the code is not as elegant as it should be, we have used excessively the imperative features of OCaml for optimization reasons, especially for the adaptive algorithm where the tree changes with every introduction of a new symbol to code. So it is not really the best place for OCaml beginners who should be baptized with the functional beauty of the language before they see its evil imperative side :).
Although, It might be a great place too to see how the imperative and functional programming could be done on the same land, I don’t promise anything though, since we did the project in the hurry with just few days in our hands to prepare for exams and … you know the rest of the story.
So the project uses only the primitive OCaml modules, there is no libraries to install or anything, there is also an associated Makefile to compile and test the executable.
The Makefile has a make test directive to test the executable on a file named ”input” in the working directory (that you should include yourself obviously), basically it stars a static compression followed by a decompression then compares the original and the decompressed files to ensure the program works correctly. It does the same with the dynamic method too.
On a side note, we were surprised that while compiling the code with ocamlopt, we get sometimes programs that run 15x times faster than the same program compiled with ocamlc.
For a file about 5.8 MB we got some executing time around
1: ***@****-Studio-1558:~/workspace/huffman$ make test
2:
3: Testing... static huffman
4: Compress:
5: 1.18 user
6: 0.00 system
7: 0:01.19 elapsed
8: 99% CPU (0avgtext+0avgdata 5504maxresident)
9:
10: Uncompress
11: 2.26 user
12: 0.00 system
13: 0:02.27 elapsed
14: 99% CPU
15: <<<<<<<<<<<<<<<<Compressed = Uncompress ??>>>>>>>>>>>>>>>>
16:
17: Testing... dynamic huffman
18: Compress
19: 8.11 user
20: 0.01 system
21: 0:08.13 elapsed
22: 99%CPU
23:
24: Uncompress
25: 4.04 user
26: 0.00 system
27: 0:04.05 elapsed
28: 99%CPU
29: <<<<<<<<<<<<<<<Compressed = Uncompress ??>>>>>>>>>>>>>>>>>
As you can notice, the static version of the program is faster than the dynamic one, this is due to the fact that we optimized the the static version to use actual bits on compression / decompression, but we used lists to represent the bits (0’s and 1’s as actual 32 bit integers!!), so that was expected.
Link for the source code : https://github.com/martani/Huffman-compression--OCaml-