Compression / Decompression : Comparing Apples to Apples

With pretty much every developer there comes a time when you need to use a compression library; not because you want to ‘zip’ up a massive archive but more because you want to compress some game data on the fly before you transmit it over the network, or to shrink the size of a local cache or something. And, like pretty much every developer, you end up wandering through a maze of open source libraries on the internet, each of which has entirely different API, source code layout, and license agreement. Just figuring out to call ‘compressData’ (if you even can) is often hours of wading through documentation and dealing with build/configuration issues.
For your convenience I am releasing a small code snippet that I wrote when I was trying to evaluate a replacement compression library than the one we are currently using. The simple fact of the matter is that I don’t believe any other programmer should have to waste the same amount of stupid time I did wrestling these libraries into a convenient form.
http://www.amillionpixels.us/test_compression_1.0.exe
Here is the complete API to my wrapper interface:
// Block compression/decompression library wrapper by John W. Ratcliff http://www.codesuppository.blogspot.com/
enum CompressionType
{
CT_INVALID,
CT_CRYPTO_GZIP, // The CryptoPP library implementation of GZIP http://www.cryptopp.com
CT_MINILZO, // The MiniLZO library http://www.oberhumer.com/opensource/lzo/
CT_ZLIB, // The ZLIB library http://www.zlib.net/
CT_BZIP, // The BZIP library http://www.bzip.org/
};
void * compressData(const void *source,int len,int &outlen,CompressionType type=CT_ZLIB);
void * decompressData(const void *source,int clen,int &outlen);
void deleteData(void* mem);
CompressionType getCompressionType(const void *mem,int len);
const char *getCompressionTypeString(CompressionType type);
};
This release supports four open source compression libraries. The CRYPTOPP implementation of GZIP, MINILZO, ZLIB, and BZIP.
This API simply supports block memory compression and decompression. You can compress a block of memory using any one of the four compressors. The compressed memory has a small header on it that indicates which compressor was used, the size of the uncompressed memory block, and a CRC so that it can easily and safely decompressed.
The test application, called ‘test_compression’, loads a roughly 10mb XML file and runs it through each compressor and decompressor and measures the performance characteristcs of each. The results are as follows:
Testing Compression rate and speed with various compressors.
---------------------------------------------------------------
Compress:CT_CRYPTO :FROM: 10,436,335 TO: 2,498,433 23% 1,170 MS
Compress:CT_MINILZO :FROM: 10,436,335 TO: 3,940,072 37% 97 MS
Compress:CT_ZLIB :FROM: 10,436,335 TO: 3,299,771 31% 157 MS
Compress:CT_BZIP :FROM: 10,436,335 TO: 2,270,695 21% 1,544 MS
---------------------------------------------------------------
Testing Decompression speed with various decompressors.
---------------------------------------------------------------
Decompress:CT_CRYPTO :FROM: 2,498,433 TO: 10,436,335 258 MS
Decompress:CT_MINILZO :FROM: 3,940,072 TO: 10,436,335 42 MS
Decompress:CT_ZLIB :FROM: 3,299,771 TO: 10,436,335 69 MS
Decompress:CT_BZIP :FROM: 2,270,695 TO: 10,436,335 390 MS
As expected, MINILZO is the fastest. Unfortunately MINILZO uses a GPL license and cannot be used in commercial products. The rest have licenses which are more flexible. Check the links for each package to see if it is right for you. I did include the ‘CryptoPP’ library for completeness though but, to be frank, I am not very impressed with this package as the compression and decompression rates are very poor. As you would expect BZIP gets the best compression rate but does not perform as quickly.
I am fairly impressed with ZLIB, especially because you can use it in a streaming mode as well, which is excellent for network communications layers. (FYI, the assembly language version of the ZLIB decompressor is hardly any faster than the optimized C code and was not included here.)
You might wonder why I’m even bothering to post this little snippet. Beside the fact that I hope to save other programmers a little bit of time in the future, I encourage any other developers of compression libraries to drop them into this framework so we can stop comparing apples to oranges when it comes to these technologies. A large XML file is a typical use case for the kind of data a game developer might want to squeeze out some space savings for. I am also happy to modify the installer and include more standardized sample data if that is relevant.
I started to add in support for the LZMA library, however they only offer a simple sample decompressor while the compression code is a lot more difficult to extract into a single C style API call.
This demo comes with an easy to use installer and a solution and project file for visual studio 2005. All of the compression code itself is multi-platform and only the little demo app makes any OS specific calls. The libraries are each included as raw source, each located in their own directory. None of the source has been removed (except for test code) so you are not required to use the compression libraries via the wrapper layer.
If anyone feels compelled to add additional compression libraries to this test framework please let me know and I will make a point to include it in a new drop.
Thanks, I hope somebody finds this useful.
John

2 Comments:
At 12:49 AM,
malkia said…
John,
You can use LZO in a commercial product, the license used to be 5000 eu per SKU.
As for LZO alternatives - there a couple - QuickLZ (also GPL/Commercial), LZF, FastLZ, many more...
We used LZO on the PS2 running directly on the IOP (to not waste cycles from the EE), it was decompressing 5-6mb/s (size of the decompressed data), that was enough to decompress and read at the same time, thus improving reading speeds (as it was reading half of the data).
To speed it up, an UNALIGNED DWORD instruction was needed, and indeed the R3000 MIPS had one, I guess the other algorithm could be speed up this way (I mean that's basically what it is, if you don't huffman-ize it, your basic LZ algo is just copying bunch of bytes from one place to another - that copying needs to be made as fast as possible).
Cheers,
malkia
At 10:54 AM,
Anonymous said…
Good on you
Way too many OS projects have this problem of unfriendly, overcomplicated interfaces to something that is, fundamentally, simple.
"Compress this data using BZ2"
Post a Comment
<< Home