For a short article in the June issue of Linux Magazine, I needed to compare the relative performance of gzip, bzip2, and rzip.
I used a 180MB mbox file, consisting of my non-spam e-mail from last month. (I know, it's only one test and doesn't represent how the tools will work on other data sets.)
command     cpu time      new size
-------     --------      --------
gzip         17.63 sec    87 MB
gzip -9      23.26 sec    87 MB
bzip2 -9    114.90 sec    76 MB
rzip         57.00 sec    52 MB
It's interesting to note that gzip's default (-6) and "try hardest" (-9) resulted in less than a 1MB difference.
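For anyone who wants to try a similar comparison on their own data, here is a rough sketch in Python. It is not the method used for the numbers above: it uses the standard library's zlib and bz2 modules rather than the gzip and bzip2 command-line tools, it leaves out rzip (which has no standard library binding), and the sample data from the hypothetical `make_sample` helper is a made-up stand-in for a real mbox file.

```python
import bz2
import time
import zlib


def measure(name, compress, data):
    """Return (name, cpu_seconds, compressed_size) for one compressor."""
    start = time.process_time()
    out = compress(data)
    elapsed = time.process_time() - start
    return name, elapsed, len(out)


def compare(data):
    """Run each compressor over the same bytes and collect the results."""
    candidates = [
        ("zlib level 6", lambda d: zlib.compress(d, 6)),  # gzip's default level
        ("zlib level 9", lambda d: zlib.compress(d, 9)),  # "try hardest"
        ("bz2 level 9", lambda d: bz2.compress(d, 9)),
    ]
    return [measure(name, fn, data) for name, fn in candidates]


def make_sample():
    """Build repetitive, mail-like sample bytes (an invented stand-in)."""
    message = b"From: someone@example.com\nSubject: test message\n\nHello there.\n"
    return message * 20000


if __name__ == "__main__":
    for name, secs, size in compare(make_sample()):
        print(f"{name:<14} {secs:8.2f} sec {size:10d} bytes")
```

Replace `make_sample()` with the bytes of a real file to get figures that mean something; highly repetitive sample data like this will compress far better than real mail.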
I won't spoil the article by telling you the rest of the story here (why you may or may not want to use a particular tool), but I thought the numbers alone were pretty interesting.
Posted by jzawodn at April 07, 2004 02:22 PM
What do you mean by normal?
bzip2 -9 *is* normal. Check the docs.
How do bzip2 and rzip compare on other kinds of data that have less long-range redundancy than mboxen?
That page doesn't offer any data on rzip though — which is really the object of interest here.
Jeremy, I know you are specifically looking at Linux-based tools, but it may be interesting to compare what 7-zip does on your corpus. 7-zip is open source, though it is Windows-only at this point. There may be a Linux port in the future.
A good site to check for compression stuff is here
Wow, nice. I ran it on a WAV file to see how that worked:
Original Size: 28,118,268
GZip - 12.45s, 24,687,083
GZip9 - 13.09s, 24,687,084 (heh)
BZip - 33.69s, 17,728,172
RZip - 43.97s, 17,855,402
Shorten - 6.07s, 15,578,013
Flac - 10.18s, 13,557,698
Looks like RZip is fairly comparable to BZip for "randomish" data, though a bit slower. Still, seems like a good general-purpose tool. I added Shorten and Flac for comparison, which are two specialized lossless WAV file compressors. They beat the pants off of everyone, but that's expected with their specialization.
The reason I prefer both gzip and bzip2 is that they have been around for a while, are well tested, and are here to stay. I worry that in five years' time I'll need to un-rzip something and find that rzip doesn't exist, won't run on my architecture, or the files are corrupted. These are my priorities when it comes to backups. On the flip side, rzip does look promising.
Gavin, rzip is released under the GNU GPL. That means that the current incarnation at least will always be open for you to compile it from source on any platform you choose. So in response to your first two points, rzip will always exist and you can easily port it to your architecture.
The third point, about your archives being corrupted - that's possible with any tool. Use a second tool to generate parity archives of your backup if you're that worried about corruption. That way you can restore a corrupted rzip/rar/whatever file using the PARs.
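The parity idea can be illustrated with a much simpler scheme than real PAR files use. The sketch below is plain XOR parity, not the PAR format (which uses Reed-Solomon codes and can repair more than one damaged block); the function names and block size are made up for illustration. It splits data into fixed-size blocks, keeps one extra parity block, and rebuilds any single lost block from the rest.

```python
def split_blocks(data, block_size):
    """Split data into block_size chunks, zero-padding the last one."""
    blocks = []
    for offset in range(0, len(data), block_size):
        chunk = data[offset:offset + block_size]
        blocks.append(chunk.ljust(block_size, b"\0"))
    return blocks


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, value in enumerate(block):
            result[i] ^= value
    return bytes(result)


def recover(blocks, parity, missing_index):
    """Rebuild the block at missing_index from the other blocks plus parity."""
    survivors = [b for i, b in enumerate(blocks) if i != missing_index]
    return xor_blocks(survivors + [parity])
```

Usage: compute `parity = xor_blocks(split_blocks(archive_bytes, 4096))` when making the backup and store it alongside the archive. If exactly one block is later found to be damaged, `recover` returns its original contents. Two or more damaged blocks cannot be repaired with this scheme.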
Hi. I am very interested in using rzip in an open source project I'm doing. Unfortunately, rzip is GPL'ed, and I'm using the Berkeley License so that people can incorporate my code in non-open-source projects. So even though rzip is better, I can't use it.
rzip-2.0 has a bug. I have a 250MB buffer here that can't be decompressed (it fails at around 100MB) after being compressed at levels -0 and -1.
The bug is in rzip 2.0 and also in Debian's stable rzip-2.0-2.
asd: mind linking us to a more detailed page describing this bug? I don't see it reported anywhere on the net or on Debian's bug reporting page.
Those are some pretty compelling results for rzip.
I was under the impression bzip2 and rz archives were generally about even, but after downloading rzip from MacPorts/DarwinPorts and giving it a shot, there was definitely a noticeable difference.
A brief look at the man page shows you can add the -P flag, which gives you a nice percentage progress indicator. That is especially useful when you're compressing the heck out of a multi-gigabyte tar archive.
I know it's several years late in coming, but thanks for the heads up :)
The real secret behind this post is that rzip cannot be piped, while gzip and bzip2 can.
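To make "can be piped" concrete, here is a small sketch of gzip-style streaming compression in Python. It shows only the general idea of compressing a stream chunk by chunk without ever holding the whole input; it is not how the gzip tool itself is written, and the function name and chunk size are arbitrary choices.

```python
import gzip
import io
import zlib


def stream_gzip(src, dst, chunk_size=64 * 1024):
    """Compress src to dst in gzip format, one chunk at a time."""
    # wbits=31 asks zlib to emit a gzip header and trailer.
    compressor = zlib.compressobj(level=6, wbits=31)
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(compressor.compress(chunk))
    # Flush whatever the compressor is still buffering.
    dst.write(compressor.flush())
```

Called as `stream_gzip(sys.stdin.buffer, sys.stdout.buffer)` (after `import sys`), this behaves like a filter in a shell pipeline: it only needs to read forward through its input, which is the property a pipe requires.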