For a short article in the June issue of Linux Magazine, I needed to compare the relative performance of gzip, bzip2, and rzip.

I used a 180MB mbox file, consisting of my non-spam e-mail from last month. (I know, it's only one test and doesn't represent how the tools will work on other data sets.)

  command       cpu time      new size
  -------       --------      --------
  gzip          17.63 sec     87 MB
  gzip -9       23.26 sec     87 MB
  bzip2 -9     114.90 sec     76 MB
  rzip          57.00 sec     52 MB

It's interesting to note that gzip's default (-6) and "try hardest" (-9) resulted in less than a 1MB difference.
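
For anyone who wants to try a similar comparison on their own data, here is a minimal sketch in Python. It is not the method used for the numbers above (those came from the command-line tools); it assumes that the standard library's gzip and bz2 modules behave closely enough to the gzip and bzip2 binaries for a rough comparison. rzip is left out because the standard library has no binding for it. The `benchmark` function and the sample data are hypothetical names and inputs made up for illustration.

```python
import bz2
import gzip
import time


def benchmark(data):
    """Return (label, cpu_seconds, compressed_size) for each compressor."""
    compressors = [
        ("gzip -6", lambda d: gzip.compress(d, compresslevel=6)),
        ("gzip -9", lambda d: gzip.compress(d, compresslevel=9)),
        ("bzip2 -9", lambda d: bz2.compress(d, compresslevel=9)),
    ]
    results = []
    for label, compress in compressors:
        start = time.process_time()  # CPU time, not wall-clock time
        packed = compress(data)
        elapsed = time.process_time() - start
        results.append((label, elapsed, len(packed)))
    return results


if __name__ == "__main__":
    # Hypothetical stand-in for an mbox file: repeated message-like text.
    # Replace with open("your.mbox", "rb").read() to test real data.
    sample = b"From: someone@example.com\nSubject: test\n\nHello there.\n" * 20000
    print(f"original: {len(sample)} bytes")
    for label, seconds, size in benchmark(sample):
        print(f"{label:10s} {seconds:8.3f} sec {size:10d} bytes")
```

The sketch compresses everything in memory, so it only suits files that fit comfortably in RAM. Highly repetitive sample text like this will compress far better than real mail, so treat its output as a smoke test rather than a result.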

I won't spoil the article by telling you the rest of the story here (why you may or may not want to use a particular tool), but I thought the numbers alone were pretty interesting.

Posted by jzawodn at April 07, 2004 02:22 PM

Reader Comments
# Ian Holsman said:

what is the result of doing a normal 'bzip2' ?

on April 7, 2004 02:44 PM
# Erich Kolb said:

What about .zip?

on April 7, 2004 02:46 PM
# Jeremy Zawodny said:

What do you mean by normal?

bzip2 -9 *is* normal. Check the docs.

on April 7, 2004 02:46 PM
# Seun Osewa said:

Rzip makes bzip2 look _really_ bad here!

on April 7, 2004 02:59 PM
# Aristotle Pagaltzis said:

How do bzip2 vs rzip compare on other kinds of data which has less long-rage redundancy than mboxen?

on April 7, 2004 03:12 PM
# Aristotle Pagaltzis said:

Err, long-range of course. :)

on April 7, 2004 03:12 PM
# justin said:
on April 7, 2004 03:48 PM
# Aristotle Pagaltzis said:

That page doesn't offer any data on rzip though — which is really the object of interest here.

on April 7, 2004 04:45 PM
# Yusuf said:

Jeremy, I know you are specifically looking at Linux-based tools, but it may be interesting to compare what 7-zip does on your corpus. 7-zip is open source, though it is Windows-only at this point; there may be a Linux port in the future.

A good site to check for compression stuff is here

on April 7, 2004 07:49 PM
# Josh Woodward said:

Wow, nice. I ran it on a WAV file to see how that worked:

Original Size: 28,118,268

GZip - 12.45s, 24,687,083
GZip9 - 13.09s, 24,687,084 (heh)
BZip - 33.69s, 17,728,172
RZip - 43.97s, 17,855,402
Shorten - 6.07s, 15,578,013
Flac - 10.18s, 13,557,698

Looks like RZip is fairly comparable to BZip for "randomish" data, though a bit slower. Still, seems like a good general-purpose tool. I added Shorten and Flac for comparison, which are two specialized lossless WAV file compressors. They beat the pants off of everyone, but that's expected with their specialization.

on April 8, 2004 09:28 AM
# Gavin said:

The reason I prefer both gzip and bzip2 is that they have been around for a while, are well tested, and are here to stay. I worry that in five years' time I'll need to un-rzip something and find that rzip doesn't exist, won't run on my architecture, or the files are corrupted. These are my priorities when it comes to backups. On the flip side, rzip does look promising.

on April 8, 2004 11:55 AM
# Luke Reeves said:

Gavin, rzip is released under the GNU GPL. That means that the current incarnation at least will always be open for you to compile it from source on any platform you choose. So in response to your first two points, rzip will always exist and you can easily port it to your architecture.

The third point, about your archives being corrupted - that's possible with any tool. Use a second tool to generate parity archives of your backup if you're that worried about corruption. That way you can restore a corrupted rzip/rar/whatever file using the PARs.

on April 11, 2004 08:18 PM
# Simson Garfinkel said:

Hi. I am very interested in using rzip in an Open Source project I'm doing. Unfortunately, rzip is GPL'ed, and I'm using the Berkeley License so that people can incorporate my code in non-open-source projects. So even though rzip is better, I can't use it.

on November 18, 2005 01:09 PM
# asd said:

rzip-2.0 has a bug. I have a 250MB buffer here which can't be decompressed (it fails at around 100MB) after it's compressed at levels -0 and -1.

The bug is in rzip 2.0 and also in Debian stable's rzip-2.0-2.

on December 6, 2005 04:50 AM
# Deffexor said:

asd: mind linking us to a more detailed page describing this bug? I don't see it reported anywhere on the net or on Debian's bug reporting page.

on December 10, 2005 09:30 AM
# Ruben Schade said:

Those are some pretty compelling results for rzip.

I was under the impression bzip2 and rz archives were generally about even, but after downloading rzip from MacPorts/DarwinPorts and giving it a shot, there was definitely a noticeable difference.

A brief look at the man page shows you can add the -P flag, which gives you a nice percentage progress indicator. That's especially useful when you're compressing the heck out of a multi-gigabyte tar archive.

I know it's several years late in coming, but thanks for the heads up :)

on September 6, 2007 11:26 PM
# Evan Brewer said:

The real secret behind this post is that rzip cannot be piped, while gzip and bzip2 can.

on August 18, 2009 11:58 PM