In other words, I have a data directory and a data.previous directory. I would like to make a backup of the stuff in data.previous, most of the files being unchanged from data. And I'd like to do this without using lots of disk space.
The funny thing is that gzip is weird about hard links. If you try to gzip a file whose link count is greater than one, it complains.
I was puzzled by this and started to wonder whether gzip actually overwrites the original input file in place, instead of simply unlinking it once it has finished reading it and writing the compressed version.
So I did a little experiment.
First I create a file with two links to it.
/tmp/gz$ touch a
/tmp/gz$ ln a b
Then I check to ensure they have the same inode.
/tmp/gz$ ls -li a b
5152839 -rw-r--r-- 2 jzawodn jzawodn 0 2008-12-03 15:38 a
5152839 -rw-r--r-- 2 jzawodn jzawodn 0 2008-12-03 15:38 b
They do. So I compress one of them.
/tmp/gz$ gzip a
gzip: a has 1 other link -- unchanged
And witness the complaint. The gzip man page says I can force it with the "-f" argument, so I do.
/tmp/gz$ gzip -f a
And, as I'd expect, the new file doesn't replace the old file. It gets a new inode instead.
/tmp/gz$ ls -li a.gz b
5152840 -rw-r--r-- 1 jzawodn jzawodn 22 2008-12-03 15:38 a.gz
5152839 -rw-r--r-- 1 jzawodn jzawodn 0 2008-12-03 15:38 b
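If you want to repeat the experiment without typing each step, something like the following script should do it. It is only a sketch: the scratch directory comes from mktemp, the variable names are mine, and the wording of gzip's complaint may differ between gzip implementations, so the script only checks whether the file is still there.

```shell
#!/bin/sh
# Repeat the hard-link experiment in a throwaway directory.
set -eu
dir=$(mktemp -d)
cd "$dir"

# One empty file with two names.
touch a
ln a b

# The first field of `ls -i` is the inode number.
inode_before=$(ls -i a | awk '{print $1}')
inode_b=$(ls -i b | awk '{print $1}')

# Without -f, gzip is expected to complain and leave the file alone.
gzip a 2>/dev/null || echo "gzip declined to compress a"

# Force it, but only if the plain attempt really did leave a in place.
if [ -e a ]; then
    gzip -f a
fi

inode_gz=$(ls -i a.gz | awk '{print $1}')
echo "a (before): $inode_before"
echo "b:          $inode_b"
echo "a.gz:       $inode_gz"
```

The last three lines print the inode numbers so you can compare them by eye, the same way the ls -li listings do.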
This leads me to believe that the gzip error/warning message is really trying to say something like:
gzip: a has 1 other link and compressing it will save no space
But I still don't see the danger. Why can't that simply be an informational message? After all, you still need enough space to store the original and compressed versions since the original (in the normal case) exists until it is done writing the compressed version anyway. (I checked the source code later.)
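To make that sequence concrete, here is a by-hand approximation of it: write the compressed copy to a new file first, then remove the original name. This is my own sketch using gzip -c and rm, not what gzip's source literally does, and the file contents and scratch directory are made up for the demonstration.

```shell
#!/bin/sh
# By-hand approximation of "compress to a new file, then unlink the original".
# This is not gzip's actual code, just the same two steps done manually.
set -eu
dir=$(mktemp -d)
cd "$dir"

printf 'hello hello hello\n' > a
ln a b

# Step 1: write the compressed copy to a new file.
# Reading from stdin means gzip never looks at a's link count.
gzip -c < a > a.gz

# At this point the original and the compressed copy both exist.
ls -l a a.gz b

# Step 2: remove the original name. The data itself stays on disk
# because b still refers to the same inode.
rm a

# The second field of `ls -l` is the link count.
links_b=$(ls -l b | awk '{print $2}')
echo "b now has $links_b link(s)"
```

Both copies are on disk between step 1 and step 2, and the other link never changes, which is why the complaint looks more like a note about saved space than a warning about data.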
So what's the rationale here? I don't get it.
Posted by jzawodn at December 03, 2008 03:51 PM