As part of my Linux backup scheme (which I need to write up someday) I've recently been swapping and upgrading/replacing some USB hard disks at home. There's a Linux box at home (a ThinkPad T43p running Ubuntu, if you must know) with a 320GB disk attached and mounted as /mnt/backup, and that disk was running fairly low on space.

jzawodn@wasp:/mnt$ df -h /mnt/backup
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdb1             276G  211G   51G  81% /mnt/backup

That was after I moved about 50GB of stuff off it last night.

I want to replace it with a newly attached 750GB disk and need to move all the data over to the new disk. But since much of the data consists of remote filesystem snapshots produced using rsnapshot, which makes copious use of hard links, it's rather important that I do this correctly. If I don't, the data won't even fit on the 750GB disk!

(If that seems impossible, you don't quite grok hard links on a filesystem yet.)
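Here is a small sketch of what that means in practice. It assumes GNU coreutils (stat -c, du) and works in a scratch directory from mktemp -d; the snaps/daily.0 and snaps/daily.1 names only mimic rsnapshot's layout. Two names share one inode, so the data is stored once, but a plain recursive copy turns them into two independent files:

```shell
#!/bin/sh
# Sketch: why a link-unaware copy of a snapshot tree can balloon in size.
# Runs entirely inside a throwaway temp directory (GNU coreutils assumed).
set -eu

demo=$(mktemp -d)
cd "$demo"
mkdir -p snaps/daily.0 snaps/daily.1

# One 10 MB file, plus a second name (hard link) for the same inode,
# the way rsnapshot shares an unchanged file between two snapshots.
dd if=/dev/zero of=snaps/daily.0/big bs=1024k count=10 2>/dev/null
ln snaps/daily.0/big snaps/daily.1/big

# Same inode number, link count 2: only one copy of the data exists.
stat -c '%i %h %n' snaps/daily.0/big snaps/daily.1/big
du -sk snaps    # roughly 10 MB, because du counts each inode once

# A plain recursive copy does not preserve hard links, so each name
# becomes its own file and the data is stored twice.
cp -R snaps copy
stat -c '%i %h %n' copy/daily.0/big copy/daily.1/big
du -sk copy     # roughly 20 MB
```

On a tree like mine, where many snapshots share most of their files, that duplication is exactly what would keep the copy from fitting on the new disk.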

Digging deep into my Unix past, I remember needing to do this once before. The trick was not to use any of the usual suspects: cp, tar, rsync, or mv. Instead, you use either dump (yuck) or a combination of find and cpio.

It looks something like this:

mkdir /mnt/backup2/snaps
cd /mnt/backup/snaps
find . -print | cpio -Bpdumv /mnt/backup2/snaps

Then you just wait a long time while stuff scrolls by and you wish you were using disks in eSATA enclosures rather than in USB 2.0 enclosures.

The trouble is that cpio didn't properly preserve timestamps on directories (I'm not sure why; I expected it to), so I had to dig even deeper and remember how to pair up dump and restore.

cd /mnt/backup2
mkdir snaps
( dump -0 -f - /mnt/backup/snaps | restore -v -x -y -f - ) >& ~jzawodn/dump.log

And then I waited about half a day for the copy to complete.

root@wasp:~# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdb1             276G  212G   50G  82% /mnt/backup
/dev/sdc1             688G  284G  370G  44% /mnt/backup2

Not bad. A quick edit to /etc/rsnapshot.conf to change my snapshot_root from /mnt/backup to /mnt/backup2 and that's all it took.
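For reference, here is a sketch of that edit as a script. set_snapshot_root is a hypothetical helper name; it assumes a sed that supports -i with a backup suffix (GNU and BSD sed both do) and the stock rsnapshot.conf format, where fields must be separated by tabs rather than spaces:

```shell
#!/bin/sh
# Sketch: point rsnapshot at the new disk by rewriting snapshot_root.
# set_snapshot_root is a hypothetical helper, not part of rsnapshot.
set -eu

set_snapshot_root() {
    conf=$1      # e.g. /etc/rsnapshot.conf
    newroot=$2   # e.g. /mnt/backup2/
    tab=$(printf '\t')   # rsnapshot.conf fields must be TAB-separated
    # Keep a .bak copy, then replace the value on the snapshot_root line.
    sed -i.bak "s|^snapshot_root[[:space:]].*|snapshot_root${tab}${newroot}|" "$conf"
}
```

Something like set_snapshot_root /etc/rsnapshot.conf /mnt/backup2/ followed by rsnapshot configtest should confirm the file still parses. The trailing slash follows the style of the stock config file.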

Next time I have to go through this, it won't take me nearly as long to devise a scheme to get it done.

Now, does anyone have alternative methods? Or do you know why cpio didn't preserve timestamps correctly?

Thanks to the folks at TechCzar for translating my tech blog posts and including them in their blog network.

Posted by jzawodn at February 28, 2008 08:40 AM


Reader Comments
# Sam said:

On the Mac I use 'ditto' for this. I think that it's available on other unix systems as well.

on February 28, 2008 09:15 AM
# Rob Steele said:

I'm playing with BackupPC just now (http://backuppc.sourceforge.net/) and its docs say to use dd if you can:


If the pool disk requirements grow you might need to copy the entire data directory to a new (bigger) file system. Hopefully you are lucky enough to avoid this by having the data directory on a RAID file system or LVM that allows the capacity to be grown in place by adding disks.

The backup data directories contain large numbers of hardlinks. If you try to copy the pool the target directory will occupy a lot more space if the hardlinks aren't re-established.

The best way to copy a pool file system, if possible, is by copying the raw device at the block level (eg: using dd). Application level programs that understand hardlinks include the GNU cp program with the -a option and rsync -H. However, the large number of hardlinks in the pool will make the memory usage large and the copy very slow. Don't forget to stop BackupPC while the copy runs.

Starting in 3.0.0 a new script bin/BackupPC_tarPCCopy can be used to assist the copy process. Given one or more pc paths (eg: TOPDIR/pc/HOST or TOPDIR/pc/HOST/nnn), BackupPC_tarPCCopy creates a tar archive with all the hardlinks pointing to ../cpool/.... Any files not hardlinked (eg: backups, LOG etc) are included verbatim.

You will need to specify the -P option to tar when you extract the archive generated by BackupPC_tarPCCopy since the hardlink targets are outside of the directory being extracted.

To copy a complete store (ie: /mnt/data/BackupPC) using BackupPC_tarPCCopy you should:

* stop BackupPC so that the store is static.
* copy the cpool, conf and log directory trees using any technique (like cp, rsync or tar) without the need to preserve hardlinks.
* copy the pc directory using BackupPC_tarPCCopy:

su backuppc
cd NEW_TOPDIR
mkdir pc
cd pc
/usr/local/BackupPC/bin/BackupPC_tarPCCopy /mnt/data/BackupPC/pc | tar xvPf -

on February 28, 2008 09:23 AM
# Joe Beda said:

rsync has a flag (-H) to copy hard links. This is what I used when I had to do something similar. I think it ends up keeping a big map of inodes in memory, so you can't stop and restart it, but you can't do that with the other methods you mentioned either. Rsync does go to great lengths to make sure the target is an exact copy of the original.
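If rsync -H has to track every multiply-linked file, the number of such files gives a rough idea of how much bookkeeping it faces. A sketch for counting them before starting, assuming GNU find (for -printf); count_hardlinked and count_hardlinked_inodes are hypothetical helper names:

```shell
#!/bin/sh
# Sketch: count files with a link count above 1 under a directory.
# Hypothetical helpers; GNU find assumed for -printf.
set -eu

count_hardlinked() {
    # -xdev: stay on one filesystem; -links +1: link count greater than 1.
    find "$1" -xdev -type f -links +1 | wc -l
}

count_hardlinked_inodes() {
    # Distinct inodes among those names.
    find "$1" -xdev -type f -links +1 -printf '%i\n' | sort -u | wc -l
}
```

For example, count_hardlinked /mnt/backup/snaps. Later comments report rsync using gigabytes of memory on trees with millions of links, so a large count is a hint to check available RAM first.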

on February 28, 2008 09:56 AM
# Stuart Langridge said:

What's wrong with rsync --hard-links, which preserves hardlinks?

on February 28, 2008 10:44 AM
# Chris Adams said:

I generally use dd:

mount -o remount,ro /
dd if=/dev/hda of=/dev/newdrive bs=1024k

or even over the network:
dd if=/dev/hda bs=1024k | ssh remote_host dd of=/dev/newdrive bs=1024k

With a large block size it'll be *massively* faster than a filesystem-level copy (unless the volume is almost empty), and tools like parted have made it fairly easy to expand most common filesystems once you have it copied.
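Applied to the setup in the post, a partition-level copy might look like the sketch below. It assumes the old filesystem on /dev/sdb1 is unmounted, the new partition /dev/sdc1 is at least as large, and GNU dd (for conv=fsync); clone_partition is a hypothetical name. Everything on the destination is overwritten, so double-check the device names:

```shell
#!/bin/sh
# Sketch: block-level copy of one partition (or image file) to another.
# clone_partition is a hypothetical helper. DESTROYS data on the target.
set -eu

clone_partition() {
    src=$1   # e.g. /dev/sdb1, unmounted or mounted read-only
    dst=$2   # e.g. /dev/sdc1, at least as large as src
    # Large block size keeps the copy efficient; conv=fsync flushes
    # the output to disk before dd exits.
    dd if="$src" of="$dst" bs=1024k conv=fsync
}
```

For example, umount /mnt/backup and then clone_partition /dev/sdb1 /dev/sdc1. If the filesystem is ext2 or ext3 (the post doesn't say), running e2fsck -f /dev/sdc1 and then resize2fs /dev/sdc1 should grow it to fill the larger partition. Both copies will also share the same filesystem UUID, which matters if either one is mounted by UUID.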

Of course, the mere fact that this is so much work is insane. I'm really looking forward to the day when the process is simply "add the new disk to the pool, remove the old one, and wait for the filesystem to copy the data before the old device is detached." Sadly, even ZFS doesn't do that yet.

on February 28, 2008 10:48 AM
# Jeremy Zawodny said:

I swear that I read the rsync man page three times and never saw the hard links option.

DOH!

on February 28, 2008 11:10 AM
# Cliff Stanford said:

What's wrong with cp -rpv?

Am I missing something?

Cliff.

on February 28, 2008 12:22 PM
# Jeremy Zawodny said:

Cliff:

My reading of the cp man page didn't say anything about preserving hard links. Symlinks, yes. But not hard links.
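For what it's worth, GNU cp does handle this, just not with -rp: -a implies -d, which (like --preserve=links) preserves hard links between the source files. A quick check in scratch directories, assuming GNU coreutils:

```shell
#!/bin/sh
# Sketch: compare cp -Rp (links broken) with cp -a (links kept).
set -eu

src=$(mktemp -d)
dst=$(mktemp -d)
mkdir "$src/daily.0" "$src/daily.1"
echo "unchanged file" > "$src/daily.0/f"
ln "$src/daily.0/f" "$src/daily.1/f"

# -p keeps mode, ownership and timestamps, but not hard links ...
cp -Rp "$src" "$dst/rp"
# ... while -a (archive) also preserves links between source files.
cp -a "$src" "$dst/a"

stat -c '%i %n' "$dst"/rp/daily.*/f   # two different inode numbers
stat -c '%i %n' "$dst"/a/daily.*/f    # one shared inode number
```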

on February 28, 2008 12:25 PM
# Roger Binns said:

You need to be very careful when using rsync with the hard links option. I make daily backups using hard links for unchanged files.

When I needed to copy the disk contents to a new one, I found that rsync uses a humongous amount of memory. In fact, I had to switch from 32-bit Ubuntu to the 64-bit version because rsync ran out of address space! Then, after 3 days with the 64-bit version, I had to upgrade my machine from 1GB of physical RAM to 3GB because rsync was totally thrashing memory and would have taken forever to complete. The final working set size was around 2.5GB.

on February 28, 2008 12:39 PM
# Wayne Scott said:

Using LVM2 is very nice. You add the 750G drive to your pool and then remove your smaller drive.

on February 28, 2008 02:38 PM
# Martin Levy said:

You should check the man pages for find and cpio. Using a simple "find . -depth -print | cpio -pdm $destination" command will both preserve links AND preserve dates on directories. If you use "cp -rp" then you DON'T preserve the directory times because it's not a depth first traversal.

By removing the -v flag from cpio, you will only be presented with the errors. No need for excess crud on the screen. The -B flag is not needed with -p option (it's only used with -i or -o). BTW: Use -C if you are using -i or -o on a modern system.

The -depth option to find and the cpio command showed up externally in AT&T System V Unix (and internally to AT&T much earlier than that!).

The find/cpio combination is still the cleanest way of copying files from one directory to another. The modern usage of ssh, as a way to run a "cpio -i" remotely, hence enabling machine to machine clean copying of a hierarchy works like a charm! Before ssh, we used rsh (I'm glad we aren't doing that anymore).

Enjoy,

Martin

on February 28, 2008 09:20 PM
# Ask Bjørn Hansen said:

As someone else pointed out rsync can do it, but it uses a good deal of memory with a lot of files. (Not impossibly much for a one-off job, but a lot ... I estimated about 1GB memory for ~10M files/links when I was moving my 1TB rsnapshot "disk" from one volume group to another a few days ago).

Anyway, "cp" can do it too; look for the "preserve" option, something like --preserve=all or --preserve=links.

cd /mnt/old
cp -av --preserve=all . /mnt/new

(I sometimes use -v for big jobs like this; sure, it might be slower, but at least I can easily see what's going on...)


- ask

on February 28, 2008 10:14 PM
# Asgeir S. Nilsen said:

My laptop Linux installation is based on LVM, and is currently at its third hard drive.

Procedure is fairly simple:

1. Pop new hard drive in drive bay frame and insert where DVD player normally is.

2. If swap is on the LVM, remove it, as the virtual memory subsystem seems to get confused if you migrate underneath it.

3. Add new disk to volume group.

4. Remove old disk from volume group.

5. Carry on working as normal. In case of kernel panics or crashes, LVM will resume the migration where it left off on next reboot.

6. rsync the boot partition to the new disk and do the regular GRUB magic to make the new disk bootable.

7. Insert new drive in hard drive bay and reboot.
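Steps 2 to 4 map onto a handful of LVM2 commands. In this sketch the volume group, device and swap names are hypothetical placeholders, and DRY_RUN=1 prints the commands instead of running them:

```shell
#!/bin/sh
# Sketch of the PV migration with LVM2. All names below are hypothetical
# placeholders; override them via the environment. Needs root when live.
set -eu

VG=${VG:-vg0}                 # hypothetical volume group
OLD=${OLD:-/dev/sda2}         # hypothetical old physical volume
NEW=${NEW:-/dev/sdb2}         # hypothetical new physical volume
SWAP=${SWAP:-/dev/vg0/swap}   # hypothetical swap logical volume

run() {
    # Print instead of executing when DRY_RUN is set.
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

migrate_pv() {
    run swapoff "$SWAP"         # step 2: stop swapping on the LVM
    run pvcreate "$NEW"         # label the new partition as a PV
    run vgextend "$VG" "$NEW"   # step 3: add it to the volume group
    run pvmove "$OLD"           # step 4: move all extents off the old PV
    run vgreduce "$VG" "$OLD"   # then drop the old PV from the group
    run swapon "$SWAP"          # re-enable swap once the move is done
}
```

pvmove given only a source PV moves its extents to free space elsewhere in the volume group. For Wayne's case of moving to a bigger drive, the logical volume and filesystem still have to be grown afterwards, for example with lvextend -l +100%FREE on the LV and then resize2fs (for ext2/ext3).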

As I said, I've done this twice and not lost any data. The same procedure can also be applied when migrating to a new computer, as long as the new computer can accept the hard drive from the old one and manage to boot from it in some way.

Asgeir

on February 29, 2008 12:42 AM
# Jeremy Johnstone said:

I've found the following options to rsync to work well when doing backups of this nature:

-a = handles almost everything
-H = hard links, the option you missed
-S = handle sparse files better (not always needed, but doesn't hurt)
-v = because I like to see what's going on

I really don't know why -a doesn't include -H. It includes virtually everything else even remotely relevant to an "archive" session, so why it doesn't include that almost seems like a mistake to me (mistake in judgment).

on February 29, 2008 09:37 AM
# Totologie said:

Hi (I'm French, so my English is not very good ;-p)
I've been using BackupPC for a long time, but I don't know the Perl language.
I'm using version 2.12.
I have a space problem... I have to copy my Backuppc/data to another (bigger) hard disk.
With this version, the BackupPC_tarPCCopy script doesn't work :(
What is the easiest way to do this?

Thx for your help

on April 16, 2008 01:14 AM
