Over the years, I've configured and happily used Linux Software RAID on numerous servers. It has proven to be amazingly resilient and quite stable.
But a couple years ago (2003) when I was building my newest server (which conveniently lives in a colocation facility about 4 miles from where I do), I opted to drop in a 3ware card. They had a good reputation in the Linux world and I figured I might as well move up in the world.
Well guess what died recently?
Right. That server suddenly became unresponsive about two weeks ago.
Due to some access complications, I wasn't able to visit it until this evening. I un-racked the machine, opened it up, and inspected things. All the cables were still plugged in and the card was firmly seated. Hmm.
When I rolled the crash cart over to put a keyboard and monitor on it, I found that the RAID array was simply gone. No trace. I poked around in the 3ware BIOS a bit and couldn't figure out what was going on.
I brought the machine home and decided to chuck the card. It'd failed in its single mission: keeping a redundant copy of my data on both disks. I plugged the two disks directly into the motherboard and stuck in my little Debian installation USB stick (just made it tonight). It's easier than finding a CD-ROM drive I can plug in.
Partway through the configuration process, I noticed the primary drive acting very bursty. Then I heard the clicking noises. We all know what it means when a hard disk starts to click, right?
Now it was all making sense. One of the two drives flaked out and that caused the RAID controller to shit itself and blow away the array.
Let's just say that I'll be going back to Software RAID from now on. The machine is rebuilt (minus the bad disk) and I'll put it back in the rack tomorrow morning.
Thanks to rsnapshot, I never lost any data. I had current off-site backups. In two locations. Doesn't everyone?
Let's just say I've been burned a few times in the past.
Anyway, soon I can finally migrate the data for this site and several others off my aging server in Ohio (going on 6 years old and happily running Software RAID).
In retrospect, I was adding complexity and a new point of failure to a system that had always worked fine in the past. I've learned my lesson.
Posted by jzawodn at March 08, 2007 10:01 PM