As noted in Disk Goes Boom, one of my colocated machines had a nasty disk failure. The disks arrived today. I hope to figure out how bad the damage is, replace the bad disk, and ship them back to get installed.
In the meantime, I've done something that I should have done a year ago. I installed the smartsuite package on my two remaining machines. It comes with a command-line tool named smartctl that provides various options for poking and prodding at your SMART-aware disks. (You can read more about S.M.A.R.T. technology here.) It also comes with a daemon that keeps an eye on the health of your disks and puts messages in syslog to let you know what's up with them.
Now all I need to do is figure out which messages to watch for in syslog. Once I do, I'll set up a cron job to alert me if any problems show up.
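A rough sketch of what that cron job could run, in Python. It assumes the daemon's entries carry a "smartd" program tag in the usual syslog layout, and the keyword list is a hypothetical starting point, not a list of real smartd messages; it would need adjusting once the actual messages are known.

```python
import re

# Assumption: entries look like "host smartd[pid]: message" in syslog.
SMARTD_TAG = re.compile(r"\bsmartd(\[\d+\])?:")

# Hypothetical keywords; replace them with whatever your daemon really logs.
ALERT_WORDS = ("fail", "error", "unreadable", "uncorrectable")


def find_smart_alerts(lines):
    """Return the smartd log lines that contain any alert keyword."""
    alerts = []
    for line in lines:
        if not SMARTD_TAG.search(line):
            continue  # not written by the SMART daemon
        lowered = line.lower()
        if any(word in lowered for word in ALERT_WORDS):
            alerts.append(line.rstrip("\n"))
    return alerts


def scan_file(path):
    """Print every suspicious smartd line found in the given log file."""
    with open(path, errors="replace") as log:
        for alert in find_smart_alerts(log):
            print(alert)
```

Calling scan_file("/var/log/syslog") from a small script run by cron would print only the matching lines (the log path is an assumption and varies by system). Cron normally mails a job's output to its owner, so an empty run stays silent.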
Posted by jzawodn at February 07, 2003 10:38 PM
Hey, linking to Active Smart, that's interesting. Their software is good, but it only runs on Microsoft operating systems and it is not free. I tested the trial version a few weeks ago because of a hard disk failure reported by the BIOS. Newer BIOSes have a SMART option, but it gets annoying once the alarm is switched on.
Ha, you too ;)
I haven't had the failure yet, but I've been getting some scary clicking noises. smartctl rocks -- it told me not to worry too much, and make sure I make some backups. I'm paraphrasing, but basically it indicated minor issues and nothing serious....
You might want to have a look at a more recent version of smartctl and smartd:
http://smartmontools.sourceforge.net/
Among other things, these let you run self-tests on the disk and monitor the results.
Debian versions are available -- see the URL above for links.
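For the self-tests mentioned above, here is a minimal Python sketch of the two smartctl invocations involved: `-t` starts a test and `-l selftest` prints the self-test log. The helper names and the /dev/hda device are illustrative assumptions, and smartctl generally needs root to talk to the disk.

```python
import subprocess


def selftest_commands(device, kind="short"):
    """Build the smartctl command lines to start a self-test and read its log."""
    if kind not in ("short", "long"):
        raise ValueError("kind must be 'short' or 'long'")
    start = ["smartctl", "-t", kind, device]           # kick off the test
    results = ["smartctl", "-l", "selftest", device]   # show the self-test log
    return start, results


def run(command):
    """Run one command and return its combined stdout and stderr as text."""
    done = subprocess.run(command, capture_output=True, text=True)
    return done.stdout + done.stderr
```

A typical use would be to call run() on the first command, wait for the test to finish, then call run() on the second and read through the log it prints.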
Instead of monitoring the log with cron, why not just do this (from the default Debian packages):
edit
/etc/default/smartmontools
start_smartd=yes <-- uncomment this only
then edit
/etc/smartd.conf
DEVICESCAN <-- comment out
/dev/hda -H -m admin@example.com <-- add this line
then:
/etc/init.d/smartmontools restart
That way it silently checks the drive's health until something reports an error. If the drive has an error, you get an email about it.
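The smartd.conf change described above can also be sketched as a small Python function that works on the file's text. The function name is hypothetical; the device and address defaults are the ones from the comment. It only transforms a string, so reading and writing the real file (as root) is left to the caller.

```python
def switch_to_single_device(conf_text, device="/dev/hda", email="admin@example.com"):
    """Comment out DEVICESCAN and add one monitored device with an email target."""
    out = []
    for line in conf_text.splitlines():
        if line.lstrip().startswith("DEVICESCAN"):
            out.append("#" + line)  # disable the catch-all scan
        else:
            out.append(line)
    directive = f"{device} -H -m {email}"  # -H: health check, -m: mail address
    if directive not in out:
        out.append(directive)  # avoid adding the same line twice
    return "\n".join(out) + "\n"
```

Running the function a second time on its own output leaves the text unchanged, so it is safe to apply repeatedly.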
Oops, your blog choked on certain characters.
From the default Debian packages:
edit /etc/default/smartmontools
uncomment start_smartd=yes
edit /etc/smartd.conf
comment out DEVICESCAN
add a line /dev/hda -H -m admin@example.com
That way it silently checks the drive's health until something reports an error. If the drive has an error, you get an email about it.