For as long as I can remember using procmail, I've been keeping a complete archive of my incoming e-mail that's separate from my working copy. Essentially, what I have is a rule like this at the very top of my ~/.procmailrc file:
# Backup all mail before processing...
:0 c
$HOME/archive/mail/ARCHIVE-`date "+%Y-%m"`
I did that so that I'd always have a copy of my mail in case something went wrong in the filtering process. Every month I'd go through and compress the monthly archive for safe but compact keeping.
I just compressed the mailbox for September 2003. The original size was 817MB. The compressed size is 447MB. Yes, I'm getting a bit more mail than I used to (thanks, spammers!) but that's not even a 2:1 ratio! I used to see between 8:1 and 10:1.
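For anyone who wants to check the arithmetic, here is a minimal sketch. The `ratio` helper is a made-up name for illustration, not something from my setup; it just divides the original size by the compressed size.

```shell
# Hypothetical helper: print original:compressed as an N:1 ratio.
# Both arguments must be in the same unit (MB here).
ratio() {
    awk -v orig="$1" -v comp="$2" 'BEGIN { printf "%.2f:1\n", orig / comp }'
}

# Sizes in MB for the September 2003 archive.
ratio 817 447   # prints 1.83:1
```

At the old 8:1 rate, the same 817MB mailbox would have come out at roughly 100MB.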
$ du -sh ARCHIVE-2003-*
41M   ARCHIVE-2003-01.bz2
29M   ARCHIVE-2003-02.bz2
35M   ARCHIVE-2003-03.bz2
35M   ARCHIVE-2003-04.bz2
71M   ARCHIVE-2003-05.bz2
60M   ARCHIVE-2003-06.bz2
63M   ARCHIVE-2003-07.bz2
186M  ARCHIVE-2003-08.bz2
447M  ARCHIVE-2003-09.gz
Ah, yes. Notice the dramatic increase in recent months? I suspect this is largely due to the gibberish that spammers have introduced in their messages to throw off the Bayesian filters.
Also, notice that I used gzip this time rather than bzip2. I tried bzip2 but killed it when it still wasn't done after 90 minutes. gzip, of course, finished the job in under 20 minutes. No surprise. I've learned this lesson before.
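If you'd rather measure the trade-off than relearn it the hard way, something along these lines would do. This is a rough sketch, not what I actually ran: `compare_compressors` is a hypothetical helper, it times each tool with whole-second resolution only, and it writes to a temporary file so the original mailbox is left untouched.

```shell
# Hypothetical helper: compress the same file with gzip and bzip2,
# reporting elapsed seconds and output size for each.
compare_compressors() {
    src=$1
    for tool in gzip bzip2; do
        # Skip any tool that isn't installed rather than failing.
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: not installed"
            continue
        fi
        out=$(mktemp)
        start=$(date +%s)
        "$tool" -c "$src" > "$out"      # -c writes to stdout, keeps the source
        end=$(date +%s)
        size=$(wc -c < "$out" | tr -d ' ')
        echo "$tool: $((end - start))s, $size bytes"
        rm -f "$out"
    done
}

# Example (assumes an uncompressed mailbox in the current directory):
# compare_compressors ARCHIVE-2003-09
```

Running it on a slice of the mailbox first (say, the first 50MB) gives a feel for the numbers without committing to a 90-minute wait.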
As of 10 minutes ago, I've moved the "keep a copy of every message" procmail rule so that it's run after SpamAssassin and SpamBayes have their chances to weigh in on the likelihood that the message is spam.
Posted by jzawodn at October 17, 2003 06:57 PM