October 17, 2003

Spam Does Not Compress Well

For as long as I can remember using procmail, I've been keeping a complete archive of my incoming e-mail that's separate from my working copy. Essentially, what I have is a rule like this at the very top of my ~/.procmailrc file:

    # Backup all mail before processing...

    :0 c
    $HOME/archive/mail/ARCHIVE-`date "+%Y-%m"`

I did that so that I'd always have a copy of my mail in case something went wrong in the filtering process. Every month I'd go thru and compress the monthly archive for safe but compact keeping.

I just compressed the mailbox for September 2003. The original size was 817MB. The compressed size is 447MB. Yes, I'm getting a bit more mail than I used to (thanks, spammers!) but that's not even a 2:1 ratio (817/447 is about 1.8:1)! I used to see between 8:1 and 10:1.


    $ du -sh ARCHIVE-2003-*
    41M     ARCHIVE-2003-01.bz2
    29M     ARCHIVE-2003-02.bz2
    35M     ARCHIVE-2003-03.bz2
    35M     ARCHIVE-2003-04.bz2
    71M     ARCHIVE-2003-05.bz2
    60M     ARCHIVE-2003-06.bz2
    63M     ARCHIVE-2003-07.bz2
    186M    ARCHIVE-2003-08.bz2
    447M    ARCHIVE-2003-09.gz

Ah, yes. Notice the dramatic increase in recent months? I suspect this is largely due to the gibberish that spammers have introduced in their messages to throw off the Bayesian filters.

Also, notice that I used gzip this time rather than bzip2. I tried bzip2 but killed it after it wasn't done 90 minutes later. gzip, of course, finished the job in under 20 minutes. No surprise. I've learned this lesson before.

As of 10 minutes ago, I've moved the "keep a copy of every message" procmail rule so that it's run after SpamAssassin and SpamBayes have their chances to weigh in on the likelihood that the message is spam.
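
A minimal sketch of what such a reordering might look like, assuming SpamAssassin is invoked as spamassassin and marks spam with an "X-Spam-Flag: YES" header, and that SpamBayes is invoked as sb_filter.py and adds an "X-Spambayes-Classification" header. The commands and header names are assumptions about a typical setup, not the exact rules from the ~/.procmailrc described above:

    # Let both filters tag the message first (assumed commands).
    :0 fw
    | spamassassin

    :0 fw
    | sb_filter.py

    # Archive a copy only if neither filter called it spam.
    :0 c
    * ! ^X-Spam-Flag: YES
    * ! ^X-Spambayes-Classification: spam
    $HOME/archive/mail/ARCHIVE-`date "+%Y-%m"`

Both conditions have to hold for the copy to be written, so anything either filter tags as spam skips the archive and falls through to whatever rules come next.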

Fucking spammers.

Posted by jzawodn at 06:57 PM

October 16, 2003


Okay, Tim Bray has posted one of the more amusing entries I've read in a while: Debbie Does BitTorrent.

So I clicked "Go" and BitTorrent connected to fifteen "peers" and began to slog through Debbie at an average of about 30K/second, and pretty soon began to upload to others at about twice that. Then it did the time estimate and told me it would be done in six hours or so, so I went to bed. When the alarm went off it was just finishing up, and the numbers were consistent, it'd downloaded a gig and uploaded about twice as much to my PornoPeers. Foggy-eyed and pre-coffee, purely in the spirit of experimental verification I clicked on DDD.avi and Quicktime emitted some incomprehensible gibberish about the wrong coder or the wrong version and sorry 'bout that, and shut down.

Heh. PornoPeers.

You learn something new every day.

Posted by jzawodn at 08:17 PM

October 15, 2003

Heading to Asia Soon

On Saturday, I'll board a 777 at San Jose Airport bound for Japan (11 hours, 20 minutes). I'll be in Japan for about a week, mostly visiting Yahoo! Japan. I'll be at the Grand Hyatt Tokyo Hotel in Roppongi. The Y! Japan office is in Roppongi Hills (as is the hotel). Drop by the hotel if you're bored. :-)

I hope to have a bit of down time to check out some local stuff. Any recommendations? Can I bring anyone back some cool electronic toys? (Err, don't answer that.) After Japan, we head to Taipei, Taiwan for about 2 days and then to Seoul, South Korea for a day. We have offices there too. Finally, we'll fly back to the Bay Area, landing in San Francisco.

Yeay! I get to be an international man of... err, ... uhm... hm.


I visited Japan in 1999 but have never been to either Korea or Taiwan. This should be an interesting trip. I'll take pictures. And maybe blog while gone. I'll have some net access, but who knows how much or what I'll do with it.

Thursday is "assemble MySQL presentations for Japan, clean my apartment, and get everything ready" day so that I can use Friday to finish what I don't get done then.

Update: Aww, crap. I'm gonna miss the California International Air Show in Salinas this weekend. Damned Murphy. I'm also missing the chance to meet with some interesting people who will be in town... Murphy sucks.

Update #2: Kick Ass! They've got 802.11 WiFi. :-)

Posted by jzawodn at 10:43 PM

John Hughes Fans Get the Shaft with "32 Candles"

I was psyched when Dan passed me a link to this Yahoo! News story. After all, the title is "Sixteen Candles," 16 Years Later.

It begins:

Pull out the yearbooks, throw on the varsity letter jackets--it's high school reunion time for the gang from the 1984 John Hughes comedy. 32 Candles will update the lives of Sam Baker, Farmer Ted, Long Duk Dong and the rest of the gang.

"Holy Crap!" I thought, "This will be a big hit at the box office."

But then I read on:

The original film was based around Samantha Baker's (Molly Ringwald) 16th birthday, a day that went unobserved by her family. The made-for-television special will pick up the story around Sam's 32nd birthday.

What the fuck?!

I had to re-read that a few times to make sure I wasn't seeing things.

A made-for-television special?

What kind of way is that to treat fans of the great 80s John Hughes movies?

I can't believe this.

Please let it be a sick joke.

What next? An after school special remake of The Breakfast Club? A Saturday morning cartoon based on Pretty in Pink?

Posted by jzawodn at 12:44 PM

October 14, 2003

Which Open Source Projects Would You Sponsor?

If you had the chance to direct some money toward a handful of Open Source projects, which would you choose and why?

If your company had the chance to direct some money toward a handful of Open Source projects, which would you suggest it choose and why?

Are the lists different? Why?

Posted by jzawodn at 08:12 PM

MySQL Clustering Upcoming

Yeay, they've finally announced it:

MySQL AB, developer of the world's most popular open source database, today announced that it has acquired Alzato, a venture company started by Ericsson in 2000. Alzato develops and markets NDB Cluster, a high availability data management system designed for the telecom/IP environment.
MySQL will integrate NDB Cluster technology into its product offerings as a high availability clustering data management engine for systems that require maximum uptime and real-time performance, such as telecom and network applications and heavy-load Web sites. MySQL AB will offer NDB Cluster technology as part of a future MySQL database version targeted for next year.

From what I've seen of the technology so far, MySQL will have some kick ass clustering. This is most excellent.

Posted by jzawodn at 07:56 PM

October 13, 2003

Contributing to the Bottom Line

Though I've never actually seen the mythical "bottom line," I've managed to contribute to it in a tangible way this year--or so I'm told. Apparently a project I worked on earlier this year (one I never blogged about) has been tweaked and put into production by the person who inherited it from me. It's actually making money. And now some of our international groups are interested in using it too.

Kick Ass!

It's been a while since I've been able to point at something specific and say "this makes money for us" as opposed to "I help us save money by doing..." To me there's a big difference there.

It's the little things sometimes...

Posted by jzawodn at 05:02 PM

Paying to Send E-Mail

Tim has written up the idea we discussed after dinner on Saturday: Pay to Send.

Granted, this is not a new idea. But the more we talked about it, the more I realized that it's really not rocket surgery. The trick is getting a few decent-sized organizations (Hotmail, AOL, Yahoo, Earthlink) to start recognizing the service.

It may not work. The economics could be all off. But hell, it seems like it's worth a shot to me.

Now, how do we get started...?

Update: It's worth pointing out that I know this idea isn't perfect. Viruses that send e-mail via Outlook would end up costing you money. And spammers may resort to stealing credit cards.

Update #2: If you're the type that takes this stuff way too seriously, please stop.

Posted by jzawodn at 09:21 AM

Mark's New Policy

Mark writes to tell us of his new policy:

I have a new life policy: "All other things being equal, avoid empowering lunatics."

Excellent policy, Mark.

Posted by jzawodn at 09:11 AM

October 12, 2003

Novell and Open Source

I was reading Andy Oram's second blog entry about Foo Camp and came across something interesting:

Nat Friedman of Ximian presented his nifty search tool Dashboard, which he had shown at the O'Reilly Open Source conference last July, but which now sports a couple new features like an index for everything on the desktop. He is leaving tomorrow for India, where he will meet with a large number of programmers employed by Novell, the company that bought Ximian recently. He will recruit 30 to 60 of these programmers to work on GNOME and help them learn the social conventions of working in a free software environment.

My first reaction was along the lines of "Damn! I missed the Dashboard presentation." But then I realized, "Holy crap! Novell's throwing a lot of muscle behind Open Source."

It's a shame the O'Reilly blogs don't grok TrackBack.

Posted by jzawodn at 11:04 PM

Foo Camp Wrap-up

I didn't have the time at Foo Camp to blog much about what I was doing, who I was meeting, and what we were discussing. I was too busy and interested to tear myself away. I went to bed each night very tired.

Luckily, a few others provided some on-line notes and such:

I'm sure there will be more as people return home.


The drive to Foo Camp was a pain. I should have known this in advance. I got stuck in traffic on 101 for a while so it took about 3 hours to get there. The return trip was no better. I really need to get that power license so I can fly to stuff like this. On the plus side, I got to watch the Blue Angels flying around San Francisco for Fleet Week.

I arrived unsure of what to expect. I ran into Andy Oram (our editor on the book) at the check-in desk, dropped off my stuff, and headed to the back lawn to find out who was there. I quickly found myself in a sea of interesting people. Chris DiBona brought bread and cheese, others brought wine. Dinner was soon served.

After dinner on Friday night, we were asked to gather upstairs to get things rolling. We did some introductions so that everyone had a [brief] chance to put names with faces. Then they brought in some very large grids (schedules) so that we could start filling in sessions. We had 1-hour time slots on Friday night, Saturday, and Sunday to fill.

Chaos ensued.

It was good chaos.

We managed to self-organize.

The rest of the weekend was an excellent mix of food, interesting people, discussions, impromptu sessions, and hackery. There was music, a natural sound presentation, water bottle rockets, portable showers, and more.

I think that Dori Smith (of Backup Brain) summed it up well:

Most frequently heard comment at Foo Camp: "I have no idea what I'm doing here--everyone here is so much smarter than me."
It's pretty damn cool to be around 200 people who're all thinking that.

I left a bit early (noon) on Sunday to head back, but not before a meeting in which we managed to hammer out some RSS stuff that will be discussed quite soon. More on that later.

I hope I didn't miss much. Was there any sort of closing event?


Here's a partial list of all the people I either got to meet face to face or at least hear speak:

I'm sure I forgot a bunch of others. There were so many that I began jotting down names. I got a few funny looks for pulling out my slip of paper now and then to jot 'em down, but I really don't trust my own memory for stuff like this.

In no particular order:


It's no wonder I was tired every night, huh?

And still, there were a lot of people I did not meet but could have, given more time.


I put some pictures up here. There are some pictures linked from the Foo Camp Wiki here. Doc's pictures are here.

One of Doc's pictures features me sitting next to Dave Sifry. It was around midnight and we were listening to an excellent presentation about Rendezvous.

Final Word?

Well, okay. Not really. I'll probably have more to say as some of the stuff we discussed at Foo Camp becomes reality.

Posted by jzawodn at 09:04 PM

Some Foo Camp Links

Ben is not sure if he's geek enough to be at Foo Camp.

Derek pisses on Foo Camp.

Tim Bray is here too.

Matt too, blogging about Rendezvous and stuff, including social software, bluetooth security, AMD's Opteron, and more Rendezvous.

The FooCampWiki is back.

So much catching up to do now...

Posted by jzawodn at 06:56 PM