December 30, 2008

Twitter as a Dynamic DNS Service

I occasionally wish to know the IP address of my home cable modem or DSL connection but don't really care if it's available in DNS or not. It occurred to me that if I could programmatically detect the IP change, I'd be able to notify myself via Twitter.

At first, I wanted a simple web service that'd tell me my IP address--something like WhatIsMyIP.com but with an API suitable for simple scripting.

Not finding anything, I created this massive PHP script instead and hosted it on my server.
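Something along these lines does the trick (a minimal sketch, assuming the web server hands you the caller's address in $_SERVER['REMOTE_ADDR'], which is the usual case):

<?php
// Return the requesting client's IP address as plain text.
// Behind a proxy or load balancer you'd want X-Forwarded-For instead.
header('Content-Type: text/plain');
print $_SERVER['REMOTE_ADDR'] . "\n";
?>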

That made it easy to write a simple bash shell script that can be run from cron every few minutes. It uses curl to hit that script and compares the result with the previous result (stored in ~/.last_ip). If they differ, it updates the file and tells Twitter, again using curl.
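Roughly like so (a sketch rather than the exact script; the URL, account name, and password are placeholders, and the update goes to Twitter's basic-auth status API):

#!/bin/bash
# Fetch the current public IP and tweet it if it has changed since last run.
# Run from cron, e.g.: */10 * * * * $HOME/bin/check_ip.sh

IP_URL="http://example.com/myip.php"   # placeholder for the PHP script above
LAST_FILE="$HOME/.last_ip"
TWITTER_USER="myhomeip"                # placeholder account
TWITTER_PASS="secret"                  # placeholder password

new_ip=$(curl -s "$IP_URL")
old_ip=$(cat "$LAST_FILE" 2>/dev/null)

if [ -n "$new_ip" ] && [ "$new_ip" != "$old_ip" ]; then
    echo "$new_ip" > "$LAST_FILE"
    curl -s -u "$TWITTER_USER:$TWITTER_PASS" \
         -d status="home IP is now $new_ip" \
         http://twitter.com/statuses/update.xml > /dev/null
fi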

Of course, I had to create that new Twitter account and then follow it in my main account. But, hey, that wasn't so hard. Now I have a Web 2.0ish social dynamic DNS thingy that uses Twitter.

Aren't I cool and buzzword compliant?!

Posted by jzawodn at 08:42 AM

December 19, 2008

Talk Announcement: MySQL and Search at Craigslist

I recently learned that my talk has been accepted for the 2009 MySQL Conference in Santa Clara, California. It is currently scheduled for Tuesday the 21st and titled MySQL and Search at Craigslist.

Here's the abstract (which I've promised to expand upon soon):

Millions of people search for things every day on craigslist: tickets, cars, garage sales, jobs, events, and so on.
This talk will look at the recent evolution of database and search architecture at Craigslist, including performance, caching, partitioning, and other tweaks. We'll pay special attention to the unique challenges of doing this for a large data set that has an especially high churn rate (new posts, edits, and deletes).
And we strive to do this using as little hardware and power as possible.

If you're coming to the conference, drop by and harass me. :-)

If you're not sure, check out the full schedule--there's a lot of good stuff packed into the conference already, and a lot of talks haven't even been posted yet.

Posted by jzawodn at 07:54 AM

December 08, 2008

The New MySQL Landscape

Interesting things are afoot in the MySQL world. You see, it used to be that the MySQL world consisted of about 20-40 employees of MySQL AB (this funny distributed Swedish company that built and supported the open source MySQL database server), a tiny handful of MySQL mailing lists, and large databases that were counted in gigabytes, not terabytes. A Pentium III was still a decent server. Replication was a new feature!

Hey, anyone remember the Gemini storage engine? :-)

How times have changed...

Nowadays MySQL is sort of a universe unto itself. There are multiple storage engines (though MyISAM and InnoDB are still the popular ones), version 5.1 is out (finally), and the company had grown to over 400 employees before it was gobbled up by Sun Microsystems (a smart move, IMHO, though history will judge that) a while back.

If I had to guess 5 years or so ago what would be interesting to me today about MySQL, I'd have been really, really wrong. The future rarely turns out like we think. Just ask Hillary Clinton.

Here's a little of what's rattling around in the MySQL part of my little brain these days...

Outside Support, Patches, and Forks

The single most interesting and surprising thing to me is both the number and necessity of third-party patches for enhancing various aspects of MySQL and InnoDB. Companies like Percona, Google, Proven Scaling, Prime Base Technologies, and Open Query are all doing so in one way or another.

On the one hand, it's excellent validation of the Open Source model. Thanks to reasonable licensing, companies other than Sun/MySQL are able to enhance and fix the software and give their changes back to the world.

Some organizations are providing just patches. Others, like Percona, are providing their own binaries--effectively forks of MySQL/InnoDB. Taking things a step further, the OurDelta project aims to aggregate these third-party patches and provide source and binaries for various platforms. In essence, you can get a "better" MySQL than the one Sun/MySQL gives you today. For free.

Meanwhile, development on InnoDB continues. Oh, did I mention the part where they were bought by Oracle (yes, *that* Oracle) a while back? Crazy shit, I tell you. But it makes sense if you squint right.

Anyway, the vibe I'm getting is that folks are frustrated because there's not a lot of communication coming out of the InnoDB development team these days. I can't personally verify that. It's been years since I corresponded with Heikki Tuuri (the creator of InnoDB). So folks like Mark Callaghan of Google have been busy analyzing and patching it to scale better for their needs.

And we all benefit.

Drizzle

Taking things a step further yet, the Drizzle project is a re-making of MySQL started primarily by Brian Aker, who worked as MySQL's Director of Architecture for years. Brian is now at Sun and, along with a handful of others at Sun and elsewhere, is taking a fork of MySQL and ripping out a lot of the stuff that doesn't get used much, needlessly complicates the code, or is simply no longer needed.

In essence, they're taking a hard look at MySQL and asking what it really needs to provide for a lot of its uses today: Web and "cloud" stuff. He visited us at Craigslist a few months ago to talk about the project a bit and get our input and feedback. I believe it was that day that I joined one of the mailing lists and started following what's going on. Heck, I even build Drizzle on an Atom-powered MSI Wind PC regularly.

It's great to see a re-think of MySQL going on... keeping the good, getting rid of the bad, and modularizing the stuff that people often want to do differently (authentication, for example).

It's even better to see the group that's hacking on it. They really have their heads on straight.

Unanswered Questions

Why is all this even necessary? Are the "enterprise" customers and their demands taking focus away from what used to be the core use and users of MySQL? Is Sun hard to work with?

It's clear that both the MySQL and InnoDB teams could be doing more to help. But having worked at a large company for long enough, I realize that things are rarely as simple as they should be.

Will this stuff get integrated back into mainline MySQL? Will Linux distributions like Ubuntu, Debian, and Red Hat pick up OurDelta builds? What about Drizzle?

Will Drizzle hit its target and be the sleek and lean database kernel that MySQL once could have been?

Hard to say.

It's hard to guess what the future holds and too easy to play armchair quarterback about the work of others. But these are questions worth wondering about a bit.

What's it all mean?

Nowadays MySQL has a much slower release cycle than it used to. It's still available in "commercial" and free ("community") releases. There's still a company behind it--a much larger one, in fact. But one that also has a vested interest in showing how it works better on their storage appliances or 256 "core" computers and whatnot.

Clustering is still very niche. Transactions are not.

Meanwhile, all the cutting edge stuff (at least from the point of view of scaling) is happening outside Sun/MySQL and being integrated by OurDelta and even Drizzle. The OurDelta builds are gaining steam quickly and Drizzle is shaping up.

Heck, I'm hoping to get an OurDelta box or two on-line at work sometime soon. And I'd like to put a Drizzle node up too. I want to see how the InnoDB patches help and also play with the InnoDB plug-in (and its page compression).

The next few years are proving to be far more interesting than I might have expected from a project and technology that looked like it was on a track straight for Open Source maturity.

And you know what? I like it.

Posted by jzawodn at 07:02 AM

December 05, 2008

My Dumb Cat Video

It's Friday and this is the Internet, so I present to you Cats Eating Chicken, or "My Dumb Cat Video" (embedded below too).

The background is that we had a bit of leftover grilled chicken the other night and decided to bust it up and feed it to the cats. Amusingly, they all got together to partake of the feast, but a couple of them got curious about the camera too.

Both Timmy (white and grey) and Thunder (mostly grey) give the camera a sniff or two. My boys (Barnes and Noble) remained single-mindedly devoted to devouring the meat.

Anyway, we found it rather amusing.

Have a good weekend...

Posted by jzawodn at 07:43 AM

December 03, 2008

gzip and hard links. I don't get it.

I recently was looking to make compressed backups of some files that exist in a tree that's actually a set of hard links (rsnapshot or rsnap style) to a canonical set of files.

In other words, I have a data directory and a data.previous directory. I would like to make a backup of the stuff in data.previous, most of the files being unchanged from data. And I'd like to do this without using lots of disk space.

The funny thing is that gzip is weird about hard links. If you try to gzip a file whose link count is greater than one, it complains.
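To make that concrete with the tree described above (hypothetical paths; GNU cp's -al option builds the same sort of hard-linked copy that rsnapshot makes):

/tmp/backups$ cp -al data data.previous
/tmp/backups$ gzip -r data.previous

Every file in data.previous is now a hard link back into data, so a recursive gzip complains about each one and compresses nothing.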

I was puzzled by this and started to wonder if gzip actually overwrites the original input file instead of simply unlinking it once it's done reading it and generating the compressed version.

So I did a little experiment.

First I create a file with two links to it.

/tmp/gz$ touch a
/tmp/gz$ ln a b

Then I check to ensure they have the same inode.

/tmp/gz$ ls -li a b
5152839 -rw-r--r-- 2 jzawodn jzawodn 0 2008-12-03 15:38 a
5152839 -rw-r--r-- 2 jzawodn jzawodn 0 2008-12-03 15:38 b

They do. So I compress one of them.

/tmp/gz$ gzip a
gzip: a has 1 other link  -- unchanged

And witness the complaint. The gzip man page says I can force it with the "-f" argument, so I do.

/tmp/gz$ gzip -f a

And, as I'd expect, the new file doesn't replace the old file. It gets a new inode instead.

/tmp/gz$ ls -li a.gz b
5152840 -rw-r--r-- 1 jzawodn jzawodn 22 2008-12-03 15:38 a.gz
5152839 -rw-r--r-- 1 jzawodn jzawodn  0 2008-12-03 15:38 b

This leads me to believe that the gzip error/warning message is really trying to say something like:

gzip: a has 1 other link and compressing it will save no space

But I still don't see the danger. Why can't that simply be an informational message? After all, you still need enough space to store the original and compressed versions, since the original (in the normal case) exists until gzip is done writing the compressed version anyway. (I checked the source code later.)

So what's the rationale here? I don't get it.

Posted by jzawodn at 03:51 PM