December 30, 2008
Twitter as a Dynamic DNS Service
I occasionally wish to know the IP address of my home Cable Modem or DSL connection but don't really care if it's available in DNS or not. It occurred to me that if I could programmatically detect the IP change, I'd be able to notify myself via Twitter.
At first, I wanted a simple web service that'd tell me my IP address--something like WhatIsMyIP.com but an API suitable for simple scripting.
Not finding anything, I created this massive PHP script instead and hosted it on my server:
That made it easy to write a simple bash shell script that can be run from cron every few minutes. It uses curl to hit that script and compares the result with the previous result (stored in ~/.last_ip). If they differ, it updates the file and tells Twitter, again using curl.
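A minimal sketch of that cron-driven check might look like the following. It is not the original script, just an illustration under a few assumptions: the URL in IP_URL is a placeholder, the names ip_changed, notify, and check_once are made up for this example, and notify only echoes a message where the real script posted to Twitter with curl.

```bash
#!/usr/bin/env bash
# Sketch only, not the original script.
# IP_URL is a placeholder (assumption) for the page that echoes the caller's IP.
IP_URL="${IP_URL:-http://example.com/ip.php}"
STATE_FILE="${STATE_FILE:-$HOME/.last_ip}"

# Returns 0 and records the new address if it differs from the stored one;
# returns 1 if nothing changed.
ip_changed() {
    local new_ip="$1" state_file="$2" old_ip=""
    if [ -f "$state_file" ]; then
        old_ip="$(cat "$state_file")"
    fi
    if [ "$new_ip" != "$old_ip" ]; then
        printf '%s\n' "$new_ip" > "$state_file"
        return 0
    fi
    return 1
}

# Hypothetical stand-in: the real script told Twitter via curl.
notify() {
    echo "IP changed to $1"
}

# Fetch the current address and notify only when it has changed.
check_once() {
    local ip
    ip="$(curl -fsS "$IP_URL")" || return 1
    if ip_changed "$ip" "$STATE_FILE"; then
        notify "$ip"
    fi
}

# Only hit the network when explicitly asked to (e.g. from cron).
if [ "${1:-}" = "--run" ]; then
    check_once
fi
```

Saved under some name of your choosing and invoked with --run from a crontab entry every few minutes, this does one fetch and one comparison per run. Keeping the comparison in its own function makes it easy to try out without touching the network.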
Of course, I had to create that new Twitter account and then follow it from my main account. But, hey, that wasn't so hard. Now I have a Web 2.0ish social dynamic DNS thingy that uses Twitter.
Aren't I cool and buzzword compliant?!
December 19, 2008
Talk Announcement: MySQL and Search at Craigslist
I recently learned that my talk has been accepted for the 2009 MySQL Conference in Santa Clara, California. It is currently scheduled for Tuesday the 21st and titled MySQL and Search at Craigslist.
Here's the abstract (which I've promised to expand upon soon):
Millions of people search for things every day on craigslist: tickets, cars, garage sales, jobs, events, and so on.
This talk will look at the recent evolution of database and search architecture at Craigslist, including performance, caching, partitioning, and other tweaks. We'll pay special attention to the unique challenges of doing this for a large data set that has an especially high churn rate (new posts, edits, and deletes).
And we strive to do this using as little hardware and power as possible.
If you're coming to the conference, drop by and harass me. :-)
If you're not sure, check out the full schedule--there's a lot of good stuff packed into the conference already, and a lot of talks are still not even posted.
December 08, 2008
The New MySQL Landscape
Interesting things are afoot in the MySQL world. You see, it used to be that the MySQL world consisted of about 20-40 employees of MySQL AB (this funny distributed Swedish company that built and supported the open source MySQL database server) and a tiny handful of MySQL mailing lists, and large databases were counted in gigabytes, not terabytes. A Pentium III was still a decent server. Replication was a new feature!
Hey, anyone remember the Gemini storage engine? :-)
How times have changed...
Nowadays MySQL is sort of a universe unto itself. There are multiple storage engines (though MyISAM and InnoDB are still the popular ones), version 5.1 is out (finally), and the whole company made it over 400 employees before it was gobbled up by Sun Microsystems (a smart move, IMHO, though history will judge that) a while back.
If I had to guess 5 years or so ago what would be interesting to me today about MySQL, I'd have been really, really wrong. The future rarely turns out like we think. Just ask Hillary Clinton.
Here's a little of what's rattling around in the MySQL part of my little brain these days...
Outside Support, Patches, and Forks
The single most interesting and surprising thing to me is both the number and necessity of third-party patches for enhancing various aspects of MySQL and InnoDB. Companies like Percona, Google, Proven Scaling, Prime Base Technologies, and Open Query are all doing so in one way or another.
On the one hand, it's excellent validation of the Open Source model. Thanks to reasonable licensing, companies other than Sun/MySQL are able to enhance and fix the software and give their changes back to the world.
Some organizations are providing just patches. Others, like Percona, are providing their own binaries--effectively forks of MySQL/InnoDB. Taking things a step further, the OurDelta project aims to aggregate these third-party patches and provide source and binaries for various platforms. In essence, you can get a "better" MySQL than the one Sun/MySQL gives you today. For free.
Meanwhile, development on InnoDB continues. Oh, did I mention the part where they were bought by Oracle (yes, *that* Oracle) a while back? Crazy shit, I tell you. But it makes sense if you squint right.
Anyway, the vibe I'm getting is that folks are frustrated because there's not a lot of communication coming out of the InnoDB development team these days. I can't personally verify that. It's been years since I corresponded with Heikki Tuuri (the creator of InnoDB). So folks like Mark Callaghan of Google have been busy analyzing and patching it to scale better for their needs.
And we all benefit.
Drizzle
Taking things a step further yet, the Drizzle project is a re-making of MySQL started primarily by Brian Aker, who worked as MySQL's Director of Architecture for years. Brian is now at Sun and, along with a handful of others at Sun and elsewhere, is ripping out a lot of the stuff in a fork of MySQL that doesn't get used much, needlessly complicates the code, or is simply no longer needed.
In essence, they're taking a hard look at MySQL and asking what it really needs to provide for a lot of its uses today: Web and "cloud" stuff. He visited us at Craigslist a few months ago to talk about the project a bit and get our input and feedback. I believe it was that day I joined one of the mailing lists and started following what's going on. Heck, I even build Drizzle on an Atom-powered MSI Wind PC regularly.
It's great to see a re-think of MySQL going on... keeping the good, getting rid of the bad, and modularizing the stuff that people often want to do differently (authentication, for example).
It's even better to see the group that's hacking on it. They really have their heads on straight.
Unanswered Questions
Why is all this even necessary? Are the "enterprise" customers and their demands taking focus away from what used to be the core use and users of MySQL? Is Sun hard to work with?
It's clear that both the MySQL and InnoDB teams could be doing more to help. But having worked at a large company for long enough, I realize that things are rarely as simple as they should be.
Will this stuff get integrated back into mainline MySQL? Will Linux distributions like Ubuntu, Debian, and Red Hat pick up OurDelta builds? What about Drizzle?
Will Drizzle hit its target and be the sleek and lean database kernel that MySQL once could have been?
Hard to say.
It's hard to guess what the future holds and too easy to play armchair quarterback about the work of others. But these are questions worth wondering about a bit.
What's it all mean?
Nowadays MySQL has a much slower release cycle than it used to. It's still available in "commercial" and free ("community") releases. There's still a company behind it--a much larger one in fact. But one that also has a vested interest in showing how it works better on their storage appliances or 256 "core" computers and whatnot.
Clustering is still very niche. Transactions are not.
Meanwhile, all the cutting edge stuff (at least from the point of view of scaling) is happening outside Sun/MySQL and being integrated by OurDelta and even Drizzle. The OurDelta builds are gaining steam quickly and Drizzle is shaping up.
Heck, I'm hoping to get an OurDelta box or two on-line at work sometime soon. And I'd like to put a Drizzle node up too. I want to see how the InnoDB patches help and also play with the InnoDB plug-in (and its page compression).
The next few years are proving to be far more interesting than I might have expected from a project and technology that looked like it was on a track straight for Open Source maturity.
And you know what? I like it.
December 05, 2008
My Dumb Cat Video
It's Friday and this is the Internet, so I present to you Cats Eating Chicken, or "My Dumb Cat Video" (embedded below too).
The background is that we had a bit of leftover grilled chicken the other night and decided to bust it up and feed it to the cats. Amusingly, they all got together to partake of the feast, but a couple of them got curious about the camera too.
Both Timmy (white and grey) and Thunder (mostly grey) give the camera a sniff or two. My boys (Barnes and Noble) remained single-mindedly devoted to devouring the meat.
Anyway, we found it rather amusing.
Have a good weekend...
December 03, 2008
gzip and hard links. I don't get it.
I recently was looking to make compressed backups of some files that exist in a tree that's actually a set of hard links (rsnapshot or rsnap style) to a canonical set of files.
In other words, I have a data directory and a data.previous directory. I would like to make a backup of the stuff in data.previous, most of the files being unchanged from data. And I'd like to do this without using lots of disk space.
The funny thing is that gzip is weird about hard links. If you try to gzip a file whose link count is greater than one, it complains.
I was puzzled by this and started to wonder if it actually over-writes the original input file instead of simply unlinking it when it is done reading it and generating the compressed version.
So I did a little experiment.
First I create a file with two links to it.
/tmp/gz$ touch a
/tmp/gz$ ln a b
Then I check to ensure they have the same inode.
/tmp/gz$ ls -li a b
5152839 -rw-r--r-- 2 jzawodn jzawodn 0 2008-12-03 15:38 a
5152839 -rw-r--r-- 2 jzawodn jzawodn 0 2008-12-03 15:38 b
They do. So I compress one of them.
/tmp/gz$ gzip a
gzip: a has 1 other link -- unchanged
And witness the complaint. The gzip man page says I can force it with the "-f" argument, so I do.
/tmp/gz$ gzip -f a
And, as I'd expect, the new file doesn't replace the old file. It gets a new inode instead.
/tmp/gz$ ls -li a.gz b
5152840 -rw-r--r-- 1 jzawodn jzawodn 22 2008-12-03 15:38 a.gz
5152839 -rw-r--r-- 1 jzawodn jzawodn 0 2008-12-03 15:38 b
This leads me to believe that the gzip error/warning message is really trying to say something like:
gzip: a has 1 other link and compressing it will save no space
But I still don't see the danger. Why can't that simply be an informational message? After all, you still need enough space to store both the original and compressed versions, since the original (in the normal case) exists until gzip is done writing the compressed version anyway. (I checked the source code later.)
So what's the rationale here? I don't get it.
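If the goal is only a compressed copy, and the original and its other links should stay exactly as they are, one way to sidestep the complaint is to have gzip write to standard output. A small sketch follows; compress_keep is a name made up for this example. Because the input is fed on stdin, gzip never sees the file's link count, and nothing gets unlinked.

```bash
#!/usr/bin/env bash
# Write a compressed copy next to the source file without touching the source.
# The source (and every other hard link to it) is left in place.
compress_keep() {
    local src="$1"
    gzip -c < "$src" > "$src.gz"
}
```

For the files in the experiment above, compress_keep a would create a.gz while a and b both remain linked to the same inode. The trade-off is that the uncompressed original still takes up its space, so this is for making an extra copy, not for reclaiming disk.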
November 23, 2008
Opa! is Good Greek Food in Willow Glen
A month or so ago, the long under-construction Opa! opened its doors on Lincoln Ave in downtown Willow Glen. Wanting to try it for a while, we walked down on Friday night for dinner. And we were not disappointed.
The Good
The menu is straightforward and has a good variety of Greek food. We ordered the Keftedes (Greek Meatballs) as an appetizer. The dish consisted of two well prepared meatballs and an excellent sauce.
For the main courses, we selected a Beef Souvlaki Pita (hers) and Seafood Souvlaki (mine). Both came with the most excellent Opa! Fries. (Think: garlic fries with a twist.) The food came in a reasonable amount of time and our waitress was very friendly and helpful. It was very tasty and portions were not excessively large either.
Their drink menu contains a selection of beers and a good selection of Greek wines as well. The wine we sampled was quite good and is apparently available at Costco. Needless to say, we're going to have to verify that for ourselves. ;-)
The interior is well decorated. I especially like the large TV monitor that shows what songs are playing over the sound system.
Pricing was reasonable. Dinner for two with drinks, an appetizer, and dessert (Baklava!) was about $50. Not the sort of thing we do often, but definitely not out of line with other favorite eating establishments.
The Bad
Opa! is a small sit-down restaurant with tables for 2 and 4 (mostly) that also handles to-go orders. It's often very full and could definitely benefit from more space inside: the tables are fairly close together and the waitresses occasionally bump into customers. But space isn't easy to come by in Willow Glen's downtown.
More
Opa! has over 60 ratings and reviews on Yelp and is also discussed a bit on Willow Glen 2.0.
If you're looking for good Greek food in the area, I'd highly recommend giving Opa! a try.
November 21, 2008
Bash Trick: Watching Multiple Background Jobs
I recently had a need to add some error checking to a bash script that runs multiple copies of a Perl script in parallel to better utilize a multi-core server. I wanted a way to run these four processes in the background and gather up their exit values. Then, if any of them failed, I'd prematurely exit the bash script and report the error.
After a bit of reading bash docs, I came across some built-ins that I hadn't previously used or even seen. First, I'll show you the code:
wait.sh
This is the bash script that runs the parallel processes and gathers up the exit values.
sleeper
And here's the Perl script that I wrote in order to test the functioning of wait.sh. It accepts two arguments. The first is the number of seconds to sleep (to simulate the delay associated with doing work) and the second is the exit value it should use (any non-zero value indicates a failure).
Discussion
New to me was the use of let to do math on a variable so that I can count up the number of failures. Is there a better way? Outside of arithmetic contexts such as let and (( )), there's no native ++ operator in bash. Similarly, using jobs to get a list of pids to wait on proved to be a very useful idiom.
The code is straightforward and works for my purposes. But since 99% of my time is spent in Perl rather than bash, I wonder what I could have done differently and/or better. Feedback welcome.
And, if this is at all useful to you, feel free to take it and run...
Finally, I'm starting to really dig gist.github for showing off bits of code. It's good stuff.
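The wait-and-count idea discussed in this post can be sketched roughly as follows. This is an illustration, not the original wait.sh: run_parallel is a made-up name, it collects pids from $! instead of using jobs, and it counts failures with $(( )) arithmetic in place of let.

```bash
#!/usr/bin/env bash
# Run each argument as a command in the background, wait for all of them,
# and print how many exited with a non-zero status.
run_parallel() {
    local failures=0 pid cmd
    local pids=()
    for cmd in "$@"; do
        bash -c "$cmd" &        # start the job in the background
        pids+=("$!")            # remember its pid
    done
    for pid in "${pids[@]}"; do
        # wait returns the exit status of that specific child
        if ! wait "$pid"; then
            failures=$((failures + 1))
        fi
    done
    echo "$failures"
}
```

With the sleeper script described above, something like failures=$(run_parallel "./sleeper 2 0" "./sleeper 3 1") should leave the number of failed jobs in failures, and the calling script can exit early when that number is not zero.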
November 17, 2008
TV Watching and Happiness
In one of those "well, duh!" moments the other day, I came across a headline on Slashdot that said Unhappy People Watch More TV. Given that I mostly stopped watching TV quite some time ago and consider it to be one of the more rude devices in our culture, I clicked thru to read about how others have discovered what I'd already guessed was true...
A new study by sociologists at the University of Maryland concludes that unhappy people watch more TV, while people who describe themselves as 'very happy' spend more time reading and socializing. 'TV doesn't really seem to satisfy people over the long haul the way that social involvement or reading a newspaper does,' says researcher John P. Robinson. 'It's more passive and may provide escape--especially when the news is as depressing as the economy itself.
Imagine that... Stagnation and exposure to negative information leads to sadness. It goes on...
The data suggest to us that the TV habit may offer short-run pleasure at the expense of long-term malaise.' Unhappy people also liked their TV more: 'What viewers seem to be saying is that while TV in general is a waste of time and not particularly enjoyable, "the shows I saw tonight were pretty good."'
Another shock. TV provides only a short-term reward (kind of like a drug hit).
If this resonates with you a bit, or you suspect deep down that there's more going on with the influence of TV in our culture, I highly recommend reading Amusing Ourselves To Death by Neil Postman if you have not already.
It's too bad this stuff doesn't get taught in school--where, I'm told, teachers are using PowerPoint more and more.
*sigh*
November 14, 2008
Asynchronous MySQL Client in Perl
I recently found myself wishing for an async library for MySQL. My goal is to be able to fire off queries to a group of federated servers in parallel and aggregate the results in my code.
With the standard client (DBD::mysql), I'd have to query the servers one at a time. If there are 10 servers and each query takes 0.5 seconds, my code would stall for 5 seconds. But by using an async library, I could fire off all the queries and fetch the results as they become available. The overall wait time should not be much more than 0.5 seconds.
While I found little evidence of anyone doing this in practice, my search led me to the perl-mysql-async project on Google Code. It's a pure-Perl implementation of the MySQL 4.1 protocol and an asynchronous client that uses Event::Lib (and libevent) under the hood.
The code contains little in the way of documentation or examples, aside from the simple bundled test script. After a bit of mucking around with it, I managed to cobble together a working example. It looks like this:
Sure enough, that code runs in just a bit more time than the longest query it executes, rather than the sum of all the query times.
What still surprises me is that this code doesn't appear to get a lot of use (or at least discussion) in the real world. In the PHP world, the mysqlnd driver offers async queries.
So count this as my contribution to demonstrating that Perl can do async MySQL queries too.
November 13, 2008
Post-Election Thoughts: Equal but Not
I'm happy that Barack Obama won the election. I think it's time to stir things up a bit.
What really bothers me is the fact that we still don't have equal voting in this country. We certainly have the technology to share vote counts quickly and efficiently, so why not just do that? Why screw around with an electoral college anymore?
It seems disingenuous at best and an outright lie at worst to call Obama's victory a "landslide" when the actual percentages of the popular vote (the only vote that should count) were so close. Yet the large difference in electoral vote counts is supposed to make us believe that something very different happened. And the media was more than happy to play along with that deception (what a surprise, huh?).
It should not be possible to lose by having more votes than your opponent, but it is. Why does nobody seem to care? (See: electoral college, specifically this.)
Of all the countries that have tried to copy our model of democracy in the last 200 years or so, can you name a single one that adopted the electoral college as a piece of their political infrastructure?
I'd love to have my vote count as much as everyone in all the other states.
Why is that so hard?
October 22, 2008
Kick Ass Fonts in Ubuntu: 3 Easy Steps
A few days ago I made yet another tweak to my Ubuntu laptop to make the fonts look a little better. The result is that I'm now quite happy--impressed even. Here are the three things I've done to make my day-to-day work easy on the eye.
First, enable subpixel smoothing in the System > Appearance control panel.
For a long time that's all I had done, and I was reasonably happy. Things looked okay but not great. But I used GNU Emacs for most of my coding and wanted fonts there that looked as good as those in gnome terminal.
That led me to the second tip: install emacs-snapshot and use the GTK version. Then you can add this to your ~/.Xresources file:
Emacs.font: Monospace-10
And bingo! The same font that's in your terminal is in Emacs.
That made me happy in Emacs, but my Firefox fonts were still a bit sucky. So when I read Tweak Your Font Rendering for Better Appearance in Tombuntu, I had to give it a try.
I created a ~/.fonts.conf file and added this to it:
<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "fonts.dtd">
<fontconfig>
  <match target="font">
    <edit name="autohint" mode="assign">
      <bool>true</bool>
    </edit>
  </match>
</fontconfig>
I logged out and back in and suddenly found myself staring at fonts in Firefox that looked as good as I've seen in Safari on a Mac.
That's all there was to it for me: subpixel rendering, emacs-snapshot, and enabling hinting via a .fonts.conf file.
It's worth noting that you can go even farther with the advanced font settings, but I really haven't needed to go that far yet.
October 21, 2008
Random Updates
I've got several random things to say to the interwebs but none of them merit a blog post individually...
First off, I love data. But I hate the fact that the spreadsheet in OpenOffice 2.x and Gnumeric both have row limits of 65,536. I don't know who missed the boat on 32 and 64 bit CPUs, but it's rather annoying! And, yes, twitter people, I know that 65,536 is a 16 bit limit--not 8. I was trying to make a point.
Secondly, Yahoo can haz layoffs (again). Having lived through 3 rounds of layoffs in my 8.5 years at Yahoo, I know what that feels like. :-( If you're a kick-ass Perl hacker or an excellent systems and network administrator who'd like to work at a great company in San Francisco, let me know.
Thirdly, the dumbest bugs are often the ones that have been in your code a long time and are incredibly easy to keep glossing over as you read and re-read it.
Fourthly, Tie::Syslog is pretty handy but seems to not like being used multiple times in the same app. Each instance seems to think that it has the same "identity." Anyone seen that before? I haven't dug into that yet but probably will soon.
Finally, we're out of town for a few days while the house is being fumigated for termites. And we brought all four cats with us. That's what I call an adventure.
Now back to your regularly scheduled... uh, stuff.











