June 18, 2009

Distributed Parallel Fault Tolerant File System Wanted

posted at 11:06 AM | link | comments (11) | bloglines

After re-thinking and re-tooling some of the work I've been doing to take advantage of Gearman, I've started to wish for a big file system in the sky. I guess it's no surprise that Google uses GFS with their Map/Reduce jobs and that Hadoop has HDFS as a major piece of its infrastructure.

The Wikipedia page List of file systems has a section on Distributed parallel fault tolerant file systems that appears to be a good list of what's out there. The problem, of course, is that it's little more than a list.

Do you have any experience with one or more of those? Recommendations?

I should say that I'm only interested in something that's Open Source, and I have a minor bias against big Java things as well as stuff that appears as though it would cease to exist if a single company went out of business.

I'm not too worried about POSIX compliance. The main use would be for writing large files that other machines or processes would then read all or part of. I don't need updates. The ability to append would probably be nice, but that's easy to work around.

More specifically, these three have my eye at the moment:

It's interesting that some solutions deal with blocks (often large) while others deal with files. I'm not sure I have a preference for either at the moment.

But I'm open to hearing about everything, so speak up! :-)


June 16, 2009

My Drizzle Article in Linux Magazine (XtraDB and Sphinx too!)

posted at 07:13 AM | link | comments (3) | bloglines

After a few years off, I've been doing some writing for Linux Magazine (which is on-line only) again recently. First off, my just published feature article is Drizzle: Rethinking the MySQL Database Kernel. As you might have guessed, it looks at Drizzle and some of the reasoning behind forking and re-working MySQL.

I'm also writing a weekly column that we've been calling "Bottom of the Stack" (RSS) which started a few weeks ago. Recent articles are:

The basic idea is that I'll be writing about back-end data processing and systems--the sort of stuff that lives in the bottom half of the traditional LAMP stack.

If you have ideas for stuff you'd like me to cover, please drop me a line.

As a side note, I wrote my first article for Linux Magazine back in June of 2001: MySQL Performance Tuning. Those were the MySQL 3.23 days. How time flies!

To the great credit of the folks involved with Linux Magazine, all of my past writings are still available there.


May 29, 2009

Hulu Desktop vs. Hulu in Browser vs. Netflix (Flash vs. Silverlight?)

posted at 09:46 AM | link | comments (13) | bloglines

For a while now we've had a computer hooked up to our large screen television and stereo system. A couple months back I upgraded the motherboard, CPU, and memory so that we could start using the Windows 7 release candidate and Windows Media Center on it. The new hardware also meant we could play back high definition video.

Aside from playing back photos in Picasa and various video files, we also stream music using Pandora or play from our library using WinAMP or Media Center. For streaming video, we'd been using Hulu a bit (which is Flash based) and Netflix (which is Silverlight).

Yesterday we tried out Hulu Desktop and attempted to watch the Glee pilot. Hulu Desktop crashed on the first run after install (could be a Windows 7 issue) but then ran fine upon restarting it. But the video quality was low and quite jerky. It used a lot of CPU too. This made me wonder if it was really taking advantage of the video capabilities of our system.

It was bad enough that we switched to watching the show using the browser-based streaming. Hitting the full-screen high quality version actually played better there and used less CPU. So the desktop application clearly needs some performance tuning.

I compare all of this with Netflix streaming, which uses Silverlight, and the difference is clear, even at the 720p resolution we tend to keep our display set to. Microsoft has done a good job of tuning Silverlight for video. If I recall correctly, they have very good H.264 support built in.

That said, I'm glad to see Hulu Desktop out. It makes a lot of sense to have an app that can be controlled via IR remote instead of the wireless keyboard we had been using.


May 27, 2009

The Big ALTER TABLE Test

posted at 07:34 AM | link | comments (12) | bloglines

As previously noted, I've been playing with XtraDB a bit at work. Over a week ago I decided to test compression on one of our larger tables and it took a bit longer than I expected.

(root@db_server) [db_name]> ALTER TABLE table_name \
    ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4;
Query OK, 825994826 rows affected (8 days 14 hours 23 min 47.08 sec)
Records: 825994826  Duplicates: 0  Warnings: 0

Zoiks!

It's too bad we couldn't use all the cores on the machine for the ALTER TABLE, huh?
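For a sense of scale, here is a quick back-of-the-envelope sketch in Python that turns the elapsed time reported above into an average row rate. The numbers are copied from the ALTER TABLE output; the variable names are just for illustration.

```python
# Figures copied from the ALTER TABLE output above.
rows = 825_994_826
days, hours, minutes, seconds = 8, 14, 23, 47.08

# Convert the elapsed time to seconds.
elapsed = days * 86_400 + hours * 3_600 + minutes * 60 + seconds

# Average rebuild rate over the whole run.
rows_per_second = rows / elapsed

print(f"elapsed: {elapsed:,.2f} s")
print(f"average: {rows_per_second:,.0f} rows/s")
```

That works out to a little over 1,100 rows per second, averaged across the entire run.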

On the plus side, the file sizes aren't too bad.

Before:

-rw-rw---- 1 mysql mysql 1638056067072 2009-05-24 09:23 table_name.ibd

After:

-rw-rw---- 1 mysql mysql  587617796096 2009-05-27 07:14 table_name.ibd

I'll have more to say about XtraDB and the compression options in a later post. But given the interest that my Twitter messages about this big ALTER TABLE generated yesterday, I figured I'd share a bit more detail here.

For anyone doing the math at home, that's going from roughly 1.5TB to 500GB (the new file size is slightly inflated, since this slave managed to replicate about a week's worth of data before I caught it). I was hoping for 4:1 compression and managed about 3:1.
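For anyone who would rather let the computer do the math, here is a small Python sketch of the same calculation. The byte counts come straight from the `ls` output above; the binary units (TiB and GiB) are a choice made for this example.

```python
# File sizes in bytes, copied from the ls output above.
before = 1_638_056_067_072
after = 587_617_796_096

TIB = 2 ** 40
GIB = 2 ** 30

ratio = before / after          # compression ratio, before:after
saved = 1 - after / before      # fraction of disk space reclaimed

print(f"before: {before / TIB:.2f} TiB")
print(f"after:  {after / GIB:.0f} GiB")
print(f"ratio:  {ratio:.2f}:1")
print(f"saved:  {saved:.0%}")
```

The ratio comes out at about 2.8:1. Since the new file also holds roughly a week of extra replicated data, the real figure for the original rows is probably a bit better than that.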


May 18, 2009

Our CEO Tells It Like It Is

posted at 09:28 AM | link | comments (5) | bloglines

I haven't said a lot about the ongoing battle between Craigslist and certain overly-aggressive politicians, but after reading his most recent blog post, An Apology Is In Order, I have to say that I'm really proud of Jim. Having a CEO standing up to politicians and media for what he believes is right and true really reaffirms my decision to join Craigslist last year. Better yet, he's outdone most of the media by, *gasp*, actually linking to relevant information in his post.

Many prominent companies, including AT&T, Microsoft, and Village Voice Media, not to mention major newspapers and other upstanding South Carolina businesses feature more “adult services” ads than does craigslist, some of a very graphic nature. For a small sampling, look (careful NSFW) here, and here, and here, and here, and here, and here, and here, and here, and here, and here, and here.
Have you fully considered the implications of your accusations against craigslist? What’s a crime for craigslist is clearly a crime for any company. Are you really prepared to condemn the executives of each of the mainstream companies linked above, and all the others that feature such ads, as criminals? craigslist may not matter in your world view, despite our popularity among your constituents, but mightn’t you want an endorsement from any of the SC newspapers for your gubernatorial campaign, whose publishers you’ve just labeled as criminals? Do you really intend to launch a criminal investigation against the phone company? What about potential new jobs connected to big data center buildouts in SC by Internet companies? Are you *sure* you want to prosecute all of their CEOs as criminals???

Keep it up, Jim.


May 15, 2009

Smoked Tilapia with Honey Glaze

posted at 06:53 AM | link | comments (8) | bloglines

Tonight we tried smoking Tilapia for the first time and it came out very well. The full recipe is below the before and after pictures.

The fish in the smoker, before adding honey and spices:

Smoked Tilapia

The fish after smoking:

Smoked Tilapia

Dinner!

Smoked Tilapia

Recipe

This recipe is very simple but surprisingly tasty. It has some sweet, some smoky, and some spice.

Arrange the fish on the smoker grate and coat liberally with honey.

Then sprinkle on a bit of each of the following:

  • Salt
  • Pepper
  • Garlic Powder
  • Cayenne Pepper

Then smoke for 18-20 minutes with hickory chips.

The resulting mix of honey, cayenne, and smoky flavor is truly excellent.


May 14, 2009

MySQL 5.1.34 and XtraDB 1.0.3-5

posted at 08:17 AM | link | comments (2) | bloglines

For a couple weeks now, we've had a MySQL server at work running MySQL 5.1.34 and the Percona XtraDB 1.0.3-5 plug-in. I'm testing an upgrade path for our current MySQL 5.0.xx based servers.

Aside from some confusion about the initial setup (getting the built-in InnoDB to stay out of the way), things have gone very well. All of our largest and most active tables have been converted to the new Barracuda file format and I tested compression on the two largest. The first didn't fare so well, but it's a fairly over-indexed table with small rows. The second, however, contains a decent sized TEXT column (classified posting bodies) and it compresses quite nicely. Any change in CPU utilization is not significant.

I hope to soon get a second server running and try to increase the compression ratio, going from KEY_BLOCK_SIZE of 8K to 4K to see if we can squeeze some more out of it without much penalty.

I love all the extra stats provided by the InnoDB plug-in and the Percona (and Google) enhancements. There are a lot of knobs that I've not yet tried to turn, but it's good to know they're available when that day comes.

More to come...

See Also: Is MySQL 5.1 a compelling upgrade?


May 02, 2009

I love my Samsung NC10 Netbook

posted at 09:26 AM | link | comments (12) | bloglines

A couple months back I got a Samsung NC10 Netbook. I had been on the fence for a long time, trying to decide among an Eee PC from ASUS, the MSI Wind, and the Samsung NC10. Right about the time I was going to finally do it, the ASUS Eee PC 1000HE was announced. I read a lot of reviews from folks who'd bought those netbooks and eventually settled on the NC10.

The main deciding factors, in order, were: keyboard layout, build quality, ease of upgrade (mine has 2GB RAM and a 320GB disk, twice the standard in both departments), and Linux support.

The MSI Wind was okay in most of those areas, but based on the many reviews I read, the NC10 was a little bit better across the board. So I ruled the MSI Wind out on that alone.

The Eee PC 1000HE had just been announced and would have required waiting a few more weeks. Plus, its keyboard had a few quirks--notably the right shift key being too small and offset. Keyboards are really important to me. It did have the advantage of a claimed 9.5 hour battery life vs. the 7-8 hours claimed for the NC10.

However, I picked the NC10 and couldn't be happier. Running Windows XP, I routinely get 7 hours of battery life with Wifi on and the screen brightness set low (don't need it any higher most of the time). The wireless range is excellent, keyboard feels right, and it's surprisingly snappy.

Building a computer this small and light is really an exercise in design compromises and I think Samsung nailed it perfectly. I've traveled with it a few times and used it all day at the MySQL Conference without having to worry about being near power outlets.

The 2GB RAM is more than enough for anything I'm likely to throw at the Atom processor and 320GB is enough space for all my music, pictures, and pretty much everything except my extensive video collection.

I use the NC10 a bit day to day. I think of it as a "couch computer" in addition to using it for travel and at conferences. But I've also hooked it up to an HDTV to show off pictures to family and that worked just as well. I could easily see doing a day's worth of work on it with an external monitor and mouse.

A couple weeks ago, I grabbed the latest Ubuntu Netbook Remix (UNR) and booted the NC10 off a USB stick to see how it worked. Much to my surprise, it seems that everything worked well without any tweaking. I'll probably stick with XP for now, but it's good to see that Ubuntu would work for me too.

If I were looking to buy now, I'd look really hard at the Samsung NC10, ASUS Eee PC 1000HE, and the Samsung NC20 (the 12" model).


April 30, 2009

Is MySQL 5.1 a compelling upgrade?

posted at 08:00 AM | link | comments (13) | bloglines

Of the many things I noticed last week at the MySQL Conference, one of the most notable was how many companies have not upgraded from MySQL 5.0 to 5.1 yet. Craigslist is in that camp and it seems that we're joined by the likes of Facebook, Google, Yahoo, and about half a dozen other companies that use MySQL heavily.

Come to think of it, SmugMug are the only folks I've talked with who've made the jump (video).

So it's not much of a surprise that Percona is asking if they should backport 5.4 fixes to 5.0.

Given our usage of MySQL to date, the only really compelling reason to upgrade is to get access to the InnoDB plug-in (and XtraDB). I'd like to get compression, some of the various performance patches, and tuning options, so plug-in support is a requirement. But beyond that, I just don't see anything new in 5.1 that we need.

As I noted in The Real or Official MySQL? Does Not Matter!, the storage engines matter more than the various add-on features in the server itself.

Have you upgraded or are you thinking about it? If so, why? If not, why not?


April 29, 2009

MySQL and Drizzle Tip: Checking configuration file syntax (faking configtest)

posted at 07:45 AM | link | comments (1) | bloglines

In the Apache world, you might be familiar with tweaking your config file(s) and then running

$ apachectl configtest

to see if the config parses. We've been discussing this on the Drizzle mailing list and talking in general about configuration handling and management. Well, it turns out that you can fake it in MySQL and Drizzle too.

If you have a new configuration in /tmp/new.cnf, try this:

$ mysqld --defaults-file=/tmp/new.cnf --verbose --help

And it'll run mysqld (or drizzled), parse the config, report any problems, print help, and exit without initializing storage engines or trying to grab a port.

Neat trick!
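To wire this into a deploy script, a minimal sketch in Python might look like the following. The helper names (`configtest_command`, `configtest`) are made up for illustration, and the sketch assumes the server exits with a non-zero status when the config fails to parse, which is worth verifying against your own version before relying on it.

```python
import subprocess


def configtest_command(config_path, server="mysqld"):
    """Build the command from the post: server --defaults-file=... --verbose --help."""
    # --defaults-file has to be the first option on the command line.
    return [server, f"--defaults-file={config_path}", "--verbose", "--help"]


def configtest(config_path, server="mysqld"):
    """Run the check and return (ok, stderr_text).

    Assumption: a non-zero exit status means the config did not parse.
    """
    result = subprocess.run(
        configtest_command(config_path, server),
        stdout=subprocess.DEVNULL,  # discard the long help text
        stderr=subprocess.PIPE,     # keep any complaints about the config
        text=True,
    )
    return result.returncode == 0, result.stderr


# Example (requires mysqld or drizzled on the PATH):
#   ok, errors = configtest("/tmp/new.cnf")
#   print("Syntax OK" if ok else errors)
```

Complaints about the config may not show up on stderr if the config file itself redirects the error log, so treat the captured text as a convenience rather than a guarantee.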

Thanks to Baron Schwartz, Arjen Lentz, and Sheeri Cabral (book) for helping to demonstrate this.


April 28, 2009

Slides from "MySQL and Search at Craigslist"

posted at 08:11 AM | link | comments (8) | bloglines

Last week I delivered a talk titled "MySQL and Search at Craigslist" as part of the 2009 MySQL Conference and Expo. I talked about some of the good and bad of our MySQL work and also talked a lot about our recent Sphinx deployment. The slides are embedded below and here, thanks to SlideShare. (Anyone know why Google Docs doesn't yet handle OpenOffice presentations?)

I gave a copy to O'Reilly but don't yet see the slides on the conference site.

The usual disclaimers apply: I said a lot that's not well reflected in the slides, and I'm sure they're less informative without the audio or video that may or may not have been captured. Either way, hopefully they're useful to folks who saw the talk and even a few of those who did not.

I also delivered a condensed version of this talk at the Percona Performance Conference and those slides are available too.

Thanks to everyone who provided useful feedback and discussion before and after the talks.


April 27, 2009

Hack on Drizzle Full-Time for Rackspace!

posted at 04:08 PM | link | comments (0) | bloglines

Given the current state of the economy, here's a quick job plug for anyone interested and qualified.

At the Drizzle Developer Day on Friday, I got to meet Adrian Otto from Rackspace. Rackspace has a cloud offering (think Amazon EC2) called Mosso, and the company is willing to employ full-time developers who spend all their time working on Drizzle.

Here's what he sent to the mailing list.

I was speaking with Eric Day at the developer conference, and I mentioned that Rackspace is willing to employ full time developers for the specific purpose of furthering the Drizzle project's mission. He suggested that I email you on this list because he expected there would be interest in this offer. If you work on the project now part time, and want to make it a full time job working exclusively on the Drizzle project, let me know. The Rackspace Cloud believes in open source, and we want to do our part to make Drizzle a wild success.

Talking with him a bit, the rationale is simple: Rackspace wants to offer the best cloud resources they can. Part of that means having infrastructure that their customers need and that works well. They're betting that Drizzle is part of their future, and hiring a few people to work on it makes that future a reality sooner rather than later.

It looks like Mark Callaghan (Google) likes the idea too, as does Don MacAskill (SmugMug).

Anyway, ping me if you're interested and I'll put you in touch.

Copyright 2008, Jeremy D. Zawodny.
All Rights Reserved.