Jeremy Zawodny's blog: June 15, 2003 - June 21, 2003 Archives

June 21, 2003

Rewarding Me

I don't know why, but I've been so in the groove today that I managed to get more writing done in 4 hours than I have in the past 1.5 weeks.

So I'm taking a 2 hour break to reward myself. I'm cooking up some Teriyaki Salmon to go with my Heineken and blueberry muffin.

Oh, and I'm going to consume these while watching my unrated, wide screen Old School DVD. :-)

Yay!

Now if I could only figure out what caused this overdue burst of productivity...

Posted by jzawodn at 08:27 PM

Studies Show...

Why are people compelled to use phrases like "studies show..." as evidence for things that are already common sense to most reasonably intelligent people?

I've been hearing this commercial a lot on the radio recently. It advocates getting children started on education as early as possible in life. Or something like that. I stopped paying attention when I realized how insultingly stupid it is.

Studies show that children who do well in high school are more likely to graduate from college.

No shit?

Well, damn! These radio commercial people sure are smart. I'd have never guessed that on my own.

</sarcasm>

I can only assume that their target audience is people without common sense, idiots, or both.

What I'd give for a radio with an intelligence knob on it.

Oh, well. It can't be any worse than television.

Posted by jzawodn at 05:47 PM

High Availability is NOT Cheap

Over the years, I've seen too many posts on the MySQL mailing list from eager users of free software on cheap hardware who want 24x7x365 availability for their databases. Inevitably, the question gets replies from a few folks who respond with something like:

Here's what I did... I set up replication and wrote some Perl scripts to notice when there's a problem. They'll switch everything to the slave. The code is ugly but it works for me. :-)

I always cringe when reading those responses. I shouldn't really complain, though. I've been guilty of providing terse replies once in a while. But usually I just ignore 'em because I don't have the time or experience to really do the question justice.

Today I read a response from Michael Conlen that finally comes close to explaining why you're probably asking for something you don't need, why it's not cheap, and what you really need to be thinking about.

Since it was posted to a public list, I don't mind quoting it here (with a few spelling fixes).

First get an acceptable outage rate. You're only going to get so many nines, and your budget depends on how many. The system will fail at some point, no matter what, even if it's only for a few seconds. That's reality. Figure out what kinds of failures you can tolerate based on how many 9's you get and what kinds you have to design around. From there you can figure out a budget. 99.999% uptime is 5 minutes and 15 seconds per year of total downtime. 99.99% is 52.56 minutes and so on. At some point something will happen, and I've never seen anyone offer more than 5 9's, and IBM charges a lot for that. Then, figure out everything that could cause an outage, figure out how to work around them and give them a budget. Watch how many 9's come off that requirement.

If you have to use MySQL I'd ditch PC hardware and go with some nice Sun kit if you haven't already, or maybe an IBM mainframe. Sun's Ex8xx line should let you do just about anything without taking it down (like change the memory while it's running). Then I'd get a bunch of them. Then I'd recode the application to handle the multiple writes to multiple servers and keep everything atomic, then test the hell out of it. There's a lot of issues to consider in there, and you probably want someone with a graduate degree in computer science to look over the design for you. (Anything this critical and I get someone smarter than me to double check my designs and implementations.) It may be best to just build it into the driver so the apps are consistent.

On the other hand, if you have all this money, look at some of the commercial solutions. This is probably heresy on this list, but hey, it's about the best solution for the needs right? Sybase or DB2 would be my first choices depending on the hardware platform (Sun or Mainframe). The systems are set up to handle failover of the master server. I know for Sun you want to be looking at Sun Clustering technology, a nice SAN and a couple of nice servers. You write to one server, but when it fails the backup server starts accepting the write operations as if it were the master. There's a general rule with software engineering that says "if you can buy 80% of what you want, you're better off doing that than trying to engineer 100%."

Think about the networking. Two data paths everywhere there's one. Two switches, two NIC cards for each interface, each going to a different switch.

Depending on where your "clients" are you need to look at your datacenter. Is your database server feeding data to clients outside your building? If so you probably want a few servers in a few different datacenters. At least something like one on the east coast and one on the west coast in the US, or the equivalent in your country, both of whom have different uplinks to the Internet. Get portable IP addresses and do your own BGP. That way if a WAN link fails the IP addresses will show up on the other WAN link even though it's from a different provider.

This is just a quick run down of immediate issues in a 24x7x365, it's not exhaustive. Think about every cable, every cord, every component, from a processor to a memory chip and think about what happens when you pull it out or unplug it, then make it redundant.

Well said.

Like the title of this entry says, High Availability is NOT Cheap.

Now, I know what you're thinking. These folks who are asking for 24x7x365 don't really need what they're asking for. A response like this is not helpful.

Re-read the first three sentences of the reply.
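
As a rough sketch of the arithmetic behind those nines (Python; the helper name is made up for illustration, and a 365-day year is assumed), this reproduces the downtime budgets quoted in the reply:

```python
# Hypothetical helper: yearly downtime budget for a given availability target.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds, ignoring leap years


def downtime_seconds_per_year(availability_percent: float) -> float:
    """Return the seconds of downtime per year allowed at this availability."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


# Print the budget for two through five nines.
for target in (99.0, 99.9, 99.99, 99.999):
    seconds = downtime_seconds_per_year(target)
    print(f"{target}% -> {seconds / 60:.2f} minutes per year")
```

Five nines works out to a little over 315 seconds a year, which is the "5 minutes and 15 seconds" in the reply. Each extra nine cuts the budget by a factor of ten, and that is where the cost comes from.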

Posted by jzawodn at 09:12 AM

The autoconf insanity

I've long been annoyed by the needless complexity and obscurity of autoconf and related tools. Finally someone who understands the problem much better than I do has spoken up.

The autoconf tools are also portable to almost every *nix platform in existence, which generally makes it easier to release your program for a large variety of systems. However, despite these few pluses, the auto* tools are constantly a thorn in the side of users and developers alike.

Read the full article on freshmeat.net

Posted by jzawodn at 08:40 AM

June 20, 2003

1 Year of Blog

It seems that I've been doing this a year now.

(Well, that's technically not true. Way before I knew what blogging was, I was already doing it but I didn't know it at the time. I got the idea from Alan Cox. My old on-line journal covered my doings from part of 1999 through 2002. Of course, only a few friends were reading it. It had no syndication. I didn't know what RSS was. I didn't know I should know. And the software was just 2 simple Perl scripts, a Makefile, and a single table in MySQL. Like Radio, it built pages statically and pushed them to the server via FTP or rsync.)

Getting Started

I started this blog at the suggestion of Jon Udell, originally playing with UserLand's Radio. I gave up on Radio after a few days of remembering why I didn't want to depend on Windows for something so important. In fact that first attempt at blogging is still hosted on UserLand's server.

A month or so after I began, we started syndicating my Linux entries and MySQL entries on the Linux Magazine web site.

My first post was about hardware and on-line communities. Since then I've moved on to a MovableType powered blog hosted on one of my co-located servers. And now I write entries in a variety of categories: MySQL, Linux, Perl, Random funny stuff, etc.

Late last year, I also started my flying blog to document my glider training and subsequent fun in the air. Later this year I'll begin my power training and continue to log my progress there.

Traffic

Let's look at some numbers.

In the past 365 days, I've posted roughly 732 entries. That averages about 2 per day.

My top 10 posts (based on the number of times viewed in the past year) are listed below. You can hover over the link to see the number of hits each has had as of today.

A lot of those, like DVD Player Hacks, RedHat ISOs, and Penis Puppets are mostly thanks to Google. Others, such as the MySQL related posts and the 10 Habits are thanks to lots of bloggers linking to them. The MySQL Full-Text story was linked from the MySQL home page for a while. That generated a lot of hits.

It's interesting to note that no one force drove the bulk of traffic. Some was from other bloggers, some from Google, and some from other publicity. Notable sources are: Slashdot, Scripting News, Daemon News, MySQL AB, Barrapunto, Jon Udell, Phil Windley, and so on.

In the last year:

The front page of my blog has been hit 207,377 times, resulting in 15,619,459,294 bytes transferred (not counting images or style sheets). That's over 14GB.
My RSS feeds have been requested 1,148,625 times, totalling 7,316,111,247 bytes sent--nearly 7GB.
I've been visited by 245,111 unique IP addresses.
I've served a total of 2,771,246 requests across all archives, indexes, and RSS feeds--not including images or style sheets. That's 40,447,281,983 bytes--nearly 38GB.
When all versions are combined, the most popular aggregators are NetNewsWire and Radio UserLand.

Since I log all traffic into MySQL, that was pretty easy to figure out. :-)
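
For the curious, here is a minimal sketch of the kind of queries involved. The real log table isn't shown here, so the `access_log` schema and the sample rows are made up, and SQLite (via Python) stands in for MySQL so that the example runs anywhere:

```python
import sqlite3

# Hypothetical schema: one row per HTTP request. SQLite is a stand-in for
# MySQL here; the table name, columns, and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE access_log (ip TEXT, path TEXT, bytes INTEGER)")
conn.executemany(
    "INSERT INTO access_log (ip, path, bytes) VALUES (?, ?, ?)",
    [
        ("10.0.0.1", "/index.html", 75000),
        ("10.0.0.2", "/index.html", 75000),
        ("10.0.0.1", "/index.rdf", 6400),
        ("10.0.0.3", "/index.rdf", 6400),
    ],
)

# Total requests, total bytes sent, and unique client addresses.
requests, total_bytes, unique_ips = conn.execute(
    "SELECT COUNT(*), SUM(bytes), COUNT(DISTINCT ip) FROM access_log"
).fetchone()

# Hits and bytes per path, largest byte total first.
per_path = conn.execute(
    "SELECT path, COUNT(*), SUM(bytes) FROM access_log "
    "GROUP BY path ORDER BY SUM(bytes) DESC"
).fetchall()

print(requests, total_bytes, unique_ips)
print(per_path)
```

Dividing a byte total by 2**30 gives the gigabyte figures quoted above; 15,619,459,294 bytes, for example, is about 14.5GB.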

People

I had no idea it would last this long. And I wasn't prepared for all the benefits I'd reap from blogging. I've met a lot of interesting people via this blog--too many to mention. If you consider the time and energy that I put into reading other blogs, this has been quite an undertaking--but a completely worthwhile one.

My blogging in some way or another influenced a few others to jump on the bandwagon. In no particular order:

Brandt Fundak (college friend, former roommate)
Dan Isaacs (college friend and self-proclaimed asshole)
Steve Friedl (unix whiz)
Kasia Trapszo (the original Unix Girl)
Morgan Deters (college friend, perpetual student)
Michael Radwin (Yahoo)
Derek Balling (former Yahoo, perpetual cynic)
Josh Woodward (from college, co-worker at Marathon and Yahoo, etc.)
Katie Stanton (former Yahoo co-worker, now at Google)

If there are others, please let me know. If you don't belong on this list, tell me. My memory can be fuzzy at times.

Trivia: Dan and Brandt's blogs are hosted on this server too.

Okay, I'm done rambling. For now.

Posted by jzawodn at 10:52 PM

Good Tidbits Today

Philip Greenspun on Chinese cars and the future

Ars Technica: Google doesn't like Googling

Derek want bacon!

ArsDigita: An Alternate Perspective, found via trained monkey

Ice Cream made via liquid nitrogen via several bloggers

Posted by jzawodn at 11:50 AM

June 18, 2003

Udell on Blind Ignorance

Okay, he doesn't call it that, but I do.

After all these years, the Unix and Windows cultures are still profoundly unaware of one another's strengths.

Read the whole story in Jon's Towards a unification of strengths article.

Posted by jzawodn at 11:02 PM

TypePad in Private Beta?

My surprisingly popular MSNBOT post got a referral and a TrackBack from http://firstpost.typepad.com/firstpost/2003/06/microsoft_crush.html but when I tried to follow the link I hit a password prompt. I guess the beta advertised on the TypePad site is running.

It's no surprise that they're using TrackBack, of course.

I've got to give 'em credit for calling the server firstpost. :-)

Posted by jzawodn at 10:28 PM

Search Users

Tim Bray's second installment is on-line now. It's brief, but I assume it's building up to the third installment.

Posted by jzawodn at 08:27 AM

Long eitheruse of time does not become pain.

Huh?

Go read this Register story and you'll understand.

It's really quite amusing if you're amused by Engrish and similar things.

Posted by jzawodn at 08:21 AM

June 17, 2003

MSNBOT - The Bot From Redmond

Well, it seems that they're crawling now.

They don't appear to have hit my site(s) yet. But they're probably just starting to ramp up.

According to Feedster there aren't many bloggers talking about it yet. Anyone seen this thing hit their site yet? How long has it been coming? Does it visit your blog frequently (like Googlebot does)?

Posted by jzawodn at 10:14 PM

Overcoming MySQL's 4GB Limit

(After having explained this for the 35th time, I decided it's time to simply put something on-line.)

When a MyISAM table grows large enough, you'll encounter the dreaded "The table is full" error. Now I could simply point at that page and leave this subject alone, but there's more to this story.

When this happens, the first reaction I hear is "You never told me that MySQL has a 4GB limit! What am I going to do?" Amusingly, I usually do describe the limit when I discuss the possibility of using MySQL with various groups--they often forget or underestimate the impact it will have. Putting that aside, the problem is easily fixed, as that page explains. You simply need to run an ALTER TABLE command.

And you'll need to wait. That ALTER TABLE is going to take some time. Really.

To protect yourself in the future, use the MAX_ROWS and AVG_ROW_LENGTH options at CREATE TABLE time if the table is likely to get big.

InnoDB tables do not have this limitation because their storage model is completely different.

Where does this limit come from?

In a MyISAM table with dynamic (variable length) rows, the index file for the table (tablename.MYI) stores row locations using 32-bit pointers into the data file (tablename.MYD). That means it can address only 4GB of space.

This problem is both a historical artifact and an optimization. Back when MySQL was created, it wasn't common to store that much data in a single table. Heck, for a long time 4GB was an entire hard disk and most operating systems had trouble with files larger than 2GB. Obviously those days are gone. Modern operating systems have no trouble with large files and hard disks larger than 100GB are quite common.

From an optimization point of view, however, the 32-bit pointer still makes sense. Why? Because most people are running MySQL on 32-bit hardware (Intel/Linux). That will change as use of AMD's Opteron becomes more widespread, but 32-bit will be the majority for the next few years. Using 32-bit pointers is the most efficient way to do this on 32-bit hardware. And even today, most MySQL installations don't have tables anywhere near 4GB in size. Sure, there are a lot of larger deployments emerging. They're all relatively new.

An Example

Here's a table that you might use to store weather data:

mysql> describe weather;
+-----------+--------------+------+-----+------------+-------+
| Field     | Type         | Null | Key | Default    | Extra |
+-----------+--------------+------+-----+------------+-------+
| city      | varchar(100) |      | MUL |            |       |
| high_temp | tinyint(4)   |      |     | 0          |       |
| low_temp  | tinyint(4)   |      |     | 0          |       |
| the_date  | date         |      |     | 0000-00-00 |       |
+-----------+--------------+------+-----+------------+-------+
4 rows in set (0.01 sec)

To find its size limit, we'll use SHOW TABLE STATUS:

mysql> show table status like 'weather' \G
*************************** 1. row ***************************
           Name: weather
           Type: MyISAM
     Row_format: Dynamic
           Rows: 0
 Avg_row_length: 0
    Data_length: 0
Max_data_length: 4294967295
   Index_length: 1024
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2003-03-03 00:43:43
    Update_time: 2003-03-03 00:43:43
     Check_time: 2003-06-14 15:11:21
 Create_options: 
        Comment: 
1 row in set (0.00 sec)

There it is. Notice that Max_data_length is 4GB. Let's fix that.

mysql> alter table weather max_rows = 200000000000 avg_row_length = 50;
Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> show table status like 'weather' \G
*************************** 1. row ***************************
           Name: weather
           Type: MyISAM
     Row_format: Dynamic
           Rows: 0
 Avg_row_length: 0
    Data_length: 0
Max_data_length: 1099511627775
   Index_length: 1024
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2003-06-17 13:12:49
    Update_time: 2003-06-17 13:12:49
     Check_time: NULL
 Create_options: max_rows=4294967295 avg_row_length=50
        Comment: 
1 row in set (0.00 sec)

Excellent. Now MySQL will let us store a lot more data in that table.

Too Many Rows?

Now, the astute reader will notice the Create_options specify a limit of 4.2 billion rows. That's right, there's still a limit, but now it's a limit on number of rows, not the size of the table. Even if you have a table with rows that are 10 times as large, you're still limited to roughly 4.2 billion rows.

Why?

Again, this is 32-bit hardware. If you move to a 64-bit system, the limit is raised accordingly.
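
The numbers in the two SHOW TABLE STATUS outputs above line up with simple pointer arithmetic. The sketch below (Python) assumes that MyISAM picks the smallest whole-byte pointer able to address max_rows * avg_row_length bytes, after capping max_rows at the value shown in Create_options. That rule is an assumption that matches the output shown, not something taken from the MySQL source, and the function names are made up:

```python
# Assumed cap on MAX_ROWS, taken from the Create_options output above.
MAX_ROWS_CAP = 2**32 - 1  # 4294967295


def max_data_length(pointer_bytes: int) -> int:
    """Largest byte offset an N-byte pointer can address."""
    return 2 ** (8 * pointer_bytes) - 1


def pointer_bytes_needed(max_rows: int, avg_row_length: int) -> int:
    """Smallest whole-byte pointer covering max_rows * avg_row_length bytes.

    The selection rule is an assumption consistent with the output above;
    the real server may also enforce a minimum pointer width.
    """
    wanted = min(max_rows, MAX_ROWS_CAP) * avg_row_length
    size = 1
    while max_data_length(size) < wanted:
        size += 1
    return size


# The default 32-bit (4-byte) pointer gives the original 4GB ceiling.
print(max_data_length(4))

# The ALTER TABLE from the example: max_rows = 200000000000, avg_row_length = 50.
size = pointer_bytes_needed(200_000_000_000, 50)
print(size, max_data_length(size))
```

Under that assumption the ALTER TABLE above needs a 5-byte pointer, and 2**40 - 1 is exactly the 1099511627775 reported as the new Max_data_length.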

Posted by jzawodn at 01:42 PM

June 16, 2003

Asking Questions in Public

I read this a long time ago and agreed with it completely but had lost the URL. The document Why Ask Questions in Public? does a good job of explaining why one shouldn't reply off-list by default or directly engage helpful list members privately.

Allow me to quote a bit here...

I, like many other people with technical expertise in some topic, regularly read various Usenet newsgroups and mailing lists and try to answer questions for which I know the answer. However, I am also extremely busy with a large number of projects, often including improving the very software that people are asking questions about, and I've found that taking some time to help other people sometimes has the regrettable tendency to add to my total level of pending work. This is sometimes hard for me to deal with, and I'd like to ask for your help in solving this problem.

The problem generally takes the following form: Someone posts a question about something I know something about. I respond in the newsgroup or on the mailing list with some suggestions or possibly some questions. In response, the person mails me directly (rather than responding in the newsgroup or mailing list) with the answers and with other questions. Occasionally the problem takes another, related form: I post about a topic, someone who has a question about that topic reads my posts and thinks I sound knowledgeable, and they send me e-mail asking me questions.

I understand why people do this. Often they view newsgroups and mailing lists as giant, impersonal places and want to get out of them as soon as they can, and as soon as they find someone who can answer their questions, they latch on to that person and want to interact with them directly. I'm sure that they don't understand that this behavior causes problems for the person of whom they're asking questions.

Go read the rest to see how it concludes. Tell others who seem to habitually get this wrong.

It always surprises me that people who've been on-line for years seem to somehow not get the importance of keeping some discussions in the open. I don't know why. Even after repeated prodding, they seem to lapse into their old habits.

What's even worse for me is that we seem to have some of the worst offenders at work. :-(

Am I alone here or do others see this happening too often? Do you try to do anything about it? Have you found an effective tactic? I haven't.

Posted by jzawodn at 03:30 PM

Tim Bray on Search

Tim Bray has posted the first of what he says will be a series of articles on search technology. It looks like he's off to a good start.

Posted by jzawodn at 08:59 AM