June 17, 2003

MSNBOT - The Bot From Redmond

Well, it seems that they're crawling now.

They don't appear to have hit my site(s) yet. But they're probably just starting to ramp up.

According to Feedster there aren't many bloggers talking about it yet. Anyone seen this thing hit their site yet? How long has it been coming? Does it visit your blog frequently (like Googlebot does)?

Posted by jzawodn at 10:14 PM

Overcoming MySQL's 4GB Limit

(After having explained this for the 35th time, I decided it's time to simply put something on-line.)

When a MyISAM table grows large enough, you'll encounter the dreaded "The table is full" error. Now I could simply point at that page and leave this subject alone, but there's more to this story.

When this happens, the first reaction I hear is "You never told me that MySQL has a 4GB limit! What am I going to do?" Amusingly, I usually do describe the limit when I discuss the possibility of using MySQL with various groups--they often forget or underestimate the impact it will have. Putting that aside, the problem is easily fixed, as that page explains. You simply need to run an ALTER TABLE command.

And you'll need to wait. That ALTER TABLE is going to take some time. Really.

To protect yourself in the future, use the MAX_ROWS and AVG_ROW_LENGTH options at CREATE TABLE time if the table is likely to get big.

InnoDB tables to not have this limitation because their storage model is completely different.

Where does this limit come from?

In a MyISAM table with dynamic (variable length) rows, the index file for the table (tablename.MYI) stores row locations using 32-bit pointers into the data file (tablename.MYD). That means it can address only 4GB of space.

This problem is both a historical artifact and an optimization. Back when MySQL was created, it wasn't common to store that much data in a single table. Heck, for a long time 4GB was an entire hard disk and most operating systems had trouble with files larger than 2GB. Obviously those days are gone. Modern operating systems have no trouble with large files and hard disks larger than 100GB are quite common.

From an optimization point of view, however, the 32-bit pointer still makes sense. Why? Because most people are running MySQL on 32-bit hardware (Intel/Linux). That will change as use of AMD's Opteron becomes more widespread, but 32-bit will be the majority for the next few years. Using 32-bit pointers is the most efficient way to do this on 32-bit hardware. And even today, most MySQL installations don't have tables anywhere near 4GB in size. Sure, there are a lot of larger deployments emerging. They're all relatively new.

An Example

Here's a table that you might use to store weather data:

mysql> describe weather;
+-----------+--------------+------+-----+------------+-------+
| Field     | Type         | Null | Key | Default    | Extra |
+-----------+--------------+------+-----+------------+-------+
| city      | varchar(100) |      | MUL |            |       |
| high_temp | tinyint(4)   |      |     | 0          |       |
| low_temp  | tinyint(4)   |      |     | 0          |       |
| the_date  | date         |      |     | 0000-00-00 |       |
+-----------+--------------+------+-----+------------+-------+
4 rows in set (0.01 sec)

To find its size limit, we'll use SHOW TABLE STATUS

mysql> show table status like 'weather' \G
*************************** 1. row ***************************
           Name: weather
           Type: MyISAM
     Row_format: Dynamic
           Rows: 0
 Avg_row_length: 0
    Data_length: 0
Max_data_length: 4294967295
   Index_length: 1024
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2003-03-03 00:43:43
    Update_time: 2003-03-03 00:43:43
     Check_time: 2003-06-14 15:11:21
 Create_options: 
        Comment: 
1 row in set (0.00 sec)

There it is. Notice that Max_data_length is 4GB. Let's fix that.

mysql> alter table weather max_rows = 200000000000 avg_row_length = 50;
Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> show table status like 'weather' \G
*************************** 1. row ***************************
           Name: weather
           Type: MyISAM
     Row_format: Dynamic
           Rows: 0
 Avg_row_length: 0
    Data_length: 0
Max_data_length: 1099511627775
   Index_length: 1024
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2003-06-17 13:12:49
    Update_time: 2003-06-17 13:12:49
     Check_time: NULL
 Create_options: max_rows=4294967295 avg_row_length=50
        Comment: 
1 row in set (0.00 sec)

Excellent. Now MySQL will let us store a lot more data in that table.

Too Many Rows?

Now, the astute reader will notice the Create_options specify a limit of 4.2 billion rows. That's right, there's still a limit, but now it's a limit on number of rows, not the size of the table. Even if you have a table with rows that are 10 times as large, you're still limited to roughly 4.2 billion rows.

Why?

Again, this is 32-bit hardware. If you move to a 64-bit system, the limit is raised accordingly.

Posted by jzawodn at 01:42 PM