Introduction to MySQL Cluster (by Jeremy Zawodny)

Notes from the Introduction to MySQL Cluster talk at the 2004 MySQL User's Conference...

First we're seeing an overview of the NDB architecture. If you've never seen it before, think "Oracle RAC without shared storage" and you're 95% of the way there.

The core NDB engine is a new storage engine inside MySQL. It provides transactions, replication, on-line backups, crash recovery, hash and tree indexes, on-line index builds, auto-detection of a failed node and re-sync when it comes back up. There are rolling upgrades, which provide a way to upgrade things without a disruption of service.

Man, this would be a lot easier if I could draw ASCII art even half as fast as I can type. Oh, well.

The NDB code is in the MySQL 4.1 tree as of today.

History. During 1991-1996, the initial design, prototypes, and research were on-going. Most of the code today originates from 1996. First demo in 1997, and it just got better from there. 1.0 came out in April 2001 w/Perl DBI support. In 1.4 (2002), node recovery was completed. 2003 brought ODBC, on-line backup, unique indexes, and more. The most recent stuff are on-ling upgrades, ordered indexes, and MySQL integration.

Benchmarks on a 72 CPU SunFire box hit 380,000 write txns/sec and 1,500,000 read txns/sec. Nice! More numbers in the slides, but I can't type that fast.

The management server handles the configuration of the cluster (config file, commands (start backup), and so on). There's a C API to the management server. It'll be integrated into MySQL at some point. MySQL uses the NDB API to talk to the cluster for normal data operations. Even with MySQL as the front end to NDB, it's still possible to use the NDB API natively to talk to the cluster too.

The config file specifies transaction timeouts, buffer sizes, heartbeat timeouts, nodes involved, and so on. Messaging diagram I can't replicate. Same with diagram of hash-based table partitioning. On-line backup and restore diagram too.

Data access methods (low-level): primary key, full table scan, unique key, parallel range scan. MySQL hides this from us, but it's interesting anyway. If you're using the NDB API, you need to know this too.

There is always a primary key. Like BDB and InnoDB, it'll create one for you if you don't specify one. Unique indexes are hash indexes, implemented as a second table (funky). It looks like unique indexes are expensive in NDB sort of like the way that secondary indexes are in InnoDB (again, I can't replicate the diagram here).

Currently working on: remaining MySQL functionality in the engine, on-line create/drop index, cluster management from the MySQL API, MySQL replication for MySQL Cluster, on-line add/drop column. Row length limit is current ~8KB but that will change in the future.

Questions and answers now. Clearly some folks don't get it. Heh.

Posted by jzawodn at April 14, 2004 12:48 PM | edit

Reader Comments

# justin said:

thanks for all the updates! these kickass and really show where mysql is headed. it's nice to see some people talking about real world setups and the growing pains..

keep up the good work.. and thanks again.

on April 14, 2004 12:53 PM

# Paul Counts said:

I managed to get down the other benchmark stats, here is what I got (a little shorthandish):

DL 380 G3
280 calls/sec
one call 20 single record transactions with 4 kByte record, 1 insert, 7 reads, 11 updates, 1 delete (limited by network interface)

on April 14, 2004 01:29 PM

# Alex Seropian said:

Looks pretty cool. My questions:

1) The MySQL site talks a lot about it being an "in memory" database. Does that mean that we can't have a database that's larger than available memory?

2) How is database storage bloat relative to MyISAM and InnoDB? InnoDB seems to bloat large databases to a degree that I really can't accept. MyISAM is better, particularly if you can get the fastest queries to not refer to the data file.

3) Can this be used in a single-node configuration? If not, why not?

4) I assume that, as "just another table handler", I can mix table types in one database? i.e. have critical tables as NDB, and some larger ones that aren't as critical as MyISAM? Then queries that needed a MyISAM table would have to go to the "main" server, but could do joins between MyISAM and NDB tables? Then if the MyISAM server went down, I could still use another NDB node for the critical tables.

5) How is the database stored on disk? Logfiles and a periodic hot backup?

on April 14, 2004 09:21 PM

# Michael Kruckenberg said:

Not an expert, but was in the session and can answer Alex on number 1 and 4.

1) Yes, it's in memory. The recommendation is to have memory 2x the size of your database size + 10%. For a 40G database Mikael recommends 100G of memory. I would hope that the future might bring an option to have some percentage of the data on disk.

4) Yes. Can intermix, just like now with MyISAM and InnoDB.

on April 15, 2004 02:12 PM

# Mark Smith said:

Doesn't the memory requirement really a principle restriction?

If I am understanding this correctly, MySQL Clustering is being promoted to be a solution for high availability using off the shelf hardware. If my storage needs hits 80 Gbytes then I would need somewhere around 170 Gbytes of memory. That is a pretty hard pill to swallow if true, at least cheaply.

I am concerned because I am trying to evaluate whether to do replication or clustering as a method to duplex our database to protect from failure and maintenance requirements.

Thanks,

Mark

on April 19, 2004 12:03 PM

# Jeremy Swinborne said:

Also if the DB is completely in memory and all your servers are at the same location. What happens if the power goes out? How do you recover? If it isn't on a harddrive somewhere you have just lost a whole lot of info.

on December 29, 2004 02:08 PM

# Alex Davies said:

I have written a tutorial which will take two clean servers and a third random server (which needs only minor modifications) and turn them into a MySQL Cluster. This is suitable for those who are behind a loadbalancer or using round robin DNS.

The HOWTO can be found at http://www.davz.net/static/howto/mysqlcluster/.

Please note that it IS NOT POSSIBLE TO SET UP A MYSQL CLUSTER WITH JUST TWO MACHINES. You NEED the third machine (although it is only required when starting or stopping the cluster).

This HOWTO was produced with the help of several cluster users and addresses the most common problems that people have when setting up mysql cluster so I strongly suggest that you follow it before posting here.

If you find it useful please link to it. If anyone has any comments please email me (email address on the site).

Alex Davies

--
Alex Davies // http://www.davz.net

on December 31, 2004 03:53 AM

# aparna said:

Hello Jeremy,

I would like you to educate me on the differences between and the issues of MySql Cluster compared to how Oracle cluster .

Can you help me.

Thanks

on January 28, 2007 11:50 PM

# Anup Niroula said:

hello jeremy,
i want to know about the "mysql on a grid" i.e grid computing with mysql. please explain me in details about it.I would be very grateful to you.

on June 8, 2007 08:54 PM

# Gollaplli said:

Hi Jeremy,

What is MySQL clustering?

NDB storage engine is must for implementing MySQL cluster?

can we implement MySQL clustering if we use innodb and myisam storage engine?

Please mail me your answers.

Thanks for the help

on March 2, 2010 11:59 AM

Disclaimer: The opinions expressed here are mine and mine alone. My current, past, or previous employers are not responsible for what I write here, the comments left by others, or the photos I may share. If you have questions, please contact me. Also, I am not a journalist or reporter. Don't "pitch" me.

Privacy: I do not share or publish the email addresses or IP addresses of anyone posting a comment here without consent. However, I do reserve the right to remove comments that are spammy, off-topic, or otherwise unsuitable based on my comment policy. In a few cases, I may leave spammy comments but remove any URLs they contain.