Yesterday Patrick Galbraith asked What is the official branch of MySQL? which got a lot of attention, including on Slashdot (and the token PostgeSQL comments quickly appeared).

Here's the funny thing. It doesn't matter anymore. Patrick's question is interesting in an academic sense, but it's mainly a distraction from what really matters. (Hint: What's the official Linux and who really cares? Ubuntu? RedHat? Debian? CentOS?)

Storage Engines

Nowadays what matters is the set of available storage engines. InnoDB, Percona's XtraDB, PrimeBase's PBXT, Maria, Falcon, and several others are available or will be soon. I predict that for the foreseeable future, any MySQL distribution or derivative must support the storage engine plug-in API that MySQL 5.1 defined. And since that's the case, it largely won't matter which flavor you using.

Protocol(s)

Look at what's happened in the world of key/value databases in the last few years. More than a few of them speak the memcached protocol as either their native and default or an optional add-on. I suspect the same thing will be the case here. All MySQL distributions and derivatives will speak the "traditional" MySQL protocol (just like memecached has the old protocol). Some of them, notably Drizzle, will have other (newere, better) protocols available as well (much like memcached has the new binary protocol).

Summary

In summary, the choice of MySQL version or derivative won't matter as much as you might think because they'll have the same Storge Engine plug-ins available (thanks to the shared plugin-in API), they'll all speak a common protocol (this may not be true for replication--watch that area closely), and will largely offer the same subset of SQL and SQL extensions.

They'll all be supported by different groups/companies (including some "database appliance" vendors), will all be tuned differently and aimed at slightly different uses cases, and will certainly benefit from a lot of cross-pollination.

That doesn't sound so bad to me.

The fact that nobody can point to the "real" MySQL in a few years just won't matter. Does anyone ask (anymore) which is the "real" Linux? Nope. And for very similar reasons. Think of MySQL as "kernel" and Storage Engine as "filesystem" and you'll realize we've been down this road before.

We're looking at the upgrade from 5.0 to 5.1 soon at Craigslist and don't know if we'll be using InnoDB or XtraDB yet. Time will tell.

See Also: The New MySQL Landscape, which I wrote a few months back--before a good chunk of the MySQL team had left Sun.

Posted by jzawodn at March 31, 2009 07:23 AM

Reader Comments
# Brian Aker said:

Hi!

Monty and company are looking at adopting the new Drizzle protocol so I suspect we will see some consolidation around protocol as well.

The only thing about engines that I think will change, is that that the "wrapper" we use to connect to them will be different for each database. It will be a minor amount of work to make this work across all databases. Why will there be changes? For instance MySQL is trying to do Foreign Keys in their engine layer, and I know that neither Drizzle or MariaDB will support them in the manner they are being designed. In Drizzle we are allowing engines to handle their own data dictionaries, this has the advantage of making online DDL a snap to do (probably transactional as well). No more FRM :)

These changes all add up, but for any good developer it won't be a big deal. Drizzle today already works with PBXT and all of the Innodb forks.

Drizzle may in the end support the memcached protocol. I've been tossing this idea around in my head for a while now. It would be simple for us to do since we will have an abstracted protocol layer. I would like to see us support CouchDB's semantics as well.

Cheers,
-Brian

on March 31, 2009 08:17 AM
# Roland Bouman said:

@Jeremy: Do you think the MySQL 5.1. plugin API is really a common denominator? I am having trouble locating the documentation that defines the protocal. There is some stuff floating around but I direly miss a commitment saying "this is our interface and we will stick to it"

@Brian

"the "wrapper" we use to connect to them will be different for each database."

huh...wtf?....For each database? ... perhaps you meant to write "storage engine"?

on March 31, 2009 11:51 AM
# said:

Hi Jeremy argree it is the engines that matter. FYI we try to keep this page up to date with all the enginges: http://www.bigdatabaselist.com/wiki/Category:MySQL_Storage_Engines

on March 31, 2009 11:56 AM
# Ronald Bradford said:

You raise two points for consideration where I do feel it does matter.

First is the protocol. When a product supports the MySQL protocol it leverages the benefit of the MySQL Connectors and the 'M' of LAMP. Drizzle is an example of a product that does not support the existing MySQL protocol. While other forks continue to support this, entirely different products could also support the protocol, but throw everything away in order to be supported and compatible in a MySQL environment.

The second is the API. Your comment 'must support the storage engine plug-in API that MySQL 5.1' precludes a number of products of operating efficiently within the MySQL limitations of the API. InfoBright and Nitro are two such products that started down the API route but were limited in functionality, especially with the Optimizer and QEP. I was involved with Nitro, and I've been to enough Storage Engine Summits/meetings.

As I wrote in "Understanding the various MySQL Products & Variants" -- http://ronaldbradford.com/blog/understanding-the-various-mysql-products-variants-2009-03-13/ it's hard enough just to understand the official versions from Sun/MySQL.

on March 31, 2009 11:59 AM
# Brian Aker said:

@Roland Bouman So each storage engine writes a wrapper (sometimes I have referred to this as a "shim") to connect to the engine API. The engine API between Drizzle, MySQL, MariaDB, etc are all slightly different. So for each database you need to write a new one. Its not a lot of work, but it does require some expertise in each database to do it.

on March 31, 2009 12:36 PM
# Martin said:

I'm not sure this will happen. Having a real DBMS means more than just a couple of storage engines.

A feature I'm really missing is _one_ tool that allows me to make a consistent backup of the database without having downtime (--lock-all-tables is not an option as in reality it's the same as having downtime) - the key point would be another restriction in MySQL that makes sure the mysqldump will always create a consistent backup of all tables in a database, extra points for a backup tool that let's me dump all databases in a consistent state, which is currently not really possible without downtime or slaves (have only innodb tables in a db, make the backup, alter something in the myisam tables of the mysql database...)

on April 1, 2009 06:20 AM
# Justin Swanhart said:

@Brian regarding the "shim" - This is much more complicated in practice than it is in theory for anything but the most trivial of storage engines. Notable new engines (Kickfire, Infobright, Nitro) use some parts of the storage engine interface, but take work into their own hands when necessary.

The MySQL optimizer is not extensible and thus the engines are forced to use their own optimizer and/or execution engine. These engines are being built to solve many of the problems which MySQL is currently bad at handling but the storage engine interface is too limiting to provide a quality product to the end market.

I agree with Jeremy, that it is the protocol that is important, very much more so than the database abstraction that is running behind the scenes. I also agree that replication between these different databases is going to be interesting. Part of my job at Kickfire revolves around being concerned about exactly these kind of things. We want to be as MySQL compatible as possible and are working hard to create a product that provides as much compatibility as possible, while providing a level of performance that was previously unimaginable.

on April 1, 2009 08:09 PM
# Robert Hodges said:

Hi Jeremy,

Thanks for a very thoughtful article. One idea I would like to put forward is that the anchor for MySQL variants is not really the MySQL API per se but specific storage implementations, for example InnoDB. What if we depart from Brian's "shims" and the InnoDB API diverges significantly? At that point bets on standardization are off.

Robert

p.s., I referenced your article over here in my take on the code split. (http://scale-out-blog.blogspot.com/2009/04/contemplating-mysql-diaspora.html)

on April 6, 2009 12:32 PM
# Robert Hodges said:

Correction! You did point out the storage engine center of gravity. (Very busy week, have to slow down.) However, the question about evolving APIs for storage is still an interesting one.

on April 6, 2009 12:42 PM
Disclaimer: The opinions expressed here are mine and mine alone. My current, past, or previous employers are not responsible for what I write here, the comments left by others, or the photos I may share. If you have questions, please contact me. Also, I am not a journalist or reporter. Don't "pitch" me.

 

Privacy: I do not share or publish the email addresses or IP addresses of anyone posting a comment here without consent. However, I do reserve the right to remove comments that are spammy, off-topic, or otherwise unsuitable based on my comment policy. In a few cases, I may leave spammy comments but remove any URLs they contain.