The InnoDB storage engine has done wonders for MySQL users that needed higher concurrency than MyISAM could provide for demanding web applications. And the automatic crash recovery is a real bonus too.

But InnoDB's performance (in terms of concurrency, not really raw speed) comes at a cost: disk space. The technique for achieving this, multiversion concurrency control, can chew up a lot of space. In fact, that Wikipedia article says:

The obvious drawback to this system is the cost of storing multiple versions of objects in the database. On the other hand reads are never blocked, which can be important for workloads mostly involving reading values from the database.


Imagine a set of database tables will tens of millions of rows and a non-trivial amount of churn (new records coming in and old ones being expired or removed all the time). You might see this in something like a large classifieds site, for example.

Furthermore imagine that you're using master-slave replication and the majority of reads hit the slaves. And some of those slaves are specifically used for longer running queries. It turns out that the combination of versioning, heavy churn, and long running queries can lead to a substantial difference in the size of a given InnoDB data file (.ibd) on disk.

Just how much of a difference are we talking about? Easily a factor of 4-5x or more. And when you're dealing with hundreds of gigabytes, that starts to add up!

It's no secret that InnoDB isn't the best choice for data warehouse looking applications. But the disk bloat, fragmentation, and ongoing degradation in performance may be an argument for having some slaves that keep the same data in MyISAM tables.

I know, I know. I can do the ALTER TABLE trick to make InnoDB shrink the table by copying all the rows to a new one, but that does take time. Using InnoDB is definitely not a use it and forget about it choice--but what database engine is, really?.

Looking at the documentation for the InnoDB plug-in, I expect to see a real reduction in I/O when using the new indexes and compression on a data set like this. (Others sure have.) But I don't yet have a sense of how stable it is.

Anyone out there in blog-land have much experience with it?

Posted by jzawodn at August 12, 2008 01:05 PM

Reader Comments
# Mark Callaghan said:
on August 12, 2008 03:13 PM
# Jeremy Zawodny said:


Oh, wow. I believe the proper response is "waaaaayyy behind."

Thanks for the pointer. I get learn all that InnoDB stuff I used to not care about in the past. What fun!

on August 12, 2008 03:47 PM
# Chip Turner said:

We have had similar issues in some of our databases. It can get particularly bad when you're not using file-per-table and have occasionally large temporary tables; basically the space never comes back. So we wrote a tool that basically dumps N tables in parallel then reloads them in parallel, resulting in compaction of the innodb files. Much faster than running N ALTER statements.

Typically you do it on your replicas, then fail over to them to become new, compact primaries. IIRC we see around a 30% size improvement, as well as similar speed increases in various access patterns.


on August 13, 2008 01:22 AM
# James Day said:

You might also want to consider turning off the adaptive hash index on just the slaves that do the big jobs. I'm assuming that they won't touch the same rows enough for the adaptive hash index write (and mutex lock that happens on row reads) to be worth the cost. There are also still quite a few cases in the to do list where the adaptive hash index lock is held when it should be released.

It is good for most workloads, so it's a case of try it and see which way is best.

Do you have XA distributed transactions turned on or off on your slaves? Off is probably best unless they are relaying replication to another slave.

The innodb_max_purge_lag variable was added after a place doing load testing for an appliance filled the appliance hard drive.

on August 13, 2008 04:05 AM
# James Day said:

I should clarify that the adaptive hash index write is only while creating the adaptive hash entry on the first read that needs it.

on August 13, 2008 04:10 AM
# Adi R said:

Warning: I am one of those guys who always likes to Redesign things.

With that disclaimer, have you looked into Hypertable as an option? It is open source implementation of Google like Big Table. I understand it also keeps versions (I actually like that), but can scale much better and fault tolerance is way higher.

on August 14, 2008 05:33 PM
# Jeremy Zawodny said:

I know of it but have not looked at it.

on August 14, 2008 06:27 PM
# Dave Edwards said:

Hi Jeremy,

This is covered in Log Buffer #110 -


on August 15, 2008 10:50 AM
# Don MacAskill said:

Like everyone else, we have this problem too. Our current ugly solution is to periodically drop a slave out of production, ALTER it , then slowly rotate that slave's DB(s) to the other slaves and eventually the master (either during an outage, or promoting a master to slave, or whatever). It sucks, and it can be error-prone, but at least we periodically recover disk space.

Something else I'm actively investigating is fs-level compression, too. In this case, ZFS. It seems more sane / compatible / performant / etc than doing InnoDB's plugin, but we haven't yet tried it in production. I'll certainly blog about it when we do. It won't "solve" the data size bloat, but it should seriously and massive reduce it, I would imagine, since I'll bet there's a lot of duplicated data which is easily compressed.

on August 18, 2008 09:20 AM
# Richard Lynch said:

I vaguely recall Mark Smith of MySQL once suggesting that you just make all the slaves use MyISAM for the tables that are innoDB on the masters...

But if it's that simple, and YOU don't know it, then I probably am mis-understanding the problem and/or mis-remembering the solution...

on August 20, 2008 12:45 PM
# Jeremy Zawodny said:

As you might expect, it's not quite that simple... But I'm considering something like that.

on August 20, 2008 12:47 PM
Disclaimer: The opinions expressed here are mine and mine alone. My current, past, or previous employers are not responsible for what I write here, the comments left by others, or the photos I may share. If you have questions, please contact me. Also, I am not a journalist or reporter. Don't "pitch" me.


Privacy: I do not share or publish the email addresses or IP addresses of anyone posting a comment here without consent. However, I do reserve the right to remove comments that are spammy, off-topic, or otherwise unsuitable based on my comment policy. In a few cases, I may leave spammy comments but remove any URLs they contain.