As previously noted, I've been playing with XtraDB a bit at work. Over a week ago I decided to test compression on one of our larger tables and it took a bit longer than I expected.
(root@db_server) [db_name]> ALTER TABLE table_name \
    ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4;
Query OK, 825994826 rows affected (8 days 14 hours 23 min 47.08 sec)
Records: 825994826  Duplicates: 0  Warnings: 0
Zoiks!
It's too bad we couldn't use all the cores on the machine for the ALTER TABLE, huh?
On the plus side, the file sizes aren't too bad.
Before:
-rw-rw---- 1 mysql mysql 1638056067072 2009-05-24 09:23 table_name.ibd
After:
-rw-rw---- 1 mysql mysql 587617796096 2009-05-27 07:14 table_name.ibd
I'll have more to say about XtraDB and the compression options in a later post. But given the interest that my Twitter messages about this big ALTER TABLE generated yesterday, I figured I'd share a bit more detail here.
For anyone doing the math at home, that's going from roughly 1.5TB to 500GB (the new file size is slightly inflated, since this slave managed to replicate about a week's worth of data before I caught it). I was hoping for 4:1 compression and managed about 3:1.
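For the curious, this is plain InnoDB plugin / XtraDB table compression, nothing exotic. Here's a minimal sketch of what it takes to turn it on (a sketch, not our exact config; table_name and db_name are the placeholders from above, and the SET GLOBAL lines assume the plugin's dynamic variables rather than settings already in my.cnf):

-- Compressed row formats need per-table tablespaces and the Barracuda
-- file format; without them the ALTER falls back to the uncompressed
-- format and only issues a warning.
SET GLOBAL innodb_file_per_table = 1;
SET GLOBAL innodb_file_format = 'Barracuda';

-- KEY_BLOCK_SIZE=4 stores the (normally 16KB) pages as compressed 4KB blocks.
ALTER TABLE table_name ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4;

-- Confirm that Row_format now reads "Compressed".
SHOW TABLE STATUS LIKE 'table_name'\G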
Posted by jzawodn at May 27, 2009 07:34 AM
Hey Jeremy, this looks pretty interesting... what's the performance like pre/post compression? Since a large DB like this is typically IO bound, it'd be interesting to know whether the reduction in IO due to compression more than offsets the increased CPU utilization to do decompression (I suspect it does).
Mike:
Once the slave catches up, we'll have it serving live traffic so we can see. :-)
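In the meantime, one way to keep an eye on the CPU side of that trade-off is the compression statistics table the InnoDB plugin / XtraDB exposes (a rough sketch, assuming the INFORMATION_SCHEMA plugin tables are installed):

-- Per page size: how many compress/uncompress operations have run
-- and how much wall-clock time they consumed.
SELECT page_size,
       compress_ops,
       compress_ops_ok,   -- compressions where the page actually fit
       compress_time,     -- total seconds spent compressing
       uncompress_ops,
       uncompress_time    -- total seconds spent decompressing
  FROM INFORMATION_SCHEMA.INNODB_CMP;

If compress_ops_ok lags far behind compress_ops, the 4KB block size is probably too aggressive for the data and a fair amount of CPU is going into re-compressing split pages.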
Next time I hear 'mapreduce isn't as good as databases' I'll refer them to this post. Munging loads of data into a new form using lots of cores is exactly what mapreduce is good for.
ls -lh is your friend. No need to do math on crazy large numbers. :)
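A SQL-side alternative to eyeballing the .ibd files, for what it's worth (DATA_LENGTH and INDEX_LENGTH are only estimates for InnoDB, so treat the numbers as rough; db_name and table_name are the placeholders used above):

SELECT table_name,
       ROUND(data_length  / 1024 / 1024 / 1024) AS data_gb,
       ROUND(index_length / 1024 / 1024 / 1024) AS index_gb
  FROM INFORMATION_SCHEMA.TABLES
 WHERE table_schema = 'db_name'
   AND table_name = 'table_name';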
Will you be posting benchmarks on CPU usage and the quantity of bytes read/written?
The SmugMug folks run MySQL on top of Solaris. With ZFS, there's the ability to enable on-the-fly compression at the file system level, and they found that enabling LZJB compression yielded quite favourable performance in many cases:
http://blogs.smugmug.com/don/2008/10/13/zfs-mysqlinnodb-compression-update/
The traditional gzip compression algorithm didn't come out that well in the comparison.
David:
Yeah, I'll have more to say later.
Right now I'll just say that adding a second OS to the mix (despite how gung-ho one of our sysadmins might be about OpenSolaris!) certainly has operational considerations.
I'm just doing some testing now to see how this will work out if we decide to go in this direction.
I wasn't implying that you add (Open)Solaris to the mix (as much as I'm a fan of Solaris 10+), but I thought you might find the data point interesting.
Given the growing number of cores available, it's becoming more attractive to have a few of them work towards reducing I/O bandwidth needs (and latency). It will probably take some experimenting to figure out the best place to use it (file system, DB core) under different circumstances.
David:
Oh, no offense taken.
I've met with Don to talk about this stuff as well as read his blog posts on the topic. In fact, it was his use of XtraDB that helped give me the confidence to make the jump.
Hi!
This should be a pretty low-hanging piece of fruit for someone to fix. I did multi-threaded support for Archive in 6.0 (which I suspect is now lost in the tree flip-flops, except in Drizzle). Even fire-and-forget threads could do this without a lot of effort.
Cheers,
-Brian
Given that this file is 1.5TB, isn't this table a candidate for sharding and being distributed across multiple machines?