I spent a fair amount of time on Friday trying to figure out why our FreeBSD servers running MySQL 4.0.2 were doing so much better than our Linux servers running MySQL 4.0.2. They're all slaves of the same 3.23.51 master and get roughly equal query loads, thanks to our Alteon load-balancers (yes, the ones that occasionally stop working right).

What I noticed while watching each of them with mytop is that the Linux boxes seem to have far more slow queries than the FreeBSD boxes. Now the FreeBSD boxes in question are newer. They're Compaq DL-380s with dual 1.2 GHz CPUs, 2GB of RAM, and 6 36GB SCSI disks. The Linux boxes are a bit older and slower. But the difference was still surprising. Over the last 24 hours, the FreeBSD boxes had each logged 3 slow queries, while the Linux boxes had logged a few thousand of them. Clearly something was up.

So I got on the boxes and noticed something odd. The load average on the Linux machines was higher than I'd expect. Rather than being in the 0.5 - 2.0 range, it was hitting between 7 and 9 during busy times. Odd. I ran top for a while to see if I noticed anything odd. Sure enough, after a few minutes, I found the pattern. The kswapd process was using up a fair amount of CPU time--sometimes as much as 99% of one CPU.

It gets more interesting. Both Linux boxes have swap disabled. It's been that way ever since I got sick of dealing with the 2.4 kernel's brain-dead virtual memory system last year. Why would kswapd even be running on a system with no swap? I have no idea.

But I decided to do some research and see if anyone had seen this before. The closest I got was this message on the linux-kernel mailing list, a complaint by MySQL AB's own Sascha Pachev.

He noted similarly odd behavior and asked that Rik look into it. Unfortunately, I haven't been able to find any follow-up messages.

So I went back to looking at the configuration on the machines in question. Both have 2GB of RAM, roughly half of which is for MySQL. I have the key_buffer set to 512M as well as the innodb_buffer_pool. That leaves 1GB for the OS cache, buffers, and related stuff. It should be more than enough, shouldn't it?
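As a rough sanity check on that budget, here is a minimal Python sketch of the arithmetic. It counts only the two buffers named above; `leftover_mb` is a made-up helper for illustration, and real mysqld usage will run higher because per-connection buffers are not included.

```python
# Rough memory budget for one of the 2GB Linux slaves.
# Assumption: only the two big global buffers are counted; per-thread
# buffers and mysqld's own overhead are ignored.

def leftover_mb(total_ram_mb, *buffers_mb):
    """Return the RAM (in MB) left for the OS after the given buffers."""
    return total_ram_mb - sum(buffers_mb)

TOTAL_RAM_MB = 2048           # 2GB of RAM
KEY_BUFFER_MB = 512           # key_buffer
INNODB_BUFFER_POOL_MB = 512   # innodb_buffer_pool

remaining = leftover_mb(TOTAL_RAM_MB, KEY_BUFFER_MB, INNODB_BUFFER_POOL_MB)
print(f"Left for OS cache and buffers: {remaining} MB")  # prints 1024 MB
```

The same helper can be re-run with other buffer sizes to see how much headroom a different configuration leaves.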

Just for the heck of it, I backed both values down to 384M and restarted MySQL. After an hour or so, things began to look bleak again. Lots of slow queries and the kswapd process (actually a kernel thread) was getting more CPU time than I'd like. It was at this point that I really began to marvel at the situation. The FreeBSD VM subsystem never does stupid things like this. In fact, our MySQL/FreeBSD boxes rarely swap unless I do something really stupid. How can the one in Linux be this much worse? Beats me.

Anyway, even more frustrated, I decided to re-enable swap and reboot the machine. At this point, I had little to lose. Once it came back up and I got MySQL started, things looked okay. kswapd wasn't as busy, and there were fewer slow queries. In fact, after 1 day and 9 hours, the server has only logged 66 slow queries. But according to top there's about 47MB of swap in use. The resident size of mysqld is 736MB, while its overall size is 816MB. Apparently the kernel swapped out part of the buffer pool for InnoDB or the MyISAM key buffer.
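For anyone who wants to pull the same two numbers without top, here is a small Python sketch that reads the `VmSize` and `VmRSS` lines from the text of `/proc/<pid>/status`. The sample text is invented to match the figures above, and the helper names are hypothetical. Note that the gap between the two is only an upper bound on what was swapped out: pages that were allocated but never touched are not resident either.

```python
def parse_vm_fields(status_text):
    """Extract the Vm* fields (values in kB) from /proc/<pid>/status text."""
    fields = {}
    for line in status_text.splitlines():
        if line.startswith("Vm"):
            name, _, rest = line.partition(":")
            parts = rest.split()
            if parts:
                fields[name] = int(parts[0])
    return fields

def non_resident_mb(status_text):
    """Return (VmSize - VmRSS) in MB: memory mapped but not in RAM."""
    fields = parse_vm_fields(status_text)
    return (fields["VmSize"] - fields["VmRSS"]) / 1024

# Made-up sample matching the numbers from top: 816MB total, 736MB resident.
SAMPLE_STATUS = "Name:\tmysqld\nVmSize:\t  835584 kB\nVmRSS:\t  753664 kB\n"

print(non_resident_mb(SAMPLE_STATUS))  # prints 80.0
```

On a live box the text would come from something like `open("/proc/%d/status" % pid).read()`, with `pid` being the mysqld process ID.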

I guess that extra gig of memory isn't enough for it.

I fail to understand what it's doing. But the machine seems to perform better with swap enabled. The only theory I've developed so far goes like this: With swap disabled, the kernel (being very stupid) goes looking for pages that it can swap out. It finds them but cannot swap them to disk. Next time around, it repeats this process, never realizing how futile it is. With swap finally enabled, it can swap out some memory and get the breathing room that it thinks it needs.
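If you want to check whether one of your own boxes is in the same no-swap situation, a minimal Python sketch like the one below will do. It parses the `SwapTotal` and `SwapFree` lines from the text of `/proc/meminfo`; the function names and the sample text are mine, purely for illustration, and it says nothing about why kswapd is busy.

```python
def swap_totals_kb(meminfo_text):
    """Return SwapTotal and SwapFree (in kB) found in /proc/meminfo text."""
    values = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name in ("SwapTotal", "SwapFree"):
            values[name] = int(rest.split()[0])
    return values

def swap_enabled(meminfo_text):
    """True if the kernel reports any configured swap space."""
    return swap_totals_kb(meminfo_text).get("SwapTotal", 0) > 0

# Made-up sample for a 2GB box with swap turned off.
SAMPLE_MEMINFO = (
    "MemTotal:      2061712 kB\n"
    "MemFree:         18420 kB\n"
    "SwapTotal:           0 kB\n"
    "SwapFree:            0 kB\n"
)

print(swap_enabled(SAMPLE_MEMINFO))  # prints False
```

On a real machine, pass in `open("/proc/meminfo").read()` instead of the sample.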

If anyone has hints on how this can be tuned (like telling the kernel not to bother), I'd LOVE to hear about it.

Linux may have FreeBSD beat when it comes to threading, but it sure could learn a lot from FreeBSD when it comes to virtual memory management.

Update: Thanks to the folks at NewsForge, you can find a teaser for this blog entry here (and it's currently on the home page). They picked this one up quite fast. I'm impressed.

Update #2: Allow me to respond to some feedback that I've seen so far. First off, we've been running 2.4.18 for quite a while now. We started with 2.4.9, tried 2.4.12 and 2.4.16. There's only so much time I can spend switching kernel versions and re-testing. Now that 2.4.19 is out, we'll give it a shot.

A few folks have suggested that since FreeBSD is the best tool for the job, I should just shut up and use it. If only that were the case. I'll post another entry in a few days detailing the problems with running a high-volume MySQL server on FreeBSD. It has issues of its own, mostly related to FreeBSD's poor threads implementation.

Thanks for all the feedback so far. Some of it looks promising. The flames, however, are simply ignored.

Posted by jzawodn at August 04, 2002 02:42 AM

Reader Comments
# Joe said:

Admittedly I know nothing about kernel hacking but why can't the FreeBSD VM subsystem be ported to Linux? Why reinvent the wheel?

on August 4, 2002 07:17 AM
# PolarWolf said:

It's fairly well known that the VM in the 2.4 kernel series is rather broken. You fail to mention which kernel version you use though, it's kinda important. Besides, why aren't you asking this on LKML instead of complaining about it in this weblog? At any rate, I see no real compelling reason not to assign a few megabytes for swap. Even your not so current hardware should have enough diskspace for that, I'd say. In the past various kernel developers have spouted out that 2.4 versions of the Linux kernel *need* some swap space, and in older versions it was recommended to use 2x RAM size, iirc. Besides, if you're so hung up on BSD's VM, why not use BSD? Best tool for the job, etc, and apparently BSD is the best tool for your job.

on August 4, 2002 08:58 AM
# Heru said:

As the guy before me said, the VM in 2.4.x is finicky. I hope to all that is holy that the 2.6 series fixes that.

Anyhow, I've noticed that Linux will be odd without swap space, whether it is the 2.2 or 2.4 kernel series. As for why, who knows exactly. My guess is it is a bit of legacy code left over from the days when Linux wouldn't work at all without swap. I know for a fact that there is a way to use a swap file instead of a swap partition, it may be better for some systems, but I don't know how to do this. For now I just make a nice swap partition and turn it on, it may never get used, but Linux will run better with it.

And again, as the guy before me said, use what's best for the job. If BSD works better, then use it in this case.

on August 4, 2002 09:49 AM
# Siyan L. said:

I don't know if there will be a 2.6, or 3.0 :-).
I very much agree that the whole system will work much better even if you just have a fairly small amount of swap on, from my personal experience.

on August 4, 2002 11:47 AM
# James Morrison said:

Something that may help you is to modify the values
in /proc/sys/vm/freepages to all be 0, so the
kernel never thinks it needs to page.

on August 4, 2002 01:15 PM
# Daryl Stimm said:

First off, why would you disable swap? That was the dumbest idea I have ever heard. Second, what kernel do you run? Is it higher than 2.4.10? 2.4.10 is where Linus decided to use a new and improved VM, and I have noticed huge increases in speed just by upgrading my kernel. I also run Gentoo Linux, which has by far the best kernel I have ever run (2.4.18-ac2 with some sweet patches). You should try changing the kernel. Also, MySQL normally by default likes to talk to swap, it's normal, so it should always be turned on. The VM problem in the 2.4 series kernel has been fixed in kernels greater than 2.4.10. Some kernels in between are borked, but that's why you should always go with the latest and greatest. HECK! 2.4.19 just came out two days ago! It took over 8 months for it to come out, so I am sure it's full of bug fixes! Also try using Alan Cox kernels, they are very good and very fast. I'm not going to say good luck because you should have tried this before writing this lame article. Geez...

on August 4, 2002 01:58 PM
# Freggy said:

Upgrade your kernel to 2.4.19 with the latest patches from Andrea Arcangeli. It includes patches to the VM which are supposed to fix some problems on systems with a lot of RAM.

on August 4, 2002 02:20 PM
# aeoo said:

If you can't get any useful info, and google'ing around gives you no useful info, I recommend you drop into IRC on OPN. Go to #kernelnewbies and patiently ask your question there. With any luck you might get an answer from someone who actually knows what they're talking about.

What's up with requiring an email address, btw? That's just lame. I tried to put in an obfuscated email address, but it didn't like that, so I have to put a fake one instead.

on August 4, 2002 03:38 PM
# Ask Bjoern Hansen said:

mytop is really cool. Thanks for the pointer (and for writing the tool!)

Here's a bug. I don't know what happened and I haven't looked at the code ....

Use of uninitialized value in subtraction (-) at /home/perl/bin/mytop line 606.
Use of uninitialized value in division (/) at /home/perl/bin/mytop line 618.
Use of uninitialized value in division (/) at /home/perl/bin/mytop line 626.
Use of uninitialized value in division (/) at /home/perl/bin/mytop line 635.
Use of uninitialized value in division (/) at /home/perl/bin/mytop line 635.
Illegal division by zero at /home/perl/bin/mytop line 635.

on August 4, 2002 08:37 PM
# Jason Dixon said:

The following settings in /proc/sys/vm/kswapd might be of some use to you. These settings are from a Red Hat 7.3 box running the updated 2.4.18-5 kernel.

Tries base:
The maximum number of pages kswapd tries to free in one round. This value is divided by 4 or 8.
(default 512)

Tries min:
The minimum number of times kswapd tries to free a page.
(default 32)

Swap cluster:
This is the number of pages kswapd writes in one turn. This value should be large so that kswapd does its IO in large chunks, but not so large that it floods the request queue.
(default 8)

Hope this helps!
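A minimal Python sketch for reading those three values, assuming the file holds three whitespace-separated integers in the order listed above (tries base, tries min, swap cluster). The `KswapdSettings` name and `parse_kswapd` helper are made up for illustration, and the file may not exist on other kernels.

```python
from collections import namedtuple

# Field order is an assumption taken from the order of the list above.
KswapdSettings = namedtuple("KswapdSettings", "tries_base tries_min swap_cluster")

def parse_kswapd(text):
    """Parse the contents of /proc/sys/vm/kswapd into named fields."""
    values = [int(v) for v in text.split()]
    if len(values) != 3:
        raise ValueError("expected 3 values, got %d" % len(values))
    return KswapdSettings(*values)

# The defaults quoted above, as they might appear in the file.
settings = parse_kswapd("512\t32\t8\n")
print(settings.tries_base, settings.tries_min, settings.swap_cluster)  # prints 512 32 8
```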

on August 4, 2002 10:49 PM
# borracho said:

Interesting comment about the RH7.3 settings:
I run a server (Dual AMD XP / Tyan board with scsi disks) and RH 7.3 (2.4.18-5smp #1 SMP i686)
with the same VM settings as listed in Jason's post.

I also run the recommended binaries from MySQL (v3.23.51). I have a similar problem where the system doesn't appear to swap. I have 2Gb of RAM and swap enabled (as confirmed by swapon -s), but it doesn't appear to be used. Not even a few Mb.

I'm experiencing a sharp increase in threads (a jump of 150) at times (from its usual 300) in a period of 5 seconds. This doesn't cause MySQL to die, but my server becomes unresponsive and the load average climbs from 0.3 to 400 within 2-3 minutes. Could it be swap/VM related?

The reason I stayed away from BSD for a MySQL server was the performance/threading issues, but I'm having to trim back my server tunings under Linux and keep a close eye to keep the server under control...

on August 14, 2002 12:18 AM
# Fred Flintstone said:

Just use windows. Seriously.

on December 16, 2002 06:55 PM
# Eric Bergen said:

The rmap patch http://surriel.com/patches/ does wonders for making kswapd behave and allow mysqld to take up all the memory it needs to.

on March 3, 2003 03:36 PM
# foobar said:

You have three options:
1) tune your 2.4 vanilla kernel.
2) test rmap(http://www.surriel.com/patches)
3) use 2.5, which will give you a *huge* performance boost

on March 10, 2003 04:33 AM
# Gerald said:

What about using WOLK (wolk.sourceforge.net)? And please give us an update concerning 2.5+ kernels.

on April 15, 2003 06:04 PM
# On-Vacation said:

So in summary:

1. upgrade to a newer kernel
2. Configure a little swap
3. Run FreeBSD
4. Do all three. :)

on September 12, 2003 07:23 PM
# Robert Moonen said:

I'm running Frenzy(FreeBSD 6.2) installed to CF card on a HP-T5720 thin client and it suffers from the same sort of VM stupidity you refer to linux suffering from when running without swap. I am in the process of installing a swap partition on it at the moment to stop it locking up when it runs out of space.

on July 3, 2009 11:05 PM
# Robert Moonen said:

Well, I eat my hat. After installing a swap partition on a spare sd card in a reader, the problem still occurred and in fact even got worse(the machine locked solid), but that was due to something else I am sure.
The locking (for long periods) seems more to be a problem with Opera 9.64 than anything else. :-/
Watching the TDF on sbs2 atm.

on July 4, 2009 10:11 AM