MySQL: Better Practices (by Jeremy Zawodny)

George Reese has an article up on O'Reilly's OnLAMP titled "Ten MySQL Best Practices." I have a few problems with it, however. So I'll take time to detail them here in the hopes that others might think about these issues too.

First off, I'm amazed that anyone still uses the phrase "best practices" anymore. Think about that phrase. The word "best" implies that the author is smarter than everyone else on this particular topic. How arrogant.

Network Security

Anyway, on to the content. His #2 best practice (see how dumb it sounds? Why can't they be called recommendations? Or lessons? Or...) is "Hide MySQL from the Internet"

MySQL has a pretty solid track record for security of a network service. Nevertheless, there simply is no good reason to expose MySQL directly to the Internet-- so don't do it. When you hide MySQL behind a firewall and enable communication to the server for only hosts running application servers and Web servers, you constrain the path of attack a would-be hacker might take.

But he completely fails to mention that 95% of users could benefit from taking advantate of MySQL's skip-networking option. With it enabled, MySQL doesn't listen to any TCP ports AT ALL. It only listens for local unix socket connections. Given that most MySQL users are running PHP and Apache and MySQL on the same machine, there's little need to leave it open on the network at all.

I'm not saying that this is the solution to the problem in all cases, but he's clearly not aiming at the more sophisticated users anyway. Otherwise he wouldn't have had to tell them to use passwords in his #1 best practice.

Note, also, that he misused the term "hacker."

Binary Data

The #4 "best practice" he lists is "Don't store binary data in MySQL." Instead, he advocates using the filesystem to store binary data. I'm so sick of hearing that argument. A lot of people do store binary data in MySQL and it works just fine thank you very much. In fact, I've found a very useful technique when I need to store large amounts of text (non-binary data). I compress it using Perl's Compress::Zlib and store it in a MySQL blob field. Why? It saves space (and therefore disk seeks on queries). Having the data in MySQL means that I don't have to worry about it being replicated on all the servers it needs to be. I let MySQL do it's job and I get on with mine.

He's likely assuming that the application that needs the data will be running on the same server as MySQL. Either that, or he's saying that you should fragment your data store, putting some of it on one machine (in MySQL) and some on another (in the filesystem). I think that's a decision one should not make lightly.

ANSI SQL Only?

Ah, #5 is one that really gets me. It is called "Stick to ANSI SQL" and says this:

MySQL provides many convenient additions to the ANSI standard that are very tempting for programmers. These additions include timesaving tools like multitable deletes and multirow inserts. When you rely on these features in a MySQL application, you limit the ability to adapt the application to any other database engine. In fact, you may make it impossible to port the application to another database without a significant rewrite. For maximal portability, you should therefore stick to ANSI SQL for your applications.

Oh, please.

One of the big reasons to use MySQL is that it's fast. Damned fast. By taking advantage of it's features, you can save a heck of a lot of time in developing your application. The reality is that people don't often move products or projects from one database server to another. Why not? Because they're all different anyway. Yes, they all support ANSI SQL to some degree or another, but if you limit yourself to the least common denominator you're going to waste a lot of development time.

In fact, when Rasmus Lerdorf (the father of PHP) was at Yahoo! a few weeks ago to talk about PHP, he mentioned what a dumb idea most database abstraction libraries are. Why? Because the lowest common denominator across SQL databases is pretty damned low. And I tend to agree with him.

Not using any of MySQL's special features is like programming in Perl and not using Regular Expressions (just because every language does them differently). It doesn't make a lot of sense.

Sequence Numbers

It gets worse. In #6 "Create your own sequence generation scheme" he argues against using AUTO_INCREMENT fields for a few reasons. Let's examine his reasons.

He starts with "you can only have one AUTO_INCREMENT column per table." So what? That's a well documented limitation. It's not a reason to avoid them, it's just something to keep in mind when developing. That's like arguing against using VARCHAR fields because they can hold "only" 255 bytes of data. That's what they were designed to do.

Then he says:

You cannot have a unique sequence for multiple tables. For example, you cannot use AUTO_INCREMENT to guarantee uniqueness for columns in separate tables so that a unique value in one table does not appear in the other table as well.

But he fails to mention how rarely that is a problem. I'd like to see the database design that relies on having unique numbers across tables. If you do need that, then take advantage of some other method. But that's rather uncommon.

Finally, he says "You cannot easily determine from an application what values MySQL has automatically generated." Why would you need to do that? If you code the application properly, that's not going to be an issue. Remember, this is an AUTO_INCREMENT column. It's supposed to be generated AUTOmatically. If your code is generating the values, don't tell MySQL that it should be an AUTO_INCREMENT column. It's as simple as that.

Posted by jzawodn at July 12, 2002 01:16 AM | edit

Reader Comments

# Frank Feingold said:

On #4 "best practice" I'd have to agree with Mr. Reese "in general". If I were designing a database that needed to have information about binary objects (images for example)I would store the path to the image in the database, not the actual image. Most applications would need the path anyway (i.e. to render a web page) not the actual image file.

On #6 "best prictice" "You cannot easily determine from an application what value MySQL has automatically generated". Umm, that's what the last_insert_id() function is for. It's a common (and good practice, I believe) to have a parent table have a AUTO_INCREMENT value. You can then use the last_insert_id() function to get that value for any related child rows.

on July 12, 2002 07:45 AM

# anders said:

unfortunately, the 'skip-networking' option makes it impossible to ever use JDBC to connect (JDBC uses TCP ports only) so java development becomes out of the question. also, a more general 'best practice' is to have the database run on an entirely different machine than the application server. using 'skip-networking' makes that pretty much impossible too. in the specific case of a single-machine LAMP setup, you're probably right but i think his article was intended for a slightly broader audience.

on July 14, 2002 05:33 PM

# rudy said:

i don't know this george reese fella but i'll give him the benefit of the doubt and assume he knows a little bit about what he wrote

you and tony bowden ( http://www.tmtm.com/insanity/2002/07/14.html ) have done admirable jobs at deflating a couple of george's balloons

the "stick to ansi sql" advice is a joke, but at least george is realistic, unlike the spew-coffee-all-over-monitor "don't use stored procedures and don't use joins, either" advice in builder.com.com's article "Database design for platform independence" ( http://builder.com.com/article.jhtml?id=u00320020405gcn01.htm )which is actually quite hilarious in places...

but hey, this is the internet, and more than anything else, we have learned not to trust everything we read...

on July 16, 2002 04:44 PM

Disclaimer: The opinions expressed here are mine and mine alone. My current, past, or previous employers are not responsible for what I write here, the comments left by others, or the photos I may share. If you have questions, please contact me. Also, I am not a journalist or reporter. Don't "pitch" me.

Privacy: I do not share or publish the email addresses or IP addresses of anyone posting a comment here without consent. However, I do reserve the right to remove comments that are spammy, off-topic, or otherwise unsuitable based on my comment policy. In a few cases, I may leave spammy comments but remove any URLs they contain.