Last week I had the opportunity to do a bit of protocol hacking and found myself stymied by what seemed like a race condition. As with most race conditions, it didn't happen often--anywhere from 1 in 300 to 1 in 5,000 runs. But it did happen and I couldn't really ignore it.
So I did what I often do when faced with code that's doing seemingly odd things: insert lots of debugging (otherwise known as "print statements"). Since I didn't know if the bug was in the client (Perl) or server (C++), I had to instrument both of them. I'd recently changed both, so in my mind they were equally likely suspects.
Well, to make a long, boring, and potentially embarrassing story short, I soon figured out that the server was not at fault. The changes I made to the client were the real problem.
I had forgotten about how the recv() system call really works. I had code that looked something like this (in Perl):
    recv($socket, $buffer, $length, 0);

    ...

    if (length($buffer) != $length) {
        # complain here
    }
The value of $length was provided by the server as part of its response. So the idea was that the client would read exactly $length bytes and then move on. If it read fewer, we'd have to keep checking for more data. And if we did something like this:
    while (my $chunk = <$socket>) {
        $buffer .= $chunk;
    }
There's a good chance it could block forever and end up in a sort of deadlock, each side waiting for the other to do something. The server would be waiting for the next request and the client would be waiting for the server to be "done."
Unfortunately for me, the default behavior of recv() is not to block until the full request is satisfied. That means the code can't get stuck there--it simply does a best-effort read and returns whatever data is currently available. If you ask for 2048 bytes but only 1536 have arrived, you'll end up with 1536 bytes. And that's exactly the sort of thing that'd happen every once in a while.
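Without any flags, the usual workaround is to loop until you've accumulated everything yourself. A rough sketch of what that looks like (assuming a connected stream socket; the error handling here is illustrative, not what my code actually did):

    # Keep calling recv() until we've accumulated all $length bytes.
    my $buffer = '';
    while (length($buffer) < $length) {
        my $chunk;
        my $ret = recv($socket, $chunk, $length - length($buffer), 0);
        die "recv failed: $!" unless defined $ret;
        last if length($chunk) == 0;   # peer closed the connection
        $buffer .= $chunk;
    }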
The MSG_WAITALL flag turned out to be the solution. You can probably guess what it does...
This flag requests that the operation block until the full request is satisfied. However, the call may still return less data than requested if a signal is caught, an error or disconnect occurs, or the next data to be received is of a different type than that returned.
That's pretty much exactly what I wanted in this situation. I'm willing to handle the signal, disconnect, and error cases. Once I made that change, the client and server never missed a beat. All the weird debugging code and attempts to "detect and fix" the problem were promptly ripped out and the code started to look correct again.
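For the record, the fix itself was tiny. It looked something like this (MSG_WAITALL is exported by the Socket module on systems that support it; the error check is illustrative):

    use Socket qw(MSG_WAITALL);

    # Block until all $length bytes arrive (or a signal, error,
    # or disconnect cuts the read short).
    my $ret = recv($socket, $buffer, $length, MSG_WAITALL);
    die "recv failed: $!" unless defined $ret;

    if (length($buffer) != $length) {
        # signal, disconnect, or error -- complain here
    }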
The moral of this story is that you should never assume that the default behavior is what you want. Check those flags.
Now don't get me started about quoting and database queries...
Posted by jzawodn at August 07, 2008 08:42 PM
Dear Jeremy
My name is Paulo and I write for a Brazilian aviation magazine called Jet magazine. I am preparing a story on aviation museums in the New York City surroundings and I can't find a good hi-resolution photo of the "National Soaring Museum". I wonder if you have one or if you can tell me where to find one.
Sincerely
Paulo
(Sao Paulo, Brazil)
I'm going to go out on a limb and say that, for most things, it's bad planning to base your acceptance of a message on the size of the data received.
Last time I was writing a client and server (mostly the receiving server; the client was just created for testing purposes), I ended up receiving data into a buffer that was then checked for complete messages, with completion indicated by any of a) a termination string, b) the beginning of another message, c) a timeout, or d) the closing of the socket.
I believe I used blocking sockets with a timeout; this was in VB.NET (1.0 or 1.1) but from a quick glance at the recv() documentation I'm pretty sure I was using the equivalent of select().
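In Perl terms, that buffering approach might look roughly like this (the "\r\n" terminator and handle_message() are hypothetical, just for illustration):

    # Sketch of terminator-based framing: accumulate into a buffer,
    # then peel off each complete "\r\n"-terminated message.
    my $buffer = '';
    while (1) {
        my $chunk;
        my $ret = recv($socket, $chunk, 4096, 0);
        last unless defined $ret and length $chunk;   # error or EOF
        $buffer .= $chunk;
        while ($buffer =~ s/^(.*?)\r\n//s) {
            handle_message($1);   # handle_message() is hypothetical
        }
    }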
Always good to get a sanity-check reminder on this stuff. If you're ever stuck on Sphinx-with-Perl stuff, though, drop us a line -- our Services team is deploying that combo all over the place and has probably bumped into every possible annoyance there. :)