Binary Search (by Jeremy Zawodny)

Programmers can be so damned stupid sometimes.

Take me for example.

I've been working to optimize and adjust some code at work. I can't tell you what it does but I can tell you that it's too slow and uses too much memory. It's Perl. I know Perl. I'd like to think I know it pretty well, having used it for around nine years now.

In tracking down this memory problem, I've learned a lot about what a memory pig Perl can be. But that's a topic for another blog entry. The real issue is how I've been tracking the problem. I'd get a hunch that the %foo hash was way too big and causing the process to die. So I'd convert it to a tied hash backed by Berkeley DB. And I'd run it again. It would again die.
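For readers who haven't seen the trick, here is a rough sketch of the idea of swapping an in-memory hash for a disk-backed one. It is not the code from work: it uses Python's standard `shelve` module as a stand-in for a Perl tied hash on Berkeley DB, and the names `count_phrases`, `read_count`, and `db_path` are made up for illustration.

```python
import shelve

def count_phrases(phrases, db_path):
    """Count phrases in an on-disk mapping instead of an in-memory dict.

    `db_path` is a hypothetical file path; shelve picks the underlying
    dbm backend, which is an assumption and not Berkeley DB specifically.
    """
    with shelve.open(db_path) as counts:
        for phrase in phrases:
            # Same read-modify-write you would do on a plain dict,
            # but each entry lives on disk rather than in process memory.
            counts[phrase] = counts.get(phrase, 0) + 1

def read_count(db_path, phrase):
    """Look up one phrase's count, returning 0 if it was never seen."""
    with shelve.open(db_path) as counts:
        return counts.get(phrase, 0)
```

The calling code barely changes, which is exactly why it's such a tempting hunch to chase. It trades memory for disk I/O, and it only helps if that particular hash really was the problem.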

Of course, this never happens with my small, quick-to-test data. It only happens with the full load (between 6 and 17 million, uhm, phrases). And it takes anywhere from 35 to 60 minutes for it to die. So you can guess how productive this makes me with an average 45-minute test cycle.

Ugh.

I've finally decided to just resort to a classic debugging technique: the binary search. Well, with a twist. Thanks to Ray, I'm using Devel::Size to periodically dump the memory use (or some approximation of it--that's another story) out to a log.
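A minimal sketch of the "periodically dump the sizes to a log" part, under loud assumptions: this is Python, not Perl, and it uses `sys.getsizeof` rather than Devel::Size. The helpers `approx_size` and `load_phrases`, the `counts` structure, and the `every` interval are all hypothetical. Like any such measurement, the numbers are an approximation.

```python
import sys

def approx_size(obj, seen=None):
    """Rough recursive size of obj in bytes (an approximation only)."""
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0  # don't count shared objects twice
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            size += approx_size(key, seen) + approx_size(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += approx_size(item, seen)
    return size

def load_phrases(phrases, structures, log, every=1000):
    """Load phrases into structures["counts"], logging sizes periodically.

    `structures` maps a name to each data structure under suspicion, so
    every log entry says which one is growing: (items_seen, name, bytes).
    """
    counts = structures["counts"]
    for n, phrase in enumerate(phrases, start=1):
        counts[phrase] = counts.get(phrase, 0) + 1
        if n % every == 0:
            for name, obj in structures.items():
                log.append((n, name, approx_size(obj)))
    return log
```

With one log line per suspect structure every so many records, a single long run shows which structure is growing and roughly when, instead of spending one 45-minute cycle per hunch.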

Why I didn't start this a few days ago is beyond me.

No, wait. It's not. It's because every time I tried something new, followed a new hunch, I was convinced that it was the solution.

Grr.

Someone slap me next time I do this.

Posted by jzawodn at February 20, 2003 09:29 PM

Reader Comments

# kalyan said:

There is an old saying that Perl was developed to parse log files. It's pretty sad that the same language can't handle huge log files.

I ran into the same issue. Hash tables on log files are killers in Perl. Not only do they take up too much memory, I'd sometimes hit the Perl 512MB size limit on my machine.

So I switched from hash tables to 2 arrays kept in sync. It was still too slow.

Now I do it in C++ with hash_map. (hash_map in C++ can also get pretty huge.)

on February 21, 2003 04:08 AM

# Harald said:

I've had that experience. Normally I'm very good at following hunches to the correct solution, but every once in a while (just often enough to hurt :-) it doesn't work, and then I'm forced into good, old-fashioned, methodical techniques.

I guess we can't always be miracle workers...

on February 21, 2003 06:38 AM

# sush said:

Heh. The 512meg limit while loading a hash is something I've faced as well. Wasn't sure that was a Perl limit; I think it was more of a per-process kernel mem limit. The data we were loading was roughly 110 megs of umm.. words. ;) Wound up hotwiring a change in the code from using a hash to an array of hashes on a late Sunday night so that the process could continue.

What irritated me the most was the absence of a way to tell perl that I don't want to use, say, more than 40 chars for the key and 20 chars for the value. (Well, there was supposed to be one, and it didn't quite work) A hint mechanism kind of thing would have solved a lot of things quite easily.

I suppose a tied hash was a possibility, but how about speed then?

on February 21, 2003 08:33 AM

# Dan said:

What, then, besides some stuff I have no control over (like memory that libc's allocated but perl doesn't know about), does Devel::Size need to handle better? (Well, OK, other than code refs...)

on February 21, 2003 02:49 PM

# Adam Keys said:

This summer I was working on code running on a gate-level simulator. Said simulator was running a model of a pretty large processor; simulating a program that takes 1 second to run took about an hour. So I learned a lot about getting the most out of each run.

1) Don't chase hunches. So,
2) Think about what might *really* be going on and then put in enough probe points that you can see if you're right.
3) Once you have a hunch of what is going on, think it through twice and then implement it.
4) Write down what you did! After an hour of talking with co-workers, writing emails and reading news, you've probably forgotten everything you tried in the first place.

This is probably very similar to what programmers had to do back in the time-sharing days. It's a pain and I wouldn't wish it on anyone.

on February 21, 2003 08:15 PM

# hadi said:

alef alef

on April 17, 2003 03:44 AM
