This is something that's been bugging me for a long time now. Over the years, I've come to realize that programming time is 10% about writing the code to do the work, 70% about figuring out where failures might occur and dealing with them, 10% about documentation, and 10% about... something else. (That last 10% may be substituted with Desktop Tower Defense or something equally time-wasting.)

Or something like that. The point is that writing the code to do what I want isn't hard. It's dealing with everything else that's hard--especially error conditions. There are so many weird corner cases to consider. And when you're working on code for a high volume web site that has its servers under load 24 hours a day, it doesn't take long to encounter those odd situations.

Murphy is always watching.

Years ago, after battling similar problems at Yahoo, I began to develop certain ideas about how errors should be detected, handled, and reported. An important idea here is that the developer should always be in control of when the script/program/process dies. Aside from something truly fatal (like a segfault), library routines should detect errors and report them back to their caller in the form of a known-to-be-bad return value.

The problem is that I keep running into code I want to use that breaks that rule in multiple places. In Perl terms, that means that I'll be happily testing my code and suddenly something goes wrong and my script dies in a place I didn't expect. Upon digging into it, I find that the CPAN library I'm using has something like this lurking in it:

if (not $good) {
    Carp::croak("bad stuff happened!");
}

if (not $good) {
    die "badness here!";
}

This means I have to read the code a bit more and see if I can discern why the developer wants my script to die in some cases, but in others he's content to just do this:

if (not $good) {
    $@ = "bad things happened";
    return undef;
}

What is it about some errors that makes them fatal while others aren't so bad that I'm deemed able to deal with them? Why has this developer taken that decision away from me? It makes no sense at all.

What this means is that I then need to litter my code with ugly crap like this:

eval {
    # the call that might blow up
};
if ($@) {
    # handle error here
}

The problem with that, aside from the fact that I'm dealing with another developer's inconsistent coding, is that it pollutes my code and forces me to make yet another frustrating decision.

Do I use a small number of big eval blocks and give up knowing exactly where the code died? Or do I pollute my code with a larger number of smaller eval blocks so that I can react to specific problems with a more specific solution? That means the module developer would have had to document which methods or functions may die on me. Otherwise I have to go trudging through their code and waste my time figuring that out. Guess which is more frequent.
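To make the tradeoff concrete, the fine-grained version ends up looking something like this (a sketch; `do_step_one` and `do_step_two` are placeholder names, not anyone's real API):

```perl
use strict;
use warnings;

sub do_step_one { die "step one blew up\n" }
sub do_step_two { return "ok" }

# One eval per call: I know exactly which call failed and can
# react specifically -- at the cost of this wrapper everywhere.
my $one = eval { do_step_one() };
if ($@) {
    warn "step one failed: $@";
}

my $two = eval { do_step_two() };
if ($@) {
    warn "step two failed: $@";
}
```

Multiply that by every call that might die and you can see why I call it pollution.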

Or do I override the module's use of die or Carp or whatever? I can do it, but that has other side effects I probably don't want to deal with either.
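If I go the override route, a minimal sketch uses a localized `__DIE__` handler (`call_guarded` is a name I made up for illustration):

```perl
use strict;
use warnings;

# Run a coderef, capturing both the eventual $@ and anything a
# __DIE__ hook saw along the way. One of the surprising side
# effects: __DIE__ handlers fire even for dies *inside* eval,
# which is exactly the kind of thing that makes this fragile.
sub call_guarded {
    my ($code) = @_;
    my @intercepted;
    local $SIG{__DIE__} = sub { push @intercepted, $_[0] };
    my $result = eval { $code->() };
    return ($result, $@, \@intercepted);
}

my ($result, $err, $seen) = call_guarded(sub { die "module blew up\n" });
warn "caught: $err" if $err;
```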

Why do I even need to deal with this in the first place? Can't people provide consistent interfaces? Is there something so bad about returning an error code and leaving it up to the user of your code to decide how to handle error conditions?

Maybe they do want to exit() or die(). Maybe they want to retry the logic after waiting a bit. Maybe they want to page someone and log the failure. Maybe...

You get the idea.
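For what it's worth, the kind of interface I keep wishing for looks roughly like this (a minimal sketch; `My::Client` and its methods are invented for illustration, not any real CPAN module):

```perl
package My::Client;

use strict;
use warnings;

sub new { return bless { errstr => undef }, shift }

# On failure: record the reason, return undef. Never die.
sub fetch {
    my ($self, $ok) = @_;
    unless ($ok) {
        $self->{errstr} = "fetch failed: upstream unavailable";
        return undef;
    }
    return "data";
}

sub errstr { return $_[0]->{errstr} }

package main;

my $client = My::Client->new;
my $data   = $client->fetch(0);
unless (defined $data) {
    # *I* decide what happens next: log it, retry, page someone, or die.
    warn "fetch error: ", $client->errstr, "\n";
}
```

The caller gets all the information and all of the authority.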

This whole concept of "fatal" exceptions seems wrong to me. Unless things are so bad that the kernel is going to kill my process, I should be the one in charge of deciding when my code will blow up. And I shouldn't have to do extra work to assert that authority. Should I?

I know that in the Java world, it's common to do a bunch of stuff in a big try block and then try to figure out what, if anything, blew up later. But I'm a firm believer in dealing with specific problems at the exact place they occur.

I really wish more people thought that way. It'd make my life easier.

Posted by jzawodn at October 01, 2008 07:45 AM

Reader Comments
# NM said:

I'm not sure why you're ranting about this; die'ing is the standard way of implementing "throw" in Perl. And just like in C++, if you don't properly catch an exception, your program will stop. Recent Java just won't let you compile a program that doesn't handle exceptions it's supposed to catch.

"But I'm a firm believer in dealing with specific problems at the exact place they occur."

Most people nowadays tend to think that exceptions are a good thing, but hey ...

on October 1, 2008 08:13 AM
# Jeremy Zawodny said:

Here's the deal. I'm more than willing to "catch" the exception, but really don't want to litter my code or have to worry about which "exceptions" are fatal and which are not. If a problem occurs, I have to deal with it no matter what. Why make me do extra work just because?

on October 1, 2008 08:19 AM
# Robert said:

Agreed. PHP libraries seem to love throwing die() in there all the time for no good reason. Even a return false would be far better.

Some libraries seem to be pretty good (return false, or have an error property you can check for an error message etc.). That's nice since I can simply do:



Easy as cake. That's my favorite since it's easy to work with.

on October 1, 2008 10:03 AM
# Derek said:

"die" is how you throw a *fatal* error. An error that there's absolutely no way anyone could possibly believe they can recover from it. Otherwise, the module's methods should have a means of returning back error statuses as normal.

I just went through this with Net::SSH::Expect as we automatically collected switch configs to back them up. It would freak out on the input it received and instead of allowing my script to do something crazy and out of this world like, I dunno, *retry that host*, it would simply crap out the entire script.

Return undef from the method to prompt me to go look for an error variable, or any of the myriad other ways lots of well-written modules handle this. But the module shouldn't assume that just because IT can't figure out how to react, that the parent script can't figure it out.

My rule of thumb has always been "modules shouldn't die except for truly fatal conditions" (the kind you mentioned above), and the parent script should be the only portion allowed to do so.
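Derek's retry scenario can be sketched generically (a sketch only; `with_retries` and its policy are my invention, not anything from Net::SSH::Expect):

```perl
use strict;
use warnings;

# Wrap a call that may die so the parent script can retry it.
# The retry count and one-second backoff are arbitrary policy --
# the whole point is that the *caller* gets to choose them.
sub with_retries {
    my ($tries, $code) = @_;
    for my $attempt (1 .. $tries) {
        my $result = eval { $code->() };
        return $result unless $@;    # assumes success never returns undef
        warn "attempt $attempt failed: $@";
        sleep 1 if $attempt < $tries;
    }
    return undef;    # every attempt died; caller decides what now
}
```

With something like this, "crap out on one flaky switch" becomes "try that host again," which is all Derek wanted in the first place.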

on October 1, 2008 10:34 AM
# Selvin George said:

I agree with most of the points you've made here. I've noted in the past that, in some cases, I received specific instructions from managers on how errors should be thrown or handled. But most of the time, it is due to lack of time that decisions are made by library writers on others' behalf.

As software engineers we should all be more generous to each other and make fewer decisions on behalf of our fellow programmers. Free them from the vagaries of your code! :)

Following up on this sentence:
"I began to develop certain ideas about how errors should be detected, handled, and reported"

Did you formalize these conditions and methods for error-handling? It would be really helpful if you're able to share these.


on October 1, 2008 10:49 AM
# anders pearson said:

For languages like Perl and Java, I tend to agree with your argument.

It's interesting though to look at what happens when you go completely the other direction, as Erlang/OTP does with the "fail fast" approach to error handling. Basically, the recommended style for Erlang development is that if a process encounters anything unusual it should die as quickly and loudly as possible. You write your code basically always expecting that child processes will die at any moment. Erlang/OTP includes a number of design features (cheap processes, process linking, supervisor trees, good message passing, single assignment variable semantics, etc.) that make this actually a very smooth way of programming, and it's one of the core ideas that's enabled them to build some of the highest availability applications running.

I'm intrigued by Erlang's approach because I think it might be a fundamentally better approach for programming in a networked environment. If you're relying on any kind of external service, you have to accept that it could become unavailable to you at any time, which is functionally equivalent to having a library routine exit on you. If you're used to libraries behaving "properly", then working with network calls requires special attention and care and more exception handling code. With something like Erlang, network operations aren't exceptional at all and don't really require any more or less care than any other code. The problem of course, is that Perl and Java haven't been designed that way from the ground up, so the fail fast approach is pretty cumbersome and inefficient in those kinds of languages.

Still, if your current thinking on exception handling is "libraries should never exit", you might want to read Joe Armstrong's PhD thesis, "Making reliable systems in the presence of software errors", which lays out a very good argument for the polar opposite approach. Even if it isn't very applicable to Java/Perl/PHP type languages, it should be a mind expanding read.

on October 1, 2008 11:12 AM
# Joe Zawodny said:

Welcome to the world of open source. My code is free for you to use and modify as you wish. I wrote it specifically to do what I wanted it to do (all the other open source code/crap out there didn't conform to my mental model of how things should work). While you will probably find that my code does not do everything you want it to do or do so in a way that is best for you, you will still be tempted to use it rather than writing your own from scratch. Ultimately, you will end up spending more time modifying my code to do things 'correctly' than it would have taken you to just write the darned thing from scratch. Despite all of your hard work on my code, I still get to take the majority of the credit for writing it (see license).

This is exactly why open source code almost doesn't suck.

on October 1, 2008 12:44 PM
# Phil Harnish said:

When programming I expect virtually everything to be consumable, or what you call "a library". Even controllers could be extended, repurposed, and reused. Aren't inheritance and composition what make that first 10% *only* 10%? So my question is:
When can you truly "handle an error"?

Sure, we could come up with indirection and/or libraries to solve the language/paradigm problem in C/Java/Perl/etc. If you ask me the whole thing smells. Working with Erlang was eye-opening and I'm definitely eager to see what develops.

on October 1, 2008 11:03 PM
# Gabriel said:

Yup. Good one... I had to dig into some code also on this one... and ended up changing his code... thus breaking the svn:external... a really big mess...

on October 2, 2008 03:35 AM
# Doug said:

I adhere to Bertrand Meyer's "Design by Contract" whenever possible. The module is a "provider" offering service to a "client" (calling routine). The hitch is, the provider only has to provide predictable service (postconditions) when the terms of the contract (preconditions) are met by the client. If the terms of the contract are not met by the client, the provider is free to do _anything it likes_. But the best thing for it to do is to throw an exception.

The problem is not at all that library code is throwing exceptions, but understanding what it means when that happens.

There are just a few cases where die should be used:

1. The client fails to satisfy the preconditions.

2. The provider is _unable_ to ensure the postconditions, even though the preconditions are met.

Reliable software depends on agreement between the provider and client on just what the contract is. In practice that means the module writer documents and the module user reads and follows that documentation.

Modern programming languages (including perl) have ways to propagate exceptions (and sufficient execution context at the failure), so that the exception can be interpreted and handled at any appropriate level in the code, if possible.

Returning error codes is a common device used in lieu of exceptions, but tends to complicate what both the provider and client are trying to do. Add to that the fact that programmers are extremely unreliable at checking error codes, and it's obvious there has to be a better way.
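Doug's two cases can be sketched in Perl (`Math::Contract` and `isqrt` are made-up names for illustration, not a real module):

```perl
package Math::Contract;

use strict;
use warnings;
use Carp;

# Contract: the client must pass a non-negative integer.
# Postcondition: returns the integer square root.
sub isqrt {
    my ($n) = @_;

    # Case 1: client broke the precondition -- croak, so the
    # error is reported at the *caller's* line.
    croak "isqrt: need a non-negative integer, got '" . (defined $n ? $n : 'undef') . "'"
        unless defined $n && $n =~ /^\d+$/;

    my $root = int sqrt $n;

    # Case 2 would be: die if the postcondition can't be ensured
    # despite a valid input (can't actually happen here).
    return $root;
}

package main;

print Math::Contract::isqrt(16), "\n";    # 4
```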

on October 2, 2008 05:46 AM
# NM said:

""die" is how you throw a *fatal* error."

Nope, sorry. "die" is how you generate an exception in Perl; which, unless handled (by eval), results in the program terminating. Don't believe me? Believe perldoc:

$ perldoc -f die

Outside an "eval", prints the value of LIST to "STDERR" and exits with the current value of $! (errno). If $! is 0, exits with the value of "($? >> 8)" (backtick ‘command‘ status). If "($? >> 8)" is 0, exits with 255. Inside an "eval()," the error message is stuffed into $@ and the "eval" is terminated with the undefined value. This makes "die" the way to raise an exception.
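That description is easy to see in action (a trivial sketch):

```perl
use strict;
use warnings;

# Inside an eval, die doesn't exit the program: it ends the eval,
# yields undef, and stuffs the message into $@.
my $result = eval { die "not fatal after all\n" };
if (!defined $result) {
    print "caught: $@";    # program keeps running
}
```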

on October 2, 2008 05:48 AM
# Andy said:

I just commented on this matter in a Reddit thread.

In short: exceptions should be used for *exceptional* things. i.e. when a method can't complete the task it is supposed to do.

on October 2, 2008 11:21 AM
# David Butler said:

Have you seen this technique using

It seems to make the code more elegant. I don't have experience with this technique, so I am not sure if there is a performance hit or other drawbacks.

on October 5, 2008 03:49 PM