When Reasonable, Return Early

When you have a function or piece of code that must handle several different cases, I find it much better to eliminate special cases using¬†return¬†or throw¬†at the beginning. (“returning early”).

if (!input.isValid()) {
    throw new InvalidInputException();
}

// handle input (20 lines)

Several advantages here:

  1. It frees our mind from worrying about those cases as we read the rest of the function.¬†Programmers, as humans, have limited mental stacks, and if we’re reading a function that must handle several cases, each of those cases must be held in our memory until we see the code that handles them.
  2. It simplifies the remaining code, since it’s handling a smaller set of cases.
  3. It compartmentalizes the code into: “handle special cases, then handle the common case.”
  4. It keeps the code that handles special cases close to the checks for those cases.
  5. The primary piece of code stays at the root indent level.
  6. Diff views stay simple when adding/removing special conditions (it does not introduce big indenting changes across many lines).

In fact there’s (what I consider to be) an anti-pattern that involves checking for special cases early, but instead placing the meat of the function in the¬†if¬†block, followed by an¬†else¬†block to handle the special case:

if (input.isValid()) {

    // handle input (20 lines)

} else {
    throw new InvalidInputException();
}

Notice how the handling of the error can now become separated from the cause of the error, and the meat of the code now must be indented. This anti-pattern gets obviously worse the more conditions you must test for:

if (input1.isValid()) {
    if (input2.isValid()) {
        if (input3.isValid()) {
            if (input4.isValid()) {

                // handle input (20 lines)

            } else {
                throw new InvalidInputException('input4 is bad');
            }
        } else {
            throw new InvalidInputException('input3 is bad');
        }
    } else {
        throw new InvalidInputException('input2 is bad');
    }
} else {
    throw new InvalidInputException('input1 is bad');
}

This code is hard to read, hard to edit, and harder to review when changes occur to the conditions:

  1. The invalid handling code appears in the reverse order that the inputs are checked.
  2. The invalid handling code is very far from the conditions (you have to lean on the IDE to see where the braces line up).
  3. The meat of the function starts off 4 indents over.
  4. Adding/removing validity checks is a huge pain that involves re-indenting a bunch of code.
  5. Change diffs can look really messy as indent levels are changed.

Code with a lot of special cases/conditions probably needs to be refactored to separate validation from the processing of the input.

There are certainly situations where returning early is not an obvious win (hence “when reasonable”). This often happens when you have a return value that contains complex state, or some statements must be executed before every return/throw¬†statement.

Apache mod_cache and mod_rewrite: Danger

I solved a very strange bug today where mod_cache kept returning a particular cached file for seemingly every URL on a site.

After using LogLevel debug, I finally realized that mod_cache does not store content by the request URL, but rather by the final URL after RewriteRules have been processed.

This PHP application was using a common RewriteRule to map most requests to a single script, but the full request URL was not being copied into the rewritten URL (the script was just relying on $_SERVER['REQUEST_URI'] to route the request). The result is mod_cache considered all URLs the same, and it was only by some magic of session cache-busting headers that the app managed to function at all. Internally mod_cache was just churning that single cache file over and over.

The simple fix was to make sure the full URL path ended up in the internal URL:

RewriteRule ^(.*)$  index.php/$1 [L,QSA]