Friday, May 1, 2009

Interlude

Part III is coming, but I wanted to make an observation. As I've been writing stuff, I've been poring over a lot of code looking for examples and counter-examples of ideas I want to talk about. Since I've been talking about tail recursion, I've been looking in particular at procedures that return answers that are computed by other procedures. It's fairly easy to find these things — you just look for a return statement immediately followed by a function call: return<whitespace+><identifier><whitespace*> (. This doesn't match perfectly, but it gets more than enough results to look at. In looking at several examples of code I've noticed a common pattern like this:
    ...
    Type foo = computeFoo (a, b);
    return foo;
}
This isn't a tail recursive call, but it is trivially transformed into one. Then I was thinking that I wouldn't write code like that. (At least I don't think I would.) It's like writing this in Scheme:
    (let ((foo (compute-foo a b)))
       foo)
Which is simply syntactic sugar for this:
    ((lambda (foo) foo) (compute-foo a b))
Which trivially beta-reduces. That first term is better known as the identity function, so I could have just written:
    (id (compute-foo a b))
instead of the let expression, but why would I even bother? So this code:
    ...
    Type foo = computeFoo (a, b);
    return foo;
}
is really a verbose way of writing this:
    ...
    return Identity<Type> (computeFoo (a, b));
}
which no sane person would prefer to
    ...
    return computeFoo (a, b);
}
Yet there are apparently reasonably sane people who seem to prefer the first version. (caveat here: I'm assuming that the Identity function is being called for value and that this is not an obscure means of forcing a type coercion. In all the code I've looked at, I haven't seen anything that nasty, but it wouldn't surprise me. I'm not talking about obscure compiler tricks.)

7 comments:

Alex Meyer said...

I often do it that way for ease of debugging. That way I can set a breakpoint at the return statement and look at the value of foo. If done the inline way I have no easy way to step over the function call and watch the value before it is returned. I trust the compiler to be able to optimize it away anyway.

Paul Steckler said...

I sometimes have this structure in my code as the aftermath of having inserted a print statement between the variable binding and the return. After the print statement is removed, you have the pattern shown in Joe's post.

Jason Riedy said...

I'm echoing the previous two statements. Giving a name to the value should lodge enough information in the debugging information to let debuggers, tracers, etc. present their data a little more clearly. And sometimes I use such an assignment as documentation. I wouldn't use a name like foo but rather final_bfs_depth (or preferably something more about the purpose). That can add a little documentation about intent while also making the symbol available for tab-completion and other gizmos.

So the assignment isn't for communication with the CPU, it's for communication with tools and with people.

Reid K said...

Yeah, it reminds me of something that Gerry Sussman said in favor of obscure higher order functions. He had some code with a couple of random functions like "compose-and-curry-second-arg" or something really random like that, which would have been shorter to write with a lambda. He said he prefers the version with the named function because names are easier to read and debug. I think it was something along the lines of "naming something gives you power over it".

However, it's a stretch to apply it to what you're talking about :).

chris said...

In my job I review and debug a lot of other people's C code, and the use of superfluous temporary variables is a huge pet peeve of mine. IMO, superfluous temps, more often than not, serve only to obscure the code.

If I need to debug something, then I can insert the necessary temps. But sticking temps everywhere in case I might need to debug something? That sounds insane to me.

As a simple example of the obscuring power of superfluous temps, here's a reduced version of something that I recently ran across in production code. It made me crazy:

int tmp, tmp2;

tmp = getValA();
tmp2 = getValB();
printf("A is %d, B is %d\n", tmp, tmp2);
tmp = getValC();
tmp2 = getValD();
printf("C is %d, D is %d\n", tmp, tmp2);
tmp = getValE();
tmp2 = getValF();
printf("E is %d, F is %d\n", tmp, tmp2);
tmp = getValG();
tmp2 = getValH();
printf("G is %d, H is %d\n", tmp, tmp2);

This went on for something like 20 separate printfs. (There were no As, Bs, Cs, and so on; the actual values being retrieved were various hardware registers. To make matters worse, this code was part of a crashdump-generating sequence where no one was about to attach a debugger anyway.)

I acknowledge the potential documentary power of a well-named variable, but my experience suggests that giving names to things for immediate use (and discard) is most often an exercise in futility.

Josh said...

Chris,

If you think that's bad, you should see some of the legacy code that I inherited. It's about 6k lines of C and most of the variables are named i, i1, i2, i3, i4, etc. :-/

The worst part is that the poorly named variables are only a minor annoyance compared with everything else that's wrong with the program.

Chris D said...

I'm with the first 3 commenters: it's typically a leftover construct from development, though if it's a gnarly function call, I do prefer to have it on its own line. There's plenty of room for disagreement, but I don't think there's a hard and fast rule to be applied in languages that aren't tail-recursive.

Having matured in the era of servers with many megabytes of RAM, the extra couple bytes of the temp variable just hasn't been a concern. =)