Sunday, May 3, 2009

You knew I'd say something, Part IV

I thought I could squeeze the rest in this post, but I think I need one more after this.

van Rossum suggests that tail recursion may be difficult for the compiler if function redefinition is permitted. (Of course we both agree that doing this is not good style. Furthermore, we both agree that it nevertheless ought to ‘work’ in an intuitive way.) He gives this example:
def f(x):
     if x > 0:
            return f(x-1)
     return 0

g = f
def f(x):
      return x
and explains
The call to g(5) invokes the earlier f, but the “recursive” call no longer recurses! At run-time, the name ‘f’ is rebound to the later non-recursive definition, so the returned value is 4, not 0.
The direct translation into Scheme is this:
(define f
  (lambda (x)
    (if (> x 0)
        (f (- x 1))

;; Prove it is tail recursive.
(f 1000000) => 0

;; Mutate the definition of F.
(define g f)
(set! f (lambda (x) x))

;; Now what happens?
(g 5) => 4
This will work in any version of Scheme. The compiler doesn't need to analyze the target of the function call, it only needs to notice that the caller's stack frame is no longer in use.

What about debugging?

The most common objection to tail recursion is that it makes debugging more difficult because the stack backtrace is missing the frames from the tail call positions. This is a real, pragmatic concern that I think is worth addressing.

MIT-Scheme has an approach that seems promising. At each evaluation step the interpreter records the program state in a structure called the ‘history’. The history essentially contains a copy of the stack frame. (Wait!!! I can hear the objections already and I haven't even posted this!) Obviously, this would have the exact same effect on the space utilization as the stack would in a non-tail-recursive implementation except for one very important thing: the history is a ring buffer with a limited size. When a tail recursive call is recorded in the history, each frame in the history moves down by one. If a sufficient number of calls are made, the oldest record is deleted.

Recording the stack frame in the history does change the space complexity, but in a more benign way than simply foregoing tail recursion altogether. The asymptotic space utilization is changed by a constant factor. Since the oldest frames are dropped from the history, the total storage consumed by the history cannot grow without bound. This is sufficient to avoid changing the asymptotic space complexity class. (Actually, I haven't seen a formal proof of this, so there may be some corner cases. Bear with me for a moment.)

In MIT-Scheme, the history is implemented as a ring buffer of ring buffers. Imagine that you have discovered the rib cage of an animal that was shaped like an inner tube. When a non-tail recursive call is made, we rotate the entire inner tube to get to the next rib. When the non-tail call returns, we rotate it back. When a tail call is made, we rotate the current rib and record the stack frame. The effect is this: if we have, say, 5 ribs, then the top five stack frames will have a backtrace of the tail calls that occurred at that level. If each rib is, say, 10 elements long, then each of the top five stack frames will show up to 10 tail calls.

This sounds much more complicated than it really is. The end result is that when you have an error and enter the debugger, you can get pretty much most of the expected stack trace.

But what if that's not enough?

The size of the history can be controlled by the user. If you need a bigger stack trace, just make the history deeper or wider.

But what about the overhead and the extra complexity?

As it turns out, the description sounds worse than reality. If you implement the ’backbone‘ of the history as a doubly-linked list, then rotating the history one way or the other simply involves dereferencing a pointer. And if you keep each ‘rib’ of the history as a doubly linked list as well, each tail call simply involves dereferencing a pointer as well.

I think a simple demonstration might be in order here. This is a buggy version of the Ackermann function:
(define (ack m n)
  (cond ((= m 0) (/ 1 0))  ; deliberate error
        ((= n 0) (ack (- m 1) 1))
        (else (ack (- m 1) (ack m (- n 1))))))
And here is what happen when we try to run it

(ack 2 2)

  Division by zero signalled by /.

 S0  (ack m (- n 1))
    R0  (ack (- m 1) (ack m (- n 1)))
    R1  (if (= n 0) (ack (- m 1) 1) (ack (- m 1) (ack m (- n 1))))
    R2  (cond ((= m 0) (/ 1 0)) ((= n 0) (ack (- m 1) 1)) (else (ack (- m 1 ...
    R3  (ack (- m 1) 1)
    R4  (if (= n 0) (ack (- m 1) 1) (ack (- m 1) (ack m (- n 1))))
 S1  (ack m (- n 1))
    R0  (ack (- m 1) (ack m (- n 1)))
    R1  (if (= n 0) (ack (- m 1) 1) (ack (- m 1) (ack m (- n 1))))
    R2  (cond ((= m 0) (/ 1 0)) ((= n 0) (ack (- m 1) 1)) (else (ack (- m 1 ...
    R3  (ack m (- n 1))
 S2  (ack m (- n 1))
    R0  (ack (- m 1) (ack m (- n 1)))
    R1  (if (= n 0) (ack (- m 1) 1) (ack (- m 1) (ack m (- n 1))))
    R2  (cond ((= m 0) (/ 1 0)) ((= n 0) (ack (- m 1) 1)) (else (ack (- m 1 ...
    R3  (ack 2 2)
The frames labeled S0, S1, and S2 are ‘subproblem’ frames that result from the non-tail-recursive calls. The frames labeled R0, R1, etc. are the ‘reduction’ frames that result from the tail recursive calls. In this case, the history contains everything that a stack trace would contain. Again, if this is not sufficient for debugging, both the depth and width of the history are under user control.

Keeping a history structure may or may not be the best way to deal with the loss of debugging information that comes from having proper tail recursion. This feature has been in MIT Scheme since the early 80s (possibly the late 70s). It isn't very difficult to implement. My description of it is probably larger than the code itself.

I think this is a reasonable counter to the problem of loss of debugging information.


daniel said...

You had me at "Imagine that you have discovered the rib cage of an animal that was shaped like an inner tube."

This is a good real-life solution to a real-life problem. It might falls into the gap between language purists (who don't like engineering solutions like ring buffers) and language implementors (who don't like the fact that the problem exists). That gap is where Guido van Rossum lives, so it's not surprising some of this stuff is coming up.

Keep up the good work--this series is teaching a lot, thanks.

Faré said...

As I unoriginally argued, debugging with or without this trick is much EASIER with tail-calls than with loops.

Is there in Python any support for capturing the loop environment and remembering its values in previous iterations? Can you trivially turn a "normal" loop into an "instrumented" loop?

Paul Prescod said...

This series is very interesting and informative. But I still feel that the last post is incomplete without code examples in TCO versus trampoline style. Can you comment on that please?

Paul Prescod said...

BTW: I like this compromise on the debugging problem, but that doesn't mean that I think that TCO is worth the complexity.

Fare: one could write a loop recording program with Python's profiler API but I don't think that any such thing ships with Python.

Presumably most people just instrument their loops "by hand". They could do the same with TCO, but they'd need to be more careful not to accidentally change the space-complexity of their program by putting their debugging instructions after or around the "tail".

Josh said...

Excellent solution. I suggested a history structure in response to one of GvR's posts, but using a circular buffer did not occur to me.

He seems dead set on having a "complete" stack trace, even though it doesn't help at all if you have to use a loop and a case statement to avoid blowin g the stack. 1000 recursive calls ought to be enough for anyone. ;-)

About your previous post...

Is there any chance that you could elaborate on implementing the actor model? I'd love to see a straightforward implemenation in a lisp dialect.

Unknown said...

This is the part of the argument against TCO that has really confused me: "We don't need tail recursion, we have loops, and anyways TCO makes things hard to debug since we lose stacktraces."

I can't count the number of times I've been saved by stacktraces of my loops :\

David Bakin said...

I know it works (and tried it at to confirm0, but I don't understand:

"The compiler doesn't need to analyze the target of the function call, it only needs to notice that the caller's stack frame is no longer in use."

What exactly is going on and and what time? That is, does it "notice" this when "(set! f ...)" is compiled? Or executed? Or when "(g 5)" is compiled? Executed? Or when the inner "f" (in tail recursive position) is executed? Is the effect of "noticing" that the function "g" is bound to gets recompiled?

I'm confused...

Joe Marshall said...

When the compiler compiles any call to f (or the interpreter interprets a call to f), if it is in tail position, it compiles it as a tail call and discards the current stack frame before transferring control to f. It is a fairly common misconception that the compiler needs to have some special knowledge about the target of the call (i.e. is f itself a tail recursive loop?) in order to compile the call to f. This isn't the case as one can dynamically replace what f without affecting whether calls to f are tail recursive or not.

David Bakin said...

Oh, duh. I had in mind (for some unaccountable reason) compiling a direct JMP to the address of the first instruction of f. Obviously it compiles an indirect JMP through f's location (the binding). Thanks!