Tuesday, January 21, 2020

Afraid of Tail Recursion

It's well known fact among proponents of tail recursion that some people just don't get it. They view tail recursion at best as a quirky compiler optimization that turns some recursive calls into loops. At worst, they see it as some sort of voodoo, or a debugging pessimization. They see little value in it. Some have outright disdain for it.

Tail recursion isn't just about turning recursive calls into loops. It's about changing how you look at function calling. Tail recursion just happens to fall out of this new viewpoint.

Most programmers, I think, view function calls as if they were akin to a short vacation. You pack up the arguments in your luggage, travel to the destination, unpack your luggage, do some stuff, repack your luggage with some souvenirs, return home, unpack everything and resume life where you left off. Your souvenirs are the return value.

Should you need a vacation from your vacation, you do the same thing: pack up the arguments in your luggage, travel to your new destination, unpack your luggage, do some stuff, repack your luggage with some souvenirs, return to your original vacation spot and resume your original vacation.

Tail recursion aficionados realize that the journey itself is the important part of the function call, and that a vacation includes two journeys. On the first journey you pack up the arguments, including the return ticket, in your luggage, use the outbound ticket to journey to the destination, unpack your luggage, and start doing stuff. When you run out of stuff to do, you make the second journey. You fetch the return ticket, repack your luggage, take the ticket to wherever it leads (presumably back home), unpack everything, and resume doing whatever you were doing there.

But perhaps you want to visit grandma instead of going directly home. Then we change the script slightly. When you run out of things to do on your vacation, you pack up your luggage with your souvenirs and the return ticket, then you journey to grandma's house, where you unpack and start doing stuff. Eventually you are done visiting grandma, so then you fetch the return ticket, repack your luggage, take the ticket to wherever it leads, unpack everything, and resume doing stuff there. It's a three-legged journey. You don't go from grandma's back to the vacation resort — there's nothing left for you to do there. You take the return ticket directly home.

Viewing things this way, a function call involves packaging the arguments in a suitable way, deallocating any temporary storage, and then making an unconditional transfer to the function, where we unpack the arguments and resume execution of the program. It is simply “a goto that passes arguments”.*

A function return is simply “a goto that passes a return value”. It involves packaging the return value in a suitable way, deallocating any temporary storage, and then making an unconditional transfer to the return address, where we resume execution of the program.

A tail recursive function call is simply “a goto that passes arguments”. It involves packaging the arguments in a suitable way, deallocating any temporary storage and then making an unconditional transfer to the function, where we resume execution of the program.

Do we really deallocate temporary storage before every control transfer? Certainly a return pops the topmost stack frame, and as often implemented, a tail recursive function call deallocates its stack frame or replaces it before transferring control, but a non tail recursive call? It does so as well, it's just that it also has to pack those values into a new continuation for the return trip. We use an implementation trick to avoid the absurdity of actually moving these values around: we move the base of frame pointer instead. Voila, we simultaneously deallocate the stack frame and allocate the continuation with the right values already in place.

Deallocating storage before each control transfer is an important part of the protocol. We're making a unconditional transfer to a destination with the hope, but no guarantee, that we'll come back, so we'd better tidy up before we leave. This ensures that we won't leave a trail of garbage behind us on each transfer which would accumulate and adversely affect the space complexity of our program.

Once you view a function call and return as not being a single sequence, but each one a separate, and virtually identical sequence, then tail recursion becomes a natural consequence. Tail recursion isn't a special case of function call, it is the same thing as a function call, the only difference being whether a new continuation (the "return ticket") is allocated in order to come back. Even function returns are the same thing, the only difference being that destination is (usually) determined dynamically rather than statically. Tail recursion isn't just another optimization, it's the result of treating inter-procedural control transfer symmetrically.

Another natural consequence is greatly increased options for control flow. Instead of a strict call/return pattern, you can make "three-legged" trips, or however many legs you need. You can make loops that incorporate one, two, or even a dynamically changing number of functions. You can replace compiler-generated returns with user-provided function calls (continuation-passing style) and implement arbitrarily complex control and data flow like multiple return values, exception handling, backtracking, coroutines, and patterns that don't even have names yet. And of course you can mix and match these patterns with the standard call and return pattern as well.

The phrase "tail recursion" is shorthand for this symmetric view of interprocedural control flow and is meant to encompass all these consequences and control flow options that spring from it. It's not about simply turning recursive functions into loops.

People who are afraid of tail recursion seem unable to see any value in taking up the symmetric viewpoint despite the fact that it opens up a whole new set of control flow techniques (in particular continuation-passing style). They find the notion that a procedure call is “a goto that passes arguments” “nonsensical”. A lot of good research has come from taking this nonsense seriously.

*The view that a function call is simply a “a goto that passes arguments” was developed by Steele in his “Lambda papers”.

The important point of cleaning up before the control transfer was formalized by Clinger in “Proper Tail Recursion and Space Efficiency”.

Someone — it might have been Clinger, but I cannot find a reference — called tail recursion “garbage collection for the stack”. The stack, being so much more limited in size than the heap, needs it that much more. Indeed Clinger notes the tight connection between tail recursion and heap garbage collection and points out that heap garbage collection is hampered if the stack is retaining pointers to logically dead data structures. If the dead structures are large enough, garbage collection can be rendered useless. Yet many popular languages provide garbage collection but not tail recursion.

The only difference between a call and return is that typically the call is to a statically known location and the return address is dynamically passed as a "hidden" argument. But some compilers, like Siskind's Stalin compiler, statically determine the return address as well.

The only difference between a function call and a tail recursive function call is when you need to return to the caller to complete some work. In this case, the caller needs to allocate a new continuation so that control is eventually returned. If there is no further work to be done in the caller, it doesn't create a new continuation, but simply passes along the one that was passed to it.

Many compilers have been written that handle function calls, tail recursive function calls, and returns identically. They only change what code they emit for handling the continuation allocation. These compilers naturally produce tail recursive code.

Most machines provide special purpose support for a LIFO stack. It is tempting to use the stack for allocation of continuations because they are almost always allocated and deallocated in LIFO order, and a stack gives higher performance when this is the case. Many compilers do in fact use the stack for continuations and argument passing. Some, like Winklemann's Chicken compiler follow Baker's suggestion and treat the stack as an "nursery" for the heap. Others avoid using the stack altogether because of the extra complexity it entails. And others cannot use the stack because of constraints placed on stack usage by OS conventions or the underlying virtual machine model.


Manuel Simoni said...

I think TCE is cool in principle, but I am simply not willing to give up proper stack traces. I can immediately see through 80% of runtime errors by looking at a stack trace, so stack traces are of utmost value to me.

The argument that loops don't give you a stack trace either is unconvincing. In practice, not having a trace for loops has never been an issue for me.

I am aware that there are implementation techniques that give you both TCE and proper stack traces (at least for a subset of the stack). I would need to see more discussion and experience reports of those techniques to be convinced.

A smaller, but nevertheless real, issue is the performance hit from TCE on really existing architectures. I have implemented a couple of Lisp interpreters, and the ones with TCE are about 10x slower in various approaches I have tried.

Joe Marshall said...

Baily and Weston have had the opposite experience. They report up to 7x improvement in performance in Performance Benefits of Tail Recursion Removal in Procedural Languages. Perhaps they used a different approach.

By most accounts, trampolining is about the slowest way to get proper tail recursion, and it usually causes only about a factor of 2 slowdown.

Dave Cooper said...
This comment has been removed by the author.
Dave Cooper said...

If you need stack traces for debugging, in most Common Lisp impls I’ve tried it on, you can get the compiler to not do TCE by loading, rather than compile-and-loading, the code in question.

Anonymous said...

Another aspect of not having TCE is that by manually handling recursive computation using an explicit stack make is easy to searialise the "continuation", which allows checkpointing in a distributed system.
I work for a $BIGCORP and all our data pipelines are done in a dataflow style, under the assumption that there are maybe thousands of compute nodes but they are very low priority and can be evicted at any moment.
In practice, even though I've tried Scheme & continuations I can't think of any case where I found those new control flow techniques superior to what we have from the point of view of my ability to reason about the code and debug it.
I hope one day somebody will come up with a runtime that will offer debugging and profiling tools adequate to these new control flow techniques.

Joe Marshall said...

Manually handling recursive computation with an explicit stack is basically orthogonal to whether the language you express it in offers tail recursion. In fact it is slightly easier if do have tail recursion. As an example, I point to MIT/GNU Scheme. It is written in C, and manually handles recursive computation with an explicit stack for the purpose of achieving tail recursion in the interpreted Scheme. But since C itself isn't tail recursive, the core of the interpreter has to be written in one big C function that doesn't itself make any function calls. The core of the Racket interpreter is one big C function as is the core of SIOD. If C were tail recursive, then the core of the interpreter could be broken into several more abstract units.

Frode Fjeld said...

In my view, the biggest problem with TCO is that it overloads the function call syntax with something that is similar yet completely different. That makes programs less readable. If you insist on having TCO, provide specific and explicit syntax for it. This would also avoid bugs where intended TCOs aren't.

David Bakin said...

IIRC trampolining was invented so that a tail recursive language interpreter could be implemented entirely in C. The goal was to remain trapped in C, no recourse to low-level assembly or knowledge of the C compiler's generated code, yet still have a proper Scheme implementation.

I just don't understand why JavaScript compiler (equivalently, JIT) developers don't get this and still talk about trampolining as if it was necessary to them - and thus, to complain about the "overhead" of having a code generation/runtime model that supports PTC.

You point to Steele's "Lambda" papers. Yes, those papers and his Rabbit compiler showed that this issue was essentially solved in the late 70's. And, as good as this particular blog post is, and it is very good (thank you!), those papers are models of clarity. How you could read them and not get what's going on is beyond me.

David Bakin said...

IIRC "trampolining" was invented as a way to implement a tail-recursive language (Scheme, I think) inside of a total C environment without resorting to any assembly language tricks or knowledge of the generated C code from any particular compiler. Trampolining is unsuitable for any implementation that actually can generate code.

I don't understand why the JavaScript compiler and JIT developers are hung up on trampolining (or even Cheny-on-the-MTA), using its poor performance as an excuse to not implement PTC as required by ES6. They definitely fall into the camp of "do not get TC". Because:

You point to Steele's "lambda" papers. Yes. Those papers plus his Rabbit compiler prove that this problem was solved in the late 70's. As good as this blog post is - and it is good, thank you! - those papers are a model of clarity. How you could read them and not get understand the use of and need for PTC is beyond me.

[possibly posted twice, if so, excuse me!]