Will Clinger's paper *Proper Tail Recursion and Space Efficiency*, in the Proceedings of the 1998 ACM Conference on Programming Language Design and Implementation, June 1998, pages 174–185, is the best reference I've found on the topic.

First of all, let's be clear about what ‘proper tail recursion’ is. Clinger defines it this way:

The essence of proper tail recursion is that a procedure can return by performing a tail call to any procedure, including itself.

It is important to recognize that we're not just talking about loops written in tail-recursive style, but *any* procedure that (tail) calls another in order to compute the return value.

It is a common misconception that tail recursion is primarily a syntactic sleight-of-hand that allows a programmer to use a function call rather than a traditional iteration construct. A Scheme programmer might write this:

```
(define (factorial x)
  (define (fact-iter n answer)
    (if (> n x)
        answer
        (fact-iter (+ n 1) (* answer n))))
  (fact-iter 1 1))
```

Where a more traditional formulation is this:

```
int factorial (int x)
{
  int answer = 1;
  for (int n = 1; n <= x; n++)
    answer *= n;
  return answer;
}
```

The Scheme programmer relies on the compiler to recognize that the recursive call to `fact-iter` is intended to be a loop. He expects that the code will compile to something very close to the traditional formulation. The MIT-Scheme compiler generates this code:

```
;; Note: some instructions have been omitted for clarity
;; (tagging, interrupt polling, etc.)

fact-iter-3:
;; (assign (register #x10) (offset (register 4) (machine-constant 0)))
(mov w (r 0) (@r 4))
;; (assign (register #x12) (offset (register 4) (machine-constant 2)))
(mov w (r 1) (@ro b 4 8))
;; (fixnum-pred-2-args greater-than-fixnum? (register #x11) (register #x13))
(cmp w (r 0) (r 1))
(jg (@pcr label-6))
;; (assign (register #x15) (offset (register 4) (machine-constant 1)))
(mov w (r 1) (@ro b 4 4))
;; (assign (register #x19) (fixnum-2-args multiply-fixnum (register #x16) (register #x11) #f))
(imul w (r 1) (r 0))
;; (assign (offset (register 4) (machine-constant 1)) (register #x14))
(mov w (@ro b 4 4) (r 1))
;; (assign (register #x1d) (fixnum-1-arg one-plus-fixnum (register #x11) #f))
(add w (r 0) (& #x5a))
;; (assign (offset (register 4) (machine-constant 0)) (register #x1a))
(mov w (@r 4) (r 0))
;; (invocation:jump 2 #f fact-iter-3)
(jmp (@pcr fact-iter-3))
```

As you can see, the compiler dutifully turned it into a loop.

Not everyone is aware that this works across procedure boundaries.

```
(define (even? x)
  (if (= x 0)
      #t
      (odd? (- x 1))))

(define (odd? x)
  (even? (- x 1)))
```

*Yeah, I blew it here by forgetting to test for a base case. But the conditional wasn't the interesting part anyway.*
```
;; Note: some instructions have been omitted for clarity
;;
even?-2:
;; (assign (register #x10) (offset (register 4) (machine-constant 0)))
(mov w (r 0) (@r 4))
;; (eq-test (register #x10) (constant 0))
(cmp w (r 0) (&u #x68000000))
(je (@pcr label-4))
;; (lap-opt fixnum-add-const-in-place)
;; (assign (register #x14) (object->fixnum (register #x10)))
;; (assign (register #x15) (fixnum-1-arg minus-one-plus-fixnum (register #x14) #f))
;; (assign (register #x12) (fixnum->object (register #x15)))
(add w (r 0) (& #x3ffffff))
(and w (r 0) (& #x6bffffff))
;; (assign (offset (register 4) (machine-constant 0)) (register #x12))
(mov w (@r 4) (r 0))
;; (invocation:uuo-link 2 #f odd?)
(jmp odd?-1)
label-4:
;; (assign (offset (register 6) (machine-constant 2)) (constant #t))
(mov w (@ro b 6 8) (&u #x20000000))
;; (pop-return)
(return)
odd?-1:
;; (assign (register #x11) (offset (register 4) (machine-constant 0)))
(mov w (r 0) (@r 4))
;; (lap-opt fixnum-add-const-in-place)
;; (assign (register #x12) (object->fixnum (register #x11)))
;; (assign (register #x13) (fixnum-1-arg minus-one-plus-fixnum (register #x12) #f))
;; (assign (register #x10) (fixnum->object (register #x13)))
(add w (r 0) (& #x3ffffff))
(and w (r 0) (& #x6bffffff))
;; (assign (offset (register 4) (machine-constant 0)) (register #x10))
(mov w (@r 4) (r 0))
;; (invocation:uuo-link 2 #f even?)
(jmp even?-2)
```

And people have pointed out that this may be surprising and baffling to someone who expects to see the history of the computation laid out on the stack.

Scheme programmers make a big deal out of this. They point out that with proper tail recursion, Scheme doesn't need looping constructs like `while` or `for` or `do` because the user can write them himself. But Scheme hackers are a fringe group of the archetypical fringe group: Lisp hackers. They enjoy *removing* features from the language that they consider ‘unnecessary’. Then they write papers that somehow ‘prove’ that this is better. Of course these academic papers contain an awful lot of math and not very many actual programs.

In fact, Scheme programmers will try to sneak tail recursion into other programming languages. There's hardly a point to doing so because other languages *have* loops already. But now you get programs written by wet-behind-the-ears neophytes that use tail recursion rather than something more obvious. Furthermore, you cannot debug this code because the tail recursion has erased the backtrace.

And what does this buy you? In theory, when you exit a procedure with a tail-recursive call, you can replace the `call/return` instruction pair with a `jump`. In theory, that's not much of an improvement. In practice, however, it isn't quite that simple. The stack usually has to be adjusted one way or the other and this completely wipes out any benefit to be seen from avoiding the `push` and `pop` of the return address.
To summarize, this point of view about tail recursion is based on these ideas:

- Its purpose is to avoid writing looping constructs, which are somehow considered ‘bad’ by ‘purists’. These weirdos ‘think’ in loops, then transform the code to be ‘recursive with tail calls’ when they write it, and then expect the compiler to transform it back. This is academic mental gymnastics with no practical purpose.
- People who like tail recursion are purposefully burdening their code in the name of ‘mathematical’ or ‘theoretical’ purity, but they expect the language implementor to bear the weight.
- Any code that depends on tail recursion can easily be rewritten to use a looping construct instead. Such a rewrite is a trivial local transformation.
- Inter-procedural tail recursion may have some performance benefit, but such benefit is very small at best, and the tradeoff is the loss of valuable debugging information.

If these were in fact the key issues to be argued, then there would be little more to say beyond *de gustibus non est disputandum*. But these arguments are a sideshow. The true argument about tail recursion is that it reduces the space complexity class of a program and allows a program access to a more powerful computation model. This argument is subtle.

Let me digress for a moment. Remember the Bubble Sort algorithm? If I were to write some code that used Bubble Sort to maintain the order of elements in a set, I would be lucky if I only got laughed at. The Hacker's Jargon file calls Bubble Sort ‘the generic *bad* algorithm’.

What makes Bubble Sort so bad? It is extremely inefficient. On the average, Bubble Sort performs `O(n^2)` operations to sort a set of `n` elements. The Big-O notation is the asymptotic time complexity of the algorithm. `O(n^2)` means that when the value of `n` is large enough, the number of operations grows proportional to the square of the number of elements. So if we were to Bubble Sort a set of one thousand objects, we might expect the number of operations to be in the millions. If our set grew to ten thousand objects, the amount of time to sort it would grow by a hundredfold.

Let's compare this to mergesort. Mergesort has an asymptotic time complexity of `O(n log n)`. A mergesort of one thousand objects would take something like `1000 * 10` operations (taking logs base 2). This is much less than the millions required for a Bubble Sort of the same set. If we grew the set to ten thousand items, the amount of time to sort it would only grow by a factor of a little more than ten. Certainly far less than the hundredfold growth we see in Bubble Sort.

Bubble Sort is bad because it wastes a valuable resource: time. And when you let the problem grow larger, it wastes a disproportionately larger amount of time compared to other, more efficient algorithms.

When you determine a strategy to solve a problem, you take a close look at the complexity classes of your algorithms. You'd like a solution within a reasonable amount of time, and you'd like to avoid using an unreasonable amount of space (whether memory or disk). Picking an inefficient algorithm for some minor part of your program can have a major impact on the resource usage of your program. Even the small change from `O(n^2)` to `O(n log n)` makes a huge difference.

Asymptotic time complexity is not the only measure of an algorithm's efficiency. Space (memory usage) can be as much of a limiting factor as time, so asymptotic space complexity should be considered. Consider the `diff` program that compares two files. The naive implementation of `diff` using dynamic programming requires a rectangular tableau with a size that is the product of the lengths of the two files being compared. `Diff`ing a pair of fairly large files in this way will exhaust memory. A tradeoff can be made where multiple passes are needed to complete the `diff`, but the space complexity, rather than being `O(n*m)`, is `O(n + m)`. It takes longer, but now truly enormous files can be diffed without running out of memory.

So what does this have to do with tail recursion?

A program describes a process. An implementation takes a program and performs the process the program describes. The process that emerges has a space and time complexity that is characteristic of *both* the implementation and the program.

Programmers usually expect to be able to ignore the implementation's contribution to the time and space complexity of a program. A programmer would be surprised if he had coded up a mergesort with the expected `O(n log n)` asymptotic time complexity but found that the actual asymptotic time complexity on his computer was `O(n^2)`. He would be understandably upset if an obviously linear text search required `O(n^2)` asymptotic space complexity in the size of the corpus.

To quote Clinger:

Proper tail recursion yields a well-defined asymptotic space complexity that can be relied upon by portable programs.

and

Among these [space consumption] properties of an implementation, proper tail recursion is especially important because it is easy to construct programs that should run in small constant space, but are likely to require at least linear space in implementations that are not properly tail recursive.

That is to say, if your implementation is not properly tail recursive, you will often find that a program that ought to run in `O(1)` space is actually taking `O(n)` space, *or even worse*.

How could this be?

Consider this very simple program:

```
(define (foo x)
  (baz (bar x)))
```

In a non-tail-recursive implementation, the call to `baz` would return back to `foo`, which would then return to `foo`'s caller. In a tail-recursive implementation, `foo` would directly jump to `baz` and `baz` would return directly to `foo`'s caller. The difference is that `foo` releases its resources *before* transferring to `baz` in the tail-recursive case, and *after* `baz`'s computation in the non-tail-recursive case. There are two resources of interest. The obvious one is the stack frame itself. The less obvious one is the reference to the value of `x`.
This is very important. Code like the above is typical where one is performing a computation that involves part of a data structure. The call to `bar` traverses `x` to find the relevant sub-structure to operate upon. Once this substructure is located, it is no longer necessary to retain the enclosing structure `x`. The tail-recursive implementation immediately discards the reference to `x` and allows the garbage collector to reclaim the resources. The non-tail-recursive implementation unnecessarily retains `x` during the computation of `baz`.

This is not simply an inconvenience. If `x` holds substantial resources, we may run out during `baz`. If `baz` somehow depends on a recursive call to `foo`, we'll find our space complexity is not a function of the relevant information sitting in the `bar` portion of `x`, but a function of `x` itself, the *container* of our information. If we write a program that processes text sentence by sentence, we'd be surprised to find that the implementation uses space proportional to the entire document!

If one objects to Bubble Sort on the grounds of its poor asymptotic time complexity compared to better algorithms, then one clearly should object to a lack of tail recursion. It has even worse asymptotic space complexity, and it acts upon the entire program. (At least Bubble Sort doesn't slow down your program if you don't sort anything.)

Part II on its way....

(By the way, if you have questions or need some clarification on what I've said so far, please comment or email me.)