Tuesday, April 28, 2009

You knew I'd say something (Part II)

Part II
van Rossum writes:
Is it really such a big deal to rewrite your function to use a loop? (After all TRE only addresses recursion that can easily be replaced by a loop. :-)
If this were true, I'd be far, far less concerned about the issue. Sure, a lot of tail recursion really is just looping, and it isn't such a big deal to use a loop. In fact, a loop may be a better construct in many situations. This particular use of tail recursion is not the interesting case.

It is difficult to find a compelling example of code that demonstrates why interprocedural tail recursion is useful. You won't find many examples taken from languages that don't support tail recursion; they are either too small to cause a problem, or the author used some sort of workaround to avoid the space issues. Examples taken from languages that do support tail recursion will inevitably be dismissed as unrealistic and artificial, or particular to Scheme, or simply irrational (‘No real program would ever be written that way. Maybe a Scheme programmer would think in such a way, but...’)

The difficulty is in finding a program that is big enough to illustrate the problem, small enough to post in a blog, easy enough to understand, but complex enough to suggest what problems would occur in the real world.

Continuation passing style needs proper tail recursion to be useful on anything but small problems, but it is a very complex technique which makes it hard to illustrate succinctly. Several people have suggested implementing a state machine through tail calls. I like this technique, but there are so many ways of implementing state machines that someone is bound to claim that the problem is artificial.

Here's an example that I think will be persuasive.

Consider the Visitor Pattern. This is a technique that allows you to abstract away the details of a data structure and the method of traversing it from the algorithm that operates on the data. It is commonly found in object-oriented languages like Java.

In this Python code we implement a binary tree and we provide a means to traverse it with the Visitor Pattern. To make things a bit more interesting, we also implement a DelegateNode class that acts like a Node by delegating the method calls. In this example we pass an accumulator variable with the visitor. We can use the accumulator to collect information about the tree.
import math, random, sys, traceback

class Node:
  def __init__(self, left, right):
      self.left = left
      self.right = right

  def accept(self, visitor, accum):
    return visitor.visitNode (self.left, self.right, accum)

class Leaf:
  def __init__(self, value):
    self.value = value

  def accept(self, visitor, accum):
    return visitor.visitLeaf(self.value, accum)

class DelegateNode(Node):
  def __init__(self, originalNode):
    self.originalNode = originalNode

  def accept(self, visitor, accum):
    return self.originalNode.accept(visitor, accum)

# Generate a tree for us to visit.
def GenerateTree(n):
  if n <= 0:
    return Leaf (1)
    return DelegateNode (Node(GenerateTree(int(math.sqrt(float(n))) - 1),
                              GenerateTree(n - 1)))
Now remember that this is a toy example. A real example would have several kinds of nodes and the nodes would have some interesting functionality and we'd do something more than just traverse them. The important point is this is a simplified version of some code we might actually encounter in a real-world problem.

So let's traverse the tree and count the different kinds of nodes. First we need a Visitor that can do that:
class TreeCountingVisitor:
  def visitNode (self, left, right, accum):
    return right.accept (self, left.accept (self, [accum[0] + 1, accum[1]]))

  def visitLeaf (self, value, accum):
    return [accum[0], accum[1] + 1]
The accumulator is a two-tuple. The first element is the number of nodes, the second element is the number of leaves. When we visit a leaf, we return a new tuple with the leaf count bumped by 1. When we visit a node, we bump the node count by 1 before traversing the left branch. Since the visitor returns an updated accumulator, we pass that in when we traverse the right branch.

And our main function to try this out. We'll start with a small tree and see how it scales.
if __name__=="__main__":

  for i in range(25,1000,25):
    print "Generating tree " + str(i)
    testTree = GenerateTree(i)
    print "Testing"
    total = testTree.accept(TreeCountingVisitor(), [0,0])
    print "Total is " + str(total)
On my machine, this runs out of stack at about 350.

Can the tail recursion in this be easily replaced by a loop? It depends on what you mean by easy. Certainly we could write a looping program that traverses the tree without using the Visitor pattern, and it wouldn't take long at all. But in a real-world situation, this may not be so easy. If the Visitor pattern is the ‘advertised’ API, we'd be deliberately bypassing an important abstraction. Another developer could change a node or add a new node type and provide the appropriate Visitor API, but if we weren't using the Visitor pattern, we'd be out of luck.

Assuming we want to organize our code around the Visitor pattern, it becomes much harder to replace the recursion with a loop.

Part III in a day or two...


Duncan said...

I decided to port your little Python program to Scheme to show people that it is really not that much harder.

You can find the code here:


Duncan said...

I tried to be clever and removed the type tags in my 'objects' before publishing my code. It turns out that that change introduced some bugs, so I reverted it:


Alexey B. said...

Hm... Is proper tail recursion so essential in this example?

In this line:

return right.accept (self, left.accept (self, [accum[0] + 1, accum[1]]))

The inner `accept' is not in tail position, so the tree won't be traversed in constant space. Like scheme, python (still not having proper tr) could also traverse large trees if its call stack wasn't so small (i.e. if call stack was not separated from the data heap).

Am I wrong?

Unknown said...

Your example is contrived not because it is simple, but because the visitor pattern is entirely useless in a "duck typed" language like Python. Visitor is only needed in statically typed languages like Java as a workaround.

In Python, you would simple traverse the tree and call accept on each node. In fact, if you had to do this operation frequently, you'd just keep all the nodes in a list and do them all with a single loop.

So far, you claim that there's a use for this style of tail recursion, but you can't present one that isn't contrived or easily replaced with a simpler concept.

I'd like to see your statemachine example actually.

grant rettke said...

Regarding state machines and "what tail recursion is worth", it is worth however you value what it provides. I don't understand the pushback against tail recursion, or the demand to prove how mighty it is (not to equate them: when is the last time someone argued for loops). It is what it is, and it is nice; it allows you to solve a problem another way, it gives you that freedom. Anyway I can't add much more than the papers I listed here:


LAMBDA: The Ultimate GOTO
LAMBDA: The Ultimate Imperative
Automata via Macros

Unknown said...

When you program Scheme in Python, you fail. When you program Python in Scheme, you fail.

This is grotesquely unPythonic. In Python, you should write a generator that traverses the tree, which can be written to not use tons of stack very easily, and then you'd write something for "for node in tree.depthFirst(): visit(node)"

Notice, by the way, that if you work this out entirely, that it takes about a fifth to a tenth of the code, doesn't involve strewing accumulators everywhere, and does everything this example does. These are signs you are doing it wrong.

This example is still terrible. Python has a lot of good tools that few other languages have (like generators), and monomaniacally ignoring them because all you can see is your pet optimization is not a good argument.

You need to show an example where the solution would be more elegant than the canonical Python solution if and only if it could use tail recursion, not a grotesquely less elegant solution than is literally constructed for the purpose of blowing the stack.

Benjamin Horstman said...

To really prove what you're after, you need to do three things:

1) Find some abstraction which cannot be elegantly programmed in Python
2) Provide a tail call version of it that is elegant
3) Convince the Python community of this (what is elegant to you may not be elegant to them)

Here, you've failed at 1,2, and 3.

vk said...

I don't get what is the problem with the tail recursion to warrant such a hostility. anyway... the fact that you can implement any loop with a local variable, 'if' and a 'goto' doesn't mean we should stop using loops. everything can be implemented w/o tail recursion, just like it can be done w/o use of functions and loops. 'if' and goto will provide you all you need :).
In fact from what I remember programming basic when I was a kid I didn't really miss any of the advanced features that I didn't yet know about. this is the blub language paradox in its best. to really understand if tail recursion needed or now you *must* use it for some time, then go back to a language that doesn't have it :) more about blub paradox here: http://www.paulgraham.com/avg.html (midway through the article)

Joe Marshall said...

Duncan: How many nodes can you visit before you run out of space?

Joe Marshall said...

Mr. Cat: Proper tail recursion gives you better space utilization. You could also simply use more space (make the call stack bigger, swap out to heap, etc.). But for any program, a properly tail recursive implementation will use no more space (and likely use quite a bit less space) than a non tail recursive one.

Joe Marshall said...

zedshaw: The visitor pattern is common in Python. See Google code search.

Joe Marshall said...

jerf: What I claim is that this code cannot easily be rewritten to turn the tail recursion into a loop. The fact that you can do a global rewrite to use iterators doesn't change that claim. My claim is that you cannot do a local rewrite that turns the tail recursion into a loop.

Joe Marshall said...

Mr. Horstman: I have no illusion that I can persuade the Python community of anything.

Benjamin Horstman said...

The fact that it would require a global rewrite to use iterators is only interesting if you could show that it is likely that someone would program it the recursive way first and not notice the limitations.

Ramzi said...

Tail recursion would be necessary if you want to implement an actor library (like in scala for example), even though python doesn't have code blocks.

Sandro Magi said...

Pervasive immutability results in less buggy programs, on average.

Small functions are similarly easier to debug, on average.

Small functions tend to use arguments immutably, and languages which enforce immutable bindings are even better (Python does not enforce the latter).

The lack of tail recursion optimization inhibits the composition of small functions with immutable bindings, and subsequently makes it harder to write safer programs by exploiting immutability. It's possible, just harder than it could be.

The suggestion to use generators is the only realistic possibility. Traversing multiple trees simultaneously is also possible by zipping the two generator streams and processing them as a stream of pairs. This sort of processing becomes progressively more complicated with loops, so functions and generators are the only serious contenders here.

Given Python does not support tail recursion, generators are the only realistic possibility. Of course, generators are simply an inverted tail recursive fold, so that should come as no surprise.

Benjamin Horstman said...

@Sandro: Python isn't Haskell either. While I agree with what you've said, that is not the point of Python.

What you're making is not an argument for TCO in Python, but for functional languages instead of imperative ones (which is a perfectly reasonable argument to make).

Sandro Magi said...

Actually, I was making an argument for the inclusion of TCO in any language with functions to encourage function composition, while pointing out that it's not strictly necessary in Python since generators can supplant most such uses.

Brendan MacDonell said...

@jrm: (RE: Mr. Cat)
He doesn't have any issue with TR itself, but rather the fact that your example isn't technically relevant to tail recursion. His point was that even if python was properly tail-recursive, the provided code couldn't execute in a constant space because the recursive code must traverse the entire left branch before reaching the right one. Yes, the right child of each node with occupy no more stack space thanks to TR, but in a balanced tree the maximum stack size of the program will still be met in order to reach the leftmost node (which, as in a non-tail-recursive language implementation, will be O(log2(n))).

Joe Marshall said...

Mr. McDonell: In the tail recursive version, the stack depth is proportional to the number of left-hand edges between the current node and the root. In a non-tail recursive version, the stack depth would be proportional to the total number of edges between the current node and the root, times two for the callbacks, plus the number of intervening delegate nodes.
If this were a very balanced tree, we'd be able to go deeper, but only by a constant factor. On the other hand, the average space usage is much smaller. (Tail recursion doesn't guarantee constant space, but it can often remove a factor of `N')
But I deliberately designed the creation function to make trees that favor one side. The left hand sides are smaller than the right, so there is an advantage to iterating on the right side, but recurring on the left. (I got this idea from a compiler book. One particular flow algorithm produced these trees that were largely, but not completely linear. The code for the algorithm constructed its own stack for traversal so it could iterate down the main path, but recur down the side paths. But you get this for free with proper tail recursion.)

Alexey B. said...

Brendan MacDonell: Thanks for making my point more clear.

Proper tail recursion is what makes recursive traversal of data structures space-efficient. But it is limited call stack what makes it unreliable. You cannot rely on such an algorithm if you are not sure, that you won't hit a long tree branch or graph path and overflow the call stack.

So, my opinion is that it is not worth adding a single functional (functional - I mean, borrowed from functional languages) feature to python. But it could be worth to introduce a consistent set of functional features and see what will this result in.

Unknown said...

The state machine example is actually a very good one, and not at all artificial. Take Shriram's Swine before Perl talk; I first watched it about 7 years ago, and my initial reaction was negative.

But the talk is actually very interesting, and the example is compelling; I didn't understand why it was interesting until I started to understand how a good Scheme compiler actually works, where the state gets encoded by the Program Counter. The end result in machine code can be as close to what you could do with hand-written assembly.

However, this argument might not apply to CPython, however. I'd have to think about that one.

Onnie Lynn Winebarger said...


One of the benefits of tail recursion that's difficult to communicate here is this: core Scheme an efficient target language for macros implementing domain specific languages - automata and logic languages both come to mind. Scheme is really a higher order assembly language as far as control flow goes.

Also, the real equation is "looping + memory allocation = proper tail recursion + higher order functions." While this equivalence is real, the former is just more difficult to work with. Higher order functions are just easier to read merely by virtue of being localized, as opposed to data structures + separate functions which represent a higher order function after lambda lifting and closure conversion.

The other part of Guido's first post that bothered me was the indication that there was something mysterious or difficult about adding proper tail recursion to a language. It only requires inventing the appropriate CPS transformation and performing the static reduction of administrative redexes, then designing the desired closure conversion. It actually sounds like Guido is bragging that Python does not actually allow any expressions to occur in tail position, and so there's nothing to optimize.

Looking forward to part 3.

Artyom Shalkhakov said...

While I fully agree with you w.r.t tail recursion and TCO, I think that you can't persuade Pythoners to do things otherwise than the do now.

It looks like most programmers do not need a freedom to choose what's more appropriate for the situation at hand, but rather a small set of "design patterns" to adhere to and a couple of "magic words" to treat as being the truest true things in the world.

I don't mean to insult anyone, please do not misinterpret me. I find it's just ridiculous that people are so negative about things that they don't understand or don't get an immediate profit of.

John Stracke said...

"It only requires inventing the appropriate CPS transformation"

Actually, you could probably recognize tail calls in the bytecode: if CALL_FUNCTION is followed immediately (~~) by RETURN_VALUE, then the two can be coalesced into a single TAIL_CALL_FUNCTION (if there were such an opcode, of course).