Tuesday, April 28, 2009

You knew I'd say something (Part II)

Part II
van Rossum writes:
Is it really such a big deal to rewrite your function to use a loop? (After all TRE only addresses recursion that can easily be replaced by a loop. :-)
If this were true, I'd be far, far less concerned about the issue. Sure, a lot of tail recursion really is just looping, and it isn't such a big deal to use a loop. In fact, a loop may be a better construct in many situations. This particular use of tail recursion is not the interesting case.

It is difficult to find a compelling example of code that demonstrates why interprocedural tail recursion is useful. You won't find many examples taken from languages that don't support tail recursion; they are either too small to cause a problem, or the author used some sort of workaround to avoid the space issues. Examples taken from languages that do support tail recursion will inevitably be dismissed as unrealistic and artificial, or particular to Scheme, or simply irrational (‘No real program would ever be written that way. Maybe a Scheme programmer would think in such a way, but...’)

The difficulty is in finding a program that is big enough to illustrate the problem, small enough to post in a blog, easy enough to understand, but complex enough to suggest what problems would occur in the real world.

Continuation passing style needs proper tail recursion to be useful on anything but small problems, but it is a very complex technique which makes it hard to illustrate succinctly. Several people have suggested implementing a state machine through tail calls. I like this technique, but there are so many ways of implementing state machines that someone is bound to claim that the problem is artificial.

Here's an example that I think will be persuasive.

Consider the Visitor Pattern. This is a technique that allows you to abstract away the details of a data structure and the method of traversing it from the algorithm that operates on the data. It is commonly found in object-oriented languages like Java.

In this Python code we implement a binary tree and we provide a means to traverse it with the Visitor Pattern. To make things a bit more interesting, we also implement a DelegateNode class that acts like a Node by delegating the method calls. In this example we pass an accumulator variable with the visitor. We can use the accumulator to collect information about the tree.
import math, random, sys, traceback

class Node:
  def __init__(self, left, right):
      self.left = left
      self.right = right

  def accept(self, visitor, accum):
    return visitor.visitNode (self.left, self.right, accum)

class Leaf:
  def __init__(self, value):
    self.value = value

  def accept(self, visitor, accum):
    return visitor.visitLeaf(self.value, accum)

class DelegateNode(Node):
  def __init__(self, originalNode):
    self.originalNode = originalNode

  def accept(self, visitor, accum):
    return self.originalNode.accept(visitor, accum)

# Generate a tree for us to visit.
def GenerateTree(n):
  if n <= 0:
    return Leaf (1)
    return DelegateNode (Node(GenerateTree(int(math.sqrt(float(n))) - 1),
                              GenerateTree(n - 1)))
Now remember that this is a toy example. A real example would have several kinds of nodes and the nodes would have some interesting functionality and we'd do something more than just traverse them. The important point is this is a simplified version of some code we might actually encounter in a real-world problem.

So let's traverse the tree and count the different kinds of nodes. First we need a Visitor that can do that:
class TreeCountingVisitor:
  def visitNode (self, left, right, accum):
    return right.accept (self, left.accept (self, [accum[0] + 1, accum[1]]))

  def visitLeaf (self, value, accum):
    return [accum[0], accum[1] + 1]
The accumulator is a two-tuple. The first element is the number of nodes, the second element is the number of leaves. When we visit a leaf, we return a new tuple with the leaf count bumped by 1. When we visit a node, we bump the node count by 1 before traversing the left branch. Since the visitor returns an updated accumulator, we pass that in when we traverse the right branch.

And our main function to try this out. We'll start with a small tree and see how it scales.
if __name__=="__main__":

  for i in range(25,1000,25):
    print "Generating tree " + str(i)
    testTree = GenerateTree(i)
    print "Testing"
    total = testTree.accept(TreeCountingVisitor(), [0,0])
    print "Total is " + str(total)
On my machine, this runs out of stack at about 350.

Can the tail recursion in this be easily replaced by a loop? It depends on what you mean by easy. Certainly we could write a looping program that traverses the tree without using the Visitor pattern, and it wouldn't take long at all. But in a real-world situation, this may not be so easy. If the Visitor pattern is the ‘advertised’ API, we'd be deliberately bypassing an important abstraction. Another developer could change a node or add a new node type and provide the appropriate Visitor API, but if we weren't using the Visitor pattern, we'd be out of luck.

Assuming we want to organize our code around the Visitor pattern, it becomes much harder to replace the recursion with a loop.

Part III in a day or two...