Saturday, April 10, 2021

Can continuation passing style code perform well?

Continuation passing style is a powerful technique that allows you to abstract over control flow in your program. Here is a simple example: We want to look things up in a table, but sometimes the key we use is not associated with any value. In that case, we have to do something different, but the lookup code doesn't know what the caller wants to do, and the caller doesn't know how the lookup code works. Typically, we would arrange for the lookup code to return a special “key not found” value:

(let ((answer (lookup key table)))
   (if (eq answer 'key-not-found)
       ... handle missing key ...    
       ... compute something with answer...)

There are two minor problems with this approach. First, the “key not found” value has to be within the type returned by lookup. Consider a table that can only contain integers. Unfortunately, we cannot declare answer to be an integer because it might be the “key not found” value. Alternatively, we might decide to reserve a special integer to indicate “key not found”. The answer can then be declared an integer, but there is now a magic integer that cannot be stored in the table. Either way, answer is a supertype of what can be stored in the table, and we have to project it back down by testing it against “key not found”.

The second problem is one of redundancy. Presumably, somewhere in the code for lookup there is a conditional for the case that the key hasn't been found. We take a branch and return the “key not found” value. But now the caller tests the return value against “key not found” and it, too, takes a branch. We only take the true branch in the caller if the true branch was taken in the callee and we only take the false branch in the caller if the false branch was taken in the callee. In essence, we are branching on the exact same condition twice. We've reified the control flow, injected the reified value into the space of possible return values, passed it through the function call boundary, then projected and reflected the value back into control flow at the call site.

If we write this in continuation passing style, the call looks like this

(lookup key table
   (lambda (answer)
     …compute something with answer)
   (lambda ()
     …handle missing key…))
lookup will invoke the first lambda expression on the answer if it is found, but it will invoke the second lambda expression if the answer is not found. We no longer have a special “key not found” value, so answer can be exactly the type of what is stored in the table and we don't have to reserve a magic value. There is also no redundant conditional test in the caller.

This is pretty cool, but there is a cost. The first is that it takes practice to read continuation passing style code. I suppose it takes practice to read any code, but some languages make it extra cumbersome to pass around the lambda expressions. (Some seem actively hostile to the idea.) It's just more obscure to be passing around continuations when direct style will do.

The second cost is one of performance and efficiency. The lambda expressions that you pass in to a continuation passing style program will have to be closed in the caller's environment, and this likely means storage allocation. When the callee invokes one of the continuations, it has to perform a function call. Finally, the lexically scoped variables in the continuation will have to be fetched from the closure's environment. Direct style performs better because it avoids all the lexical closure machinery and can keep variables in the local stack frame. For these reasons, you might have reservations about writing code in continuation passing style if it needs to perform.

Continuation passing style looks complicated, but you don't need a Sufficiently Smart compiler to generate efficient code from it. Here is lookup coded up to illustrate:

(defun lookup (key table if-found if-not-found)
   (labels ((scan-entries (entries)
              (cond ((null entries) (funcall if-not-found))
                    ((eq (caar entries) key) (funcall if-found (cdar entries)))
                    (t (scan-entries (cdr entries))))))
     (scan-entries table)))
and a sample use might be
(defun probe (thing)
  (lookup thing *special-table*
    (lambda (value) (format t "~s maps to ~s." thing value))
    (lambda () (format t "~s is not special." thing))))

Normally, probe would have to allocate two closures to pass in to lookup, and the code in each closure would have to fetch the lexical value of key from the closure. But without changing either lookup or probe we can (declaim (inline lookup)). Obviously, inlining the call will eliminate the overhead of a function call, but watch what happens to the closures:

(defun probe (thing)
  ((lambda (key table if-found if-not-found)
     (labels ((scan-entries (table)
                (cond ((null entries) (funcall if-not-found))
                      ((eq (caar entries) key) (funcall if-found (cdar entries)))
                      (t (scan-entries (cdr entries))))))
        (scan-entries table)))
    thing *special-table*
    (lambda (value) (format t "~s maps to ~s." thing value))
    (lambda () (format t "~s has no mapping." thing))))
A Decent Compiler will easily notice that key is just an alias for thing and that table is just an alias for *special-table*, so we get:
(defun probe (thing)
  ((lambda (if-found if-not-found)
     (labels ((scan-entries (entries)
                (cond ((null entries) (funcall if-not-found))
                      ((eq (caar entries) thing) (funcall if-found (cdar entries)))
                      (t (scan-entries (cdr entries))))))
        (scan-entries *special-table*)))
    (lambda (value) (format t "~s maps to ~s." thing value))
    (lambda () (format t "~s has no mapping." thing))))
and the expressions for if-found and if-not-found are side-effect free, so they can be inlined (and we expect the compiler to correctly avoid unexpected variable capture):
(defun probe (thing)
  ((lambda ()
     (labels ((scan-entries (entries)
                (cond ((null entries)
                       (funcall (lambda () (format t "~s has no mapping." thing))))
                      ((eq (caar entries) thing)
                       (funcall (lambda (value) (format t "~s maps to ~s." thing value))
                                (cdar entries)))
                      (t (scan-entries (cdr entries))))))
        (scan-entries *special-table*)))))
and the immediate calls to literal lambdas can be removed:
(defun probe (thing)
   (labels ((scan-entries (entries)
              (cond ((null entries) (format t "~s has no mapping." thing))
                    ((eq (caar entries) thing)
                     (format t "~s maps to ~s." thing (cdar value))))
                    (t (scan-entries (cdr entries))))))
      (scan-entries *special-table*)))

Our Decent Compiler has removed all the lexical closure machinery and turned the calls to the continuations into direct code. This code has all the features we desire: there is no special “key not found” value to screw up our types, there is no redundant branch: the (null entries) test directly branches into the appropriate handling code, we do not allocate closures, and the variables that would have been closed over are now directly apparent in the frame.

It's a bit vacuous to observe that an inlined function performs better. Of course it does. At the very least you avoid a procedure call. But if you inline a continuation passing style function, any Decent Compiler will go to town and optimize away the continuation overhead. It's an unexpected bonus.

On occasion I find that continuation passing style is just the abstraction for certain code that is also performance critical. I don't give it a second thought. Continuation passing style can result in high-performance code if you simply inline the critical calls.

3 comments:

Anonymous said...

Thanks, interesting insight. Nitpick: missing "funcall"s when calling if-found and if-not-found.

Joe Marshall said...

Fixed. Thanks!

John Cowan said...

R7RS-large now provides this style for hash tables, but with two differences that I think are improvements: both are optional, and failure precedes success, because often there's nothing special you need to do on success; you just want the value. So the defaults are "error" for failure and "identity" for success. There is also a trivial variant with a required default value, equivalent to a failure thunk that returns that value.