Friday, February 29, 2008

More monads

As Pascal Costanza pointed out in a comment from yesterday, monadic style isn't just used for sequencing and stringing together I/O. The key notion is that if your language can't do X directly (for whatever X is), you can do it indirectly by piecing together little program fragments into a program that does X and then running that. Monadic style has the added advantage that you can separate the program fragments from the `plumbing' to keep the abstraction nice and clean.

So what's this `unit' and `bind' stuff that monad tutorials go on about? These are the two primitive elements that you need to bootstrap yourself into monad land. (Actually, monad tutorials tend to skip over the fact that you'll probably want to extract the data at the other end. `Unit' and `bind' will get you into monad land, but you're on your own if you want to get back.) `Unit' is the function that stuffs something into the chained return value that passes through the plumbing. It's easy to understand `unit' — that chained return value has to carry your data around and someone has to be able to construct it. `Bind' is the function you write that adds a new program fragment to the `plumbing'. `Bind' is a nasty beast, though.

Bind is hard to understand because it has a strange type signature. If you're new to monads, you'll want to take a close look at this. Bind extends the plumbing, so the first argument to bind is the output of a piece of plumbing, that is, one of our chained return values. The second argument to bind is one of our program fragments. Now this is weird: we'll let our program fragment take whatever sort of arguments it wants, but we'll require it to return a chained return value. That's right — bind is only going to handle the input, but it is up to the program fragment to construct a chained return value on the output.

Let's take a look at a bind program. Remember that the magic behavior of the monadic style is due to how the bind function is coded, so different monads will have different bind functions.

The `state' monad allows you to pretend that there is a location that holds some mutable state. The bind program looks like this:
(define (bind plumbing fragment)
  ;; Return a new piece of plumbing: a procedure that consumes the
  ;; initial state and produces the final chained return value.
  (lambda (initial-state)
    (let* ((intermediate-chained-return-value (plumbing initial-state))
           (intermediate-state (get-state intermediate-chained-return-value))
           (prior-fragment-return-value (get-return-value intermediate-chained-return-value))
           (next-computation (fragment prior-fragment-return-value)))
      (next-computation intermediate-state))))
Recall that the plumbing is going to be a program that consumes an initial state value and produces a final chained return value. Bind has to return a new piece of plumbing, so we return a lambda expression that takes the initial state. When we get passed that state, the first thing we do is send it through the existing plumbing and grab the resulting output. The chained return value has to contain two things: the return value of the last fragment in the plumbing pipeline and the value of the intermediate `state'. We separate these components. We pass the return value of the previous fragment in to the fragment we are binding. It will return the final piece of plumbing, to which we can pass the intermediate state.

This is a bit messy and confusing, so I'll try to restate it again in a simpler manner. To extend the plumbing with a fragment, we return a new piece of plumbing that works by first dropping the chained value through the old plumbing then capturing the output and sending it to the fragment (in two separate phases).
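The two-phase description above can be sketched in Python. This is my own illustration, not code from the post: I'm assuming the chained return value is represented as a (value, state) pair, and the helper fragments `get_state` and `put_state` are hypothetical names.

```python
# A piece of "plumbing" consumes an initial state and returns a
# chained return value: a (value, state) pair.

def unit(value):
    # Stuff a value into the plumbing without touching the state.
    return lambda state: (value, state)

def bind(plumbing, fragment):
    # Extend the plumbing: run the old plumbing, split the chained
    # return value into its two parts, hand the prior fragment's
    # return value to the new fragment, and let the plumbing the
    # fragment returns consume the intermediate state.
    def new_plumbing(initial_state):
        prior_value, intermediate_state = plumbing(initial_state)
        next_computation = fragment(prior_value)
        return next_computation(intermediate_state)
    return new_plumbing

# Fragments that pretend to read and write a mutable location:
def get_state():
    return lambda state: (state, state)

def put_state(new_state):
    return lambda _state: (None, new_state)

# Read the counter, increment it, then read it again.
program = bind(bind(bind(unit(None),
                         lambda _: get_state()),
                    lambda n: put_state(n + 1)),
               lambda _: get_state())

print(program(41))  # -> (42, 42)
```

Note that the fragments never mention the state explicitly; only the plumbing built by `bind` threads it through.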

There's something about this that bothers me, but I'll save that for tomorrow.

Thursday, February 28, 2008

How about them Monads?

I'm taking a short digression from the theme I was developing in the last few posts. This is related, but it probably isn't obvious how it's related.
Monads are the darling of the Haskell community. They enable Haskell programmers to perform extraordinary tasks like sequencing and IO. They've been given proper names like `Maybe' and `List'. They imbue programs with special powers!

But what are they?

Monads are a fancy name for a simple solution to a simple problem. The problem is this: Haskell is a lazy language — nothing is evaluated unless it is required. In a lazy language you can't depend on `begin' or `progn' to force your code to run in a particular order. This is particularly a problem when you want to print something. Things will get printed as it is necessary to evaluate them, not in the order the print statements appear. In fact, since the return value of `print' is rarely used, the compiler can simply discard those statements altogether.

So let's suppose we have three functions F, G, and H, and we want to run them in that exact order. But our language is lazy and won't call anything unless absolutely necessary. There's only one way to force an ordering among functions: we make the value of one depend on the value of the other. We chain them like this: (h (g (f))). Now H cannot run until G produces an output, and G cannot run until F does, so they have to run in the order we want. (We're assuming that G and H actually compute something with their argument so that the function that produces the value has to be run.)

This is a nifty trick except for a few small problems. First, it looks kind of odd to mention H first and F last when F runs first and H runs last, and the nesting will get a bit out of control if you have several of these things. What we really want is the higher-order compose function: (compose f g h) <=> (h (g (f))). Second, we have to rewrite F, G, and H so that they can chain together nicely. This might be a bit of a problem if G and H take more than one argument. And we're using the output of F and G to force the sequencing, which will be a problem if F and G already have some useful output of their own (no multiple values!).

The trick to overcoming these problems is easy. We wrap F, G, and H in a procedure object that manages passing the chained return value around. We arrange for the wrapper procedure object to take a single argument (the return value from further down the chain) and to return a single value (the return value for further up the chain). Now we can simply string the wrapped procedures together in any way we want and the dependency chain we construct will force the evaluation to happen in the exact order we want.
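The wrapping trick can be sketched in Python (my own illustration; the names and the list-of-names representation of the chained return value are assumptions for the sake of the example):

```python
# Each wrapper takes the chained return value coming up from below
# and produces the chained return value for the stage above it.
# Here the chained value is just a list recording execution order.

def wrap(name):
    def wrapper(chained):
        # The real work of the wrapped function would happen here;
        # we just record that this stage ran.
        return chained + [name]
    return wrapper

f = wrap('f')
g = wrap('g')
h = wrap('h')

# String the wrappers together: h depends on g's output, which
# depends on f's, so they must run in the order f, g, h.
result = h(g(f([])))
print(result)  # -> ['f', 'g', 'h']
```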

Oh yeah, this is a monad.

A monad is simply a collection of wrapped procedures that can be plugged into each other because each takes an argument and produces a value of the same type.

So where's the magic? The magic comes from two places: First of all, the compose function that plugs the wrappers together doesn't have to simply pass the chained return value through. It can do things like inspect the chained return value and bypass the rest of the functions if it wants. Second, the wrapper functions don't have to just invoke the function they wrap, they can, for example, keep a list that accumulates the return values from each function as it is called and stuff that in the chained return value. Basically, the wrappers can contain some fairly complex `plumbing' that the base functions F, G, and H don't have to know about.
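Here is a small sketch of that first kind of magic in Python: a compose function whose plumbing can bypass the remaining stages. The ('ok', value) / ('fail', reason) tagging is my own representation, chosen to resemble a Maybe-style chain; it is not taken from the post.

```python
# A compose whose plumbing inspects the chained return value and
# short-circuits the rest of the pipeline on failure.

def compose(*stages):
    def plumbing(chained):
        for stage in stages:          # stages run first-to-last
            tag, payload = chained
            if tag == 'fail':         # bypass the remaining stages
                return chained
            chained = stage(payload)
        return chained
    return plumbing

def halve(n):
    # A fragment that can fail: only even numbers can be halved.
    if n % 2 != 0:
        return ('fail', 'odd number')
    return ('ok', n // 2)

pipeline = compose(halve, halve)
print(pipeline(('ok', 12)))  # -> ('ok', 3)
print(pipeline(('ok', 6)))   # -> ('fail', 'odd number')
```

The `halve` fragments know nothing about short-circuiting; that behavior lives entirely in the plumbing.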

There's one more tiny bit of magic. In Haskell, if you don't pass enough arguments to a function, the function will be automagically curried. This is a little bizarre. Imagine this in Common Lisp:
(defun foo (a b c) (+ a (* b c)))

(foo 2) => #<closure >

(funcall (foo 2) 3 4) => 14

(funcall (foo 2) 3) => #<closure >
The magic comes when we arrange for the chained return value to always be the rightmost value. We can then just drop that argument and forget that there is a chained return value and let the system do all the dirty work of automatically currying our functions. The final result is that the chained return value disappears from the code that was just using it to force sequencing. We've factored the program into two parts: the chunks that are blissfully unaware that they are going to be chained together in funny ways and the plumbing that really doesn't need to know what sort of stuff is flowing through the pipes.
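Python, like Common Lisp, doesn't auto-curry, but the behavior imagined in the Lisp snippet above can be approximated by hand with `functools.partial` (my own illustration):

```python
from functools import partial

def foo(a, b, c):
    return a + b * c

# Explicit partial application stands in for auto-currying:
step1 = partial(foo, 2)      # like (foo 2) => #<closure >
print(step1(3, 4))           # -> 14

step2 = partial(step1, 3)    # like (funcall (foo 2) 3) => #<closure >
print(step2(4))              # -> 14
```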

That, for the most part, is all there is to monads. If you like math, you can use category theory to prove that this works. Obviously there are constraints on what the wrapper functions can do if they need to be plugged together, and if you are truly anal, you can once again invoke category theory to prove that your personal set of wrapper functions satisfies the requirements.

I have to admit that I am a bit underwhelmed about this. Sure it's a cute trick, but it isn't all that amazing. What's all the hubbub about?

Tuesday, February 26, 2008

Three things are needed to support continuation-passing-style programming:
  1. First-class procedure objects or some equivalent.
  2. General tail recursion.
  3. A type system that won't have conniptions if you try to program in CPS.

This last one can be a serious problem. Let me demonstrate with a simple example. Suppose I have a table that can contain aliases for keys. An entry in a table could map a key to a value, or it could map a key to another key. The basic way to probe the table would be with this CPS function:
(define (lookup key table if-alias if-found if-not-found)
  ...)

(lookup 'my-key *the-table*
    (lambda (another-key) ....)
    (lambda (value) ...)
    (lambda () ...))
Of course the point of the aliases is to have multiple names sharing the same value, so the typical use would be to perform a chained lookup if you find an alias. You'd have a procedure like this:
(define (chained-lookup key table if-found if-not-found)
  (lookup key table
     (lambda (another-key)
       (chained-lookup another-key table if-found if-not-found))
     if-found
     if-not-found))
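The same CPS pattern can be sketched in Python (my own illustration; the dictionary-based table and the ('alias', key) / ('value', v) entry tags are assumptions, not part of the original Scheme):

```python
# CPS version of the table probe: instead of returning a value,
# lookup invokes one of three continuations.

def lookup(key, table, if_alias, if_found, if_not_found):
    entry = table.get(key)
    if entry is None:
        return if_not_found()
    tag, payload = entry
    if tag == 'alias':
        return if_alias(payload)
    return if_found(payload)

def chained_lookup(key, table, if_found, if_not_found):
    # On an alias, tail-call ourselves with the aliased key.
    return lookup(key, table,
                  lambda another_key:
                      chained_lookup(another_key, table,
                                     if_found, if_not_found),
                  if_found,
                  if_not_found)

table = {'my-key': ('alias', 'real-key'),
         'real-key': ('value', 42)}

print(chained_lookup('my-key', table,
                     lambda v: f'found {v}',
                     lambda: 'not found'))  # -> found 42
```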
Scheme is a unityped language. (Bear with me here. I'm going to use the terminology of the static type community.) Both `if-found' and `if-not-found' will produce the same type of object: a `Scheme object'. And whoever consumes the result will expect nothing more nor less. But what if our language has static types?
Chained-lookup clearly produces an object of the same type as lookup, so we'll start there. Lookup consumes a key, a table that maps keys to values or keys, and three procedures that are constrained by the logic of lookup.
lookup: K x Table x (K -> R1) x (V -> R2) x ( - -> R3) -> R
This is a bit nasty. There is a key and value type, K and V respectively, but what are these R1, R2, and R3? The continuations can pretty much compute what they want, so R1, R2, and R3 are the return value types. We know that the lookup will return one of these, so R is the narrowest common supertype of R1, R2, and R3. I'm cheating a little bit here. The argument of the continuation K->R1 can accept objects of a narrower subtype than K. So in the type description above, the different types actually express upper and lower limits on the types of the arguments. (The upper limits are co-variant and the lower limits are contra-variant.)
So what about chained lookup? Intuitively it ought to be this:
chained-lookup: K x Table x (V -> R2) x ( - -> R3) -> R
But is chained-lookup calling lookup correctly? What is that first continuation? Intuitively, we know it maps K -> R (where R is the narrowest common supertype of R2 and R3), but is the compiler's type system able to deduce or accept this?
Well, that depends. Some languages have pretty complex type systems that can figure out what is going on. Others have ways to decorate your code with type variables (templates or generics). Others have ways to turn off the type checking (casting to void * and back). Or you may be screwed.
The point I'm trying to make here is that even if your language supports first-class procedures and tail recursion, it still might be impractical to write in continuation-passing-style because the type decorations become unmanageable.

Monday, February 25, 2008

Claim 2: General tail-recursion is an essential element of a programming language.

By `general' tail-recursion I mean that all calls that occur in `tail' position avoid allocating additional control state (stack frames). Many languages or implementations support `self' tail-recursion in which tail-recursive calls to the same procedure are compiled or interpreted as loops, but fewer support the completely general case of arbitrary mutual tail-recursive procedures acting as a loop.

`Essential' is trickier to justify. There are many arguments for it and one really big argument against it. Might as well play devil's advocate first.

General tail recursion cannot be `essential' in a meaningful way because few programming languages support it and no one is complaining.

That is certainly a true statement, so let's compare a hypothetical language that has two variants, one has no tail recursion, the other supports general tail recursion. The first observation we can make is this: Any program that runs in a language without tail recursion will also run in a language with tail recursion. (That is, if it completes with an answer, both versions would produce the same answer.) The second observation is this: There are programs that run in a tail recursive language that will fail to run in a language without tail recursion. These would obviously be the programs that run out of stack or memory. So the claim that general tail recursion is essential is equivalent to the claim that there exist interesting programs that run in a tail recursive language and fail to run in a language without tail recursion. The devil's advocate position would deny this:

General tail recursion cannot be `essential' because there are no interesting programs that require a general tail-recursive implementation.

It seems unlikely that this is an a priori truth. Even the fact that there exists a class of programs that require general tail-recursion is interesting in itself. There is a slightly weaker argument that gets around this objection without much difficulty:

There are no interesting programs that require a general tail-recursive implementation because there are equivalent programs that do not require a general tail-recursive implementation.

This is also true, but the argument has been critically weakened. The equivalent programs are not, in general, `macro expressible' in the original language. You can turn the self tail-recursion into a loop fairly easily but you cannot make a loop out of mutually tail-recursive procedures without whole-program analysis (that is, an analysis at the level above the scoping of the mutually recursive procedures).
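One standard workaround in a language without general tail recursion is a trampoline, which turns the mutual tail calls into a loop at the cost of exactly the kind of whole-program rewriting described above: every call site has to change. A sketch in Python (which does not eliminate tail calls):

```python
# Mutually tail-recursive even?/odd? would blow the stack in Python
# for large n. A trampoline rewrites each tail call as a thunk and
# runs the thunks in a loop.

def is_even(n):
    return True if n == 0 else (lambda: is_odd(n - 1))

def is_odd(n):
    return False if n == 0 else (lambda: is_even(n - 1))

def trampoline(thunk_or_value):
    # Keep bouncing until a non-callable (the answer) comes back.
    while callable(thunk_or_value):
        thunk_or_value = thunk_or_value()
    return thunk_or_value

print(trampoline(is_even(100001)))  # -> False
```

Note that `is_even` and `is_odd` are no longer ordinary procedures: every caller must now know to run them through the trampoline, which is precisely the non-local rewriting the argument is about.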

So why doesn't anyone complain about this? Most programming languages require you to write loops using special primitive looping constructs. These languages encourage the programmer to identify and separate out the iterative and recursive forms. The programmer is trained from very early on to perform the larger analysis needed to identify iterative control flow and rewrite it as a looping construct. It isn't all that difficult, and you have to do it anyway, so what's there to complain about?

Continuation passing style.

Continuation passing style is an example of a programming paradigm that can be very difficult to convert into a looping construct. Every CPS function performs a tail-recursive call to another CPS function, and rather than keep nested state on the stack, the state is kept in nested closures. If every function takes exactly one continuation argument, then it isn't too hard to `unravel' the CPS code into an equivalent `direct style' program, but the more complicated CPS code that takes more than one continuation cannot be easily converted to direct style.

There are few loops in continuation passing style. Control is transferred from function to function in a uniform manner and the difference between `iteration' and `recursion' is seen only as a difference in the space complexity of the continuation arguments. The interpreter or compiler must be sure to avoid inessential allocation when transferring control or an otherwise iterative construct would consume an arbitrary amount of memory (or stack).

If you agree with my claim that languages that support continuation-passing-style are more expressive than ones that do not, and that general tail recursion is a necessary prerequisite for continuation-passing-style, then you ought to agree that languages that have general tail recursion are more expressive than ones that do not.

Some more about general tail recursion later.

Friday, February 22, 2008

Claim 1: A language that supports the ability to program in continuation-passing-style is significantly more expressive than one that does not.

By `expressive' I mean something like Matthias Felleisen's `macro expressibility' — rewriting the entire program doesn't count.

This claim is pretty self-evident except for the word `significantly'. You might argue that continuation-passing-style is an unimportant triviality.

I'll argue that Steele's Lambda papers clearly show that continuation-passing-style can be used to express arbitrary control structure within the language. You don't need to modify the interpreter or compiler.

I need to define what I mean by `supports the ability'. The first thing you need is the ability to `pass a continuation' to a program and possibly invoke it. To do this, you need some mechanism for binding code to a context and using the result as an argument. Lisp does this trivially with anonymous lambda expressions. C# can use anonymous delegates. The anonymity saves you from having to think up a name that won't ever be used, and the lexical nesting of these constructs saves you from having to figure out the necessary context, but you can achieve the same effect (albeit with some extra work) with the popular class/interface constructs you might find in Java.

There are some other things you need to truly `support the ability', but I'll save them for another post.