Thursday, March 5, 2009

Not Lisp again....

It was 1983 and I was taking my first real computer course at MIT. Hal Abelson and Bill Siebert were co-lecturers. Hal was delivering the bad news:
“If you already know how to program, you may be at a disadvantage because you will have to unlearn some bad habits.”
Wonderful. I knew how to program (after all, I knew Macro-11, Z80 assembly and BASIC), and I could guess what the bad habits were. Violating the un-cuteness principle was probably one of them. Nothing fun is going to be allowed. (I rarely take notes at lectures. I'm too busy keeping an internal running commentary.)
“In this course we will be using the programming language Lisp...”
Argh! Not that again! What is it with Lisp? Ok, maybe at Harvard they do that sort of thing, but this was MIT! Don't they hack computers here?

I was wondering what I had gotten myself into. I held on to two small hopes. First, that the little amount of Lisp I had learned at Harvard would give me a head start over the other 250 students in the room. Second, that I only had to take a couple of computer classes to fulfill that part of the EECS requirement. I could stomach a bit of Lisp for a while, toe the company line, and be completely un-cute long enough to pass the course.

Hal put the archetypical Lisp program on the blackboard:
(define (fact x)
  (if (zero? x)
      1
      (* x (fact (- x 1)))))
He walked the class through the substitution model:
(fact 5)
(* 5 (fact 4))
(* 5 (* 4 (fact 3)))
(* 5 (* 4 (* 3 (fact 2))))
(* 5 (* 4 (* 3 (* 2 (fact 1)))))
(* 5 (* 4 (* 3 (* 2 (* 1 (fact 0))))))
(* 5 (* 4 (* 3 (* 2 (* 1 1)))))
(* 5 (* 4 (* 3 (* 2 1))))
(* 5 (* 4 (* 3 2)))
(* 5 (* 4 6))
(* 5 24)
120
“This is a recursive process. At each step in the recursion we break the problem into smaller parts, solve each part separately, then combine the answers. In this example, each recursive call to fact uses a smaller number. Eventually we get to fact 0, which we know is 1. At this point we can start the multiplication.”
And then he said “This isn't very efficient.”

I thought that was an odd thing to say about a Lisp program. At Harvard, the obvious inefficiency of Lisp was downplayed. Here the professor was pointing it out and was going to do something about it.
“Look how the size of the problem depends upon the value of X. When X is five, there are five pending multiplications that have to be kept track of. If X were ten, then there would be ten pending multiplications. The computer is going to have to use up resources to keep track of these.
“What we want to do is to keep track of the product of the numbers we've seen so far. Like this:
  (define (fact x)

    (define (iter n accum)
      (if (zero? n)
          accum
          (iter (- n 1) (* n accum))))

    (iter x 1))

  (fact 5)
  (iter 5 1)
  (iter 4 5)
  (iter 3 20)
  (iter 2 60)
  (iter 1 120)
  (iter 0 120)
  120
“This is an iterative process. Instead of breaking the problem into parts that we solve separately, we convert the problem into a simpler problem that has the same solution. The solution to (iter 5 1) is the same as the solution to (iter 4 5), which is the same as the solution to (iter 3 20) and so forth.
“Notice how in this case we aren't building a chain of pending multiplications. At each iteration, the value of n is multiplied by the accumulated product, and this becomes the new accumulated product in the next iteration. When n reaches zero, we have in the accumulator the product of all the numbers from x down to one.”
This was pretty obvious to me. However, I thought Hal was omitting an important point. Clearly each recursive call to iter had to be matched by a corresponding (implicit) return. Otherwise, how would the program know how to get back to the caller? Granted, there were no pending multiplications, but that doesn't mean we're not using up stack frames.

But then Hal said, “With an iterative process, the computer doesn't need to use up resources for each iteration. If you know a bit about programming, you might have expected that each iterative call, like each recursive call, would consume a small amount of memory and eventually run out. Our system is smart enough to notice the iteration and avoid doing that.”

Now that caught my attention. I was impressed first by the fact that whoever designed this particular Lisp system cared about efficiency. I was impressed second by the fact that he was able to program the computer to automatically figure it out. I was impressed third that however it was that it worked, it had to be really fast at it (or the cure would be worse than the problem).

Frankly, I had my doubts. I thought that maybe the computer was able to figure this out some of the time, or maybe it was able to substantially reduce, but not totally eliminate, stack allocation. Or maybe this was some sleight of hand: the resources are allocated somewhere else. I tried to figure out how I'd do this in assembly code, but I was baffled. I made a note to myself to check into this.
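What Hal was claiming, in other words, was that the iterative fact is just a loop in disguise. A rough equivalent using Scheme's do form (my own sketch; nothing like this appeared in the lecture) makes the fixed amount of state explicit:

  ;; Two pieces of state, n and accum, are updated on each trip around
  ;; the loop; there is no chain of pending multiplications and nothing
  ;; for a stack to accumulate.
  (define (fact x)
    (do ((n x (- n 1))
         (accum 1 (* n accum)))
        ((zero? n) accum)))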

Hal went on to introduce first-class procedures. I didn't see the rationale for these. I mean, you can't call a function that hasn't been written, and if it has been written then why not just call it directly and avoid the run-around?

Hal said “You're all familiar with the derivative operator.” and he wrote it on the blackboard:
                               f(x + dx) - f(x)
       D f(x) = f'(x) = lim   ------------------
                       dx->0          dx
“We can program this as follows:
    (define dx .0001)

    (define (deriv f)
   
       (define (f-prime x)
         (/ (- (f (+ x dx)) (f x))
            dx))

       f-prime)
“Notice how our formula for the derivative translates directly into the Lisp code.
“Let's take the derivative of the cube function:
    (define (cube x) (* x x x))

    (define cube-deriv (deriv cube))
“And this new function should act like the derivative of the cube function.”
He put up a slide on the overhead projector that showed the above code and the output from the computer:
(cube-deriv 2)
;Value: 12.000600010022566

(cube-deriv 3)
;Value: 27.00090001006572

(cube-deriv 4)
;Value: 48.00120000993502
Sure enough, that's reasonably close to 3x^2.

I was floored. Here we take the simple, straightforward definition of the derivative of a function. Type it in with essentially no modification (we're a little cavalier about dropping the limit and just defining dx to be a ‘reasonably small number’) and suddenly the computer knew how to do calculus! This was serious magic. I had never seen anything like it before.

Hal went on to explain how the substitution model of evaluation worked in this example and I carefully watched. It seemed simple. It couldn't be that simple, though, could it? Something must be missing. I had to find out.
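Traced with the substitution model, (cube-deriv 2) unwinds something like this (my after-the-fact reconstruction, with the arithmetic abbreviated):

(cube-deriv 2)
((deriv cube) 2)
(f-prime 2)                                 ; with f bound to cube and dx to .0001
(/ (- (cube (+ 2 .0001)) (cube 2)) .0001)
(/ (- (cube 2.0001) 8) .0001)
(/ (- 8.00120006... 8) .0001)
(/ .00120006... .0001)
12.000600010022566

The leftover .0006 is just the forward-difference error: for the cube, ((x+dx)^3 - x^3)/dx works out to 3x^2 + 3x*dx + dx^2, and 3*2*.0001 accounts for the .0006 we see.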

As soon as the lecture ended I ran to the Scheme lab. Hewlett-Packard had donated several dozen HP 9836 Chipmunks to MIT. Each one was equipped with a blindingly fast 8-MHz 68000 processor. On top of each workstation sat a fairly sizable expansion box that held an additional 16 megabytes of RAM. A student had sole use of an entire machine while working on problem sets. No time-sharing! It was an incredibly lavish setup.

The lab was empty (go figure, everyone had just gotten out of the lecture). I carefully typed in the examples from the lecture and tried them out. I tried the recursive factorial and managed to blow out the stack with a fairly big number, so I knew there was indeed a stack limit and how the machine behaved when you hit it. I removed the multiply from the iterative factorial to avoid making huge numbers and gave it some ridiculously large argument. The machine churned and churned and eventually returned an answer. I gave it a bigger number. A lot more churning, but no sign of stack overflow. I guess that really worked.
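The no-multiply version must have looked something like this (a reconstruction; count-down is just my name for it, and I no longer remember the exact argument I used):

  ;; The iterative factorial with the multiply removed: accum stays 1, so
  ;; no bignums get built, and only stack growth (if there were any) could
  ;; cause trouble.
  (define (count-down x)
    (define (iter n accum)
      (if (zero? n)
          accum
          (iter (- n 1) accum)))
    (iter x 1))

  (count-down 10000000)   ; churns for a while, but no stack overflow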

Then I typed in the derivative example from the lecture. Sure enough it worked just like Hal said. I tried passing in a few other functions. They worked, too. I did the homework exercises for fun. They just worked.

This was computing on a whole new level. I thought I knew how to program. I didn't know anything like this. I needed to find out whether there was any more of this magic, and exactly how it worked.

My prior objections to Lisp were suddenly far less relevant.
  • Slower than assembly? Maybe for table lookups, but who cares about something as mundane as that? I want to do magic. If I have to look up something in a table, maybe I'll use assembly code.
  • Harder to understand? No way. Look at the derivative example. Five lines of code to express the basis of differential calculus. Five lines of assembly code can barely call a subroutine. (As the proud owner of an HP calculator, I had no problem whatsoever with prefix notation.)
  • Less efficient? I really didn't know, but it was clear that someone had put some serious thought into it and found some interesting solutions.
  • Less capable? Dude, it's doing calculus!
Well, the initial lecture in computer science was a lot more fun and interesting than I expected, but of course I knew they were trying to ‘sell’ comp. sci. as a career. I was pretty sure that this example was a hook and that the tedium would soon follow, perhaps even in the next lecture.

32 comments:

Clay said...

This line of blogging is really interesting. Please keep it going.

Clay said...
This comment has been removed by the author.
Buddha Buck said...

No one has commented on this, or the last post, yet.

I just wanted to say that I'm enjoying this story.

I find your description of how you felt about various languages then, and the "aha!" moments when all of a sudden you see them differently to be familiar and well-expressed. I am likely to refer people to it to see why people can fall in love with programming languages.

Tim Faircloth said...

"If you know a bit about programming, you might have expected that each iterative call, like each recursive call, would consume a small amount of memory and eventually run out. Our system is smart enough to notice the iteration and avoid doing that."

Tail Recursion is actually supported by several languages.

Ramon Leon said...

Keep up the posts, I'm enjoying them very much.

Tac-Tics said...

The sad reality.

Tail recursion is a pain in the butt to debug. It turns out that while creating stack frames is slow, having them around is really USEFUL when stepping through a program. If a procedure tail-recurs to another, the stack trace shows that a miracle has happened: a procedure appears to have been called by the one right above it, even though the one above it never even mentions it. Those kinds of miracles are Bad News.

On top of that, being forced to create a function inside a function is not as natural. It is recursive, so it needs to be given a name, but the name is always something lame like "iter" or "_fact" or something dumb. In practice for-loop style constructs are more visible and easier to follow.

First-class functions are very important. However, that particular example isn't a good one. You can do the same in assembly or C, albeit the syntax in C is gimped. The real power of first-class functions in Lisp comes from the ability to close over local variables in the outer scope.

Robert said...

I too am enjoying this line of posts. Thanks for writing them, Joe.

Matt M said...

@nova82

You need to imagine this in the proper context. In 1983, most of these high-level programming languages that you're saying support tail recursion did not even exist. It was pioneered by Lisp.

Unknown said...

Here is the course.

Peter Eddy said...

@Tic-Tacks: I've never had an issue debugging tail recursive functions, or any recursive function actually. Certainly nothing worthy of the "Sad reality" tone.

And let me add a 'me-too' for the enjoyable article.

Jason Kikel said...

This is a great series.

For me, I appreciate lisp's facilities, but have never been a lover of its syntax. I much prefer:

let der f =
    let dx = 0.00001
    let f' x = ( f(x+dx) - f(x) ) / dx
    f'

(i can't put a pre or code tag in this post so sorry about the lack of spaces)

Chris said...

It turns out that it isn't necessary for a recursive function to be named globally: http://en.wikipedia.org/wiki/Anonymous_recursion

Eli Sarver said...

I think this applies to my stab at learning Haskell, as well as my struggles to understand the foundation of calculus in preparation for taking my first course in it. This also caused a bit of an "aha!" moment for me, even if by proxy.

Harold Fowler said...

That actually makes pretty good sense once you think about it.


Evan Beard said...

Wow, very interesting and informative, especially from the perspective of a current cs undergrad. Keep it up..

Mike Kramlich / ZodLogic said...

great post!
you've further increased my itch to play with Lisp again.
This and Clojure.

Mike
my Zombie Rogue-like: http://deadbyzombie.com/
me: http://zodlogic.webfactional.com/mikekramlichsoft/

Tony Morris said...

"On top of that, being forced to create a function inside a function is not as natural. It is recursive, so it needs to be given a name, but the name is always something lame like "iter" or "_fact" or something dumb."

This is not true. Take a look at the Y-combinator. Or if you prefer Haskell, the fix function.

Ian Spivey said...

I love how you've put this in context. It's great to recapture some of the "wow"-factor Scheme must have conveyed back when I was just in the womb :) By the time I made it to 6.001, it was still elegant, but certainly wasn't quite as special.

Anonymous said...

Hey you can do this in python too!

dx = .000001

def deriv(f):
    def fprime(x):
        return (f(x+dx) - f(x))/dx
    return fprime

def cube(x):
    return x**3

def printDeriv(x):
    print (deriv(cube))(x)
    print 3*x**2

printDeriv(5)

wartalker said...

Great!!!

madsravn said...

Really liked this post. Hope you have more to come :)

Unknown said...

Enjoyed your post! Cheers!

hacksoncode said...

I realize this isn't really in the spirit of the article, but just to make a tiny little point:


#define DX 0.0001

double cube(double d) { return d*d*d; }

double deriv(double (*pfn)(double), double d) { return ((*pfn)(d + DX) - (*pfn)(d))/DX; }

double cubeDeriv(double d) { return deriv(cube, d); }
--

Andrei Korostelev said...

> I realize this isn't really in the spirit of the article, but just to make a tiny little point:
[...]

Note that this C implementation conceptually differs from the ones in Lisp and Python. The difference is that in C you must supply both the function and the value at which to calculate the derivative. In Lisp/Python you only specify the function, leaving the second binding free. Then you get a new function (der), which you can later feed with any value. This is what closures actually do.

hacksoncode said...

"In Lisp/Python you only specify the function, leaving the second binding free."

I'm not sure how this differs from the cubeDeriv(double d) function I'm left with at the end of my example.

If I want to, I can even define another function by passing it back to deriv() and get a function for the double derivative of cube.

The main difference is that in Lisp I can take that function and pass it anything that the underlying functions and operators are able to handle.

You can't easily do that in C, because it's strongly typed. In C++ you can do it with templates, but the syntax is much messier.

I'm not saying there isn't a lot of good magic in Lisp. But this particular trick is just syntactic sugar and weak typing (which is not always a blessing).

Lucian said...

@Ray
It has nothing to do with static typing, just the lack of closures.

The exact same trick done by Scheme is trivial in Haskell, a much more "strongly" statically typed language than C.

Fred Ross said...

@Ray, it has nothing to do with dynamic typing (which is what you mean). It works just fine in Haskell, which is statically typed. Actually, C is weakly typed, and in practice ends up being dynamically typed part of the time, such as for using functions like qsort.

And it isn't actually equivalent (which is what you mean by syntactic sugar -- Scheme's syntactic sugar is around quote and backtick, not functions). Yes, you can compile the Scheme form in this case by hand into C. That's what Turing completeness means. However, how would you handle the derivatives of arbitrary functions specified by the user of the program? You end up writing an interpreter. It may be a very simple one, but it's there. Effectively, you have created a little language that has the capability in question. Scheme already has it.

Unknown said...

This was interesting to read... and I usually /hate/ lisp related posts. :)

I once wrote something to do symbolic derivatives in C++. It was absolutely a pile of hacks around missing or poorly implemented language features that most functional languages have given much more priority to, so the power of that example was quite strong for me.

jbaker said...

from a computer science/math double major.. lisp & calculus <33

Anonymous said...

Tac-Tics wrote: "Tail recursion is a pain in the butt to debug. It turns out while creating stack frames is slow, it's really USEFUL when stepping through a program."

That's why Lisp has TRACE. I've never had TCO cause me any trouble debugging. Lisp debuggers are far more powerful than any non-Lisp debugger I've ever seen. You can even modify code at runtime.

Being able to see the execution path of one's program is generally useful, yes, but a stack trace only provides one snapshot of that, and you have to know ahead of time (via a breakpoint or trap) at what point to catch it in the act.

"It is recursive, so it needs to be given a name, but the name is always something lame..."

The goal here was simply to demonstrate TCO to first-timers, and giving it a name was the easiest way to do that. There are ways to recurse into a lambda. Or it could have been a variadic function. There are many options. In other Lisp dialects, it's even easier, like Clojure's RECUR.

I believe these sorts of nit-picky issues are precisely what Hal meant when he said some programmers would need to "unlearn" first.

xX0v0Xx said...

Emacs, the famous editor, is Lisp-based. I have loved using it for more than 30 years. I'm not sure any other language would have been able to make it so robust and efficient.

Anonymous said...

The plumbing is nice, but this isn't really calculus. It just wraps the supplied function in the derivative operator.

Much more impressive would be if it could return a simplified function:

(* 3 x x)