Monday, March 30, 2009

Following John Soule

I'm returning to the wrong coast today (barring drama). Back to the usual grind. But I intend to keep posting, and my New Year's resolution is to hack more Lisp.

Oh, and John Soule? Horace Greeley often quoted him.

Sunday, March 29, 2009

Lisp Conference

I think I said “Hi” to Zach Beane, but I didn't get to chat. I have to recommend wigflip as a source of fun graphics.

Friday, March 27, 2009

Wise up. Flush Scheme.

I had a couple of conversations with Alan Bawden. I lobbied (unsuccessfully) to get a macro writing license from him. I didn't advance my case by mentioning that I had written a Scheme interpreter using syntax-rules. Alan is not a fan of syntax-rules.

I agree with Alan that implementing any sort of complex macro with syntax-rules is like forcing a program through a straw, but the alternatives are unpleasant, too. I like the fact that syntax-rules can be expanded without the need to invoke the interpreter, so phase mixing is impossible.
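
Here's the kind of macro that syntax-rules handles comfortably: pure pattern matching and template filling, so the whole expansion can happen in a phase separate from evaluation. (This is the standard textbook example, not anything from the conference.)
(define-syntax my-or
  (syntax-rules ()
    ((_) #f)
    ((_ e) e)
    ((_ e1 e2 ...)
     (let ((t e1))
       (if t t (my-or e2 ...))))))

;; Hygiene guarantees that the T introduced by the template cannot
;; capture a T in the caller's code:
(let ((t 'user-binding))
  (my-or #f t)) ; => user-binding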

After Olin Shivers's talk about his loop macro, it occurred to me that Alan's observation about first-class macros having types (Clams got legs!) might be relevant. When a macro gets somewhat complex, it has to code walk its arguments. It needs to be able to distinguish between chunks of code and binding constructs. When a macro gets very complex (like Olin's loop macro), it becomes impossible to code walk. (Of course you can expand the inner macro away, but there are advantages to keeping it in its original form.) This makes it difficult to use moderately complex macros. I was thinking that Alan's ideas of marking the types of a macro could help during the code walking.

Lisp conference - Hal Abelson

Hal was co-lecturer of 6.001 when I took it in 1983. I've already blogged about how he made me reconsider my opinion of Lisp by pre-empting my objections about efficiency and getting the machine to do calculus on day one.

I was also amazed by Hal's book Turtle Geometry. Hal was involved in the development of Logo for the Apple II computer. I had heard of Logo as a ‘kid's language’, but I discovered that it was a bit more interesting than that when I flipped through Hal's book. The first chapter involves the pretty pictures you can generate through simple iterative and recursive programs, but the later chapters involve vector analysis, topology, linearity, symmetry groups, invariance, and differential geometry. The final chapter introduces General Relativity.

Hal has a way of taking very complex ideas, extracting the core concepts, and reframing them in an intuitive and easy to understand manner. I'm excited that Hal is doing a sabbatical at Google and I hope I can help him out.

Thank you

I'd like to say “Thank you!” to everyone who attended my talk. I really appreciate the positive feedback!

I joined the Speech Team when I was in high school. I had a crush on this girl and that might have been a factor. I was terrible at it. At every competition I came in dead last. At one competition, I just dropped out because I couldn't think of anything to say on a subject.

I'm pretty shy and introverted, but I like meeting people. I discovered a trick: I can pretend that I'm outgoing, even though I'm not, and it works just as well. For some reason, this obvious pretense is sufficient to fool my subconscious.

Usually there is a rather painful backlash a few hours later when I'm sure I've made a perfect ass of myself. That's when I swear I'll never again put myself through the ordeal of public speaking. But this time I got such support that I don't feel like a total idiot. Thanks to everyone!

I made a list of the people I talked to at the conference. Part of the reason was so I could remember what it was we talked about and follow up on it here. The next few posts are going to be on that, I think.

Thursday, March 26, 2009

Name Dropping

There were a lot of people at ILC '09. I'm posting a list of the ones that I can remember (I know I forgot the names of at least three people, so if you talked to me, please email me).

Some of them are people I saw but didn't get the chance to talk to; the rest are people I got the opportunity to talk to.

Tuesday, March 24, 2009

A little bit of time

I have a small amount of time this morning before the conference to write a paragraph or two.

The panel discussion on the future of Lisp was interesting. Not so much for what was said, but from the tangents that couldn't be pursued because of time constraints. I enjoyed hearing Pascal Costanza give his ‘radical’ opinions: Java sucks. Stop moaning about how you can't use Lisp and just use it. (I'm paraphrasing, but only slightly.)

Gerry Sussman raised a great point that sailed a bit over my head. He thinks that the one thing that Lisp will be remembered for is how it can be used as a formal description language for processes. He spoke to me later and said he was surprised no one followed up on that. I asked ‘What is there to say? I agree completely, so I can't really argue or add to that.’ He asked ‘What is it about other languages that make them more popular? Can they express certain ideas better? Can we learn from them?’ I think that is a point worth exploring.

It also seems that Rick Greenblatt isn't mad at me anymore. We had a bit of friction that I'll detail later on in my story about LMI. Rick is an interesting guy, but like many brilliant people, he can be difficult at times.

I'm making a few notes and I'm hoping to expand on them later.

Monday, March 23, 2009

Over with

Finally gave the talk. I had to rush a few things because I gave too much detail at the beginning, but it seemed to be generally well received. Now I can spend a bit of time blogging about the other talks.

Just about ready

I put some slides together for my talk last night. The stupid program was kind enough to reformat my code.

Sunday, March 22, 2009

At the conference

I made it to the conference. Lots of familiar faces here.

But where are the BABES?!?!

Saturday, March 21, 2009

On the airplane heading east for ILC 09. I like having traveled, but I don't like the travel itself. I'm looking forward to ILC, but not to my presentation on Monday. I really dislike public speaking, and every time I do it I swear that it will be the last time. I figure, though, that at least a few people will be interested in what I'm saying and I can answer some questions. Fortunately, Gerry Sussman and Alexey Radul are presenting a short version of their propagator talk right after me. I can probably donate some of my time to them.



So back to LMI and my microcode tracer.

I quickly discovered some problems. The first was an issue with lexical scoping. The Lisp machine compiler would ‘flatten’ the lexical environment in a procedure so a variable could be located by a simple vector ref. But I found that when I had two separate variables with the same name, they would end up getting confused with each other. I wrote some demo code that exhibited the bug.

(defvar *loser-1*)
(defvar *loser-2*)

(defun foo-loser (a b)
  (let ((a 'local-a))
    (setq *loser-1*
          (lambda () (format t "~%A = ~S, B = ~S" a b))))
  (let ((b 'local-b))
    (setq *loser-2*
          (lambda () (format t "~%A = ~S, B = ~S" a b)))))
The foo-loser program has two arguments named a and b, and two locals also named a and b. The global variables *loser-1* and *loser-2* are each set to a thunk that prints the value of a and b that the thunk has closed over. *loser-1* ought to be closed over the local variable a and the argument b, while *loser-2* ought to be closed over argument a and local variable b. The expected behavior is this:
> (foo-loser 'argument-a 'argument-b)

> (funcall *loser-1*) 
A = LOCAL-A, B = ARGUMENT-B

> (funcall *loser-2*)
A = ARGUMENT-A, B = LOCAL-B
The actual behavior was this:
> (funcall *loser-1*) 
A = LOCAL-A, B = LOCAL-B

> (funcall *loser-2*)
A = LOCAL-A, B = LOCAL-B
The local variables bound by the LET expressions are visible to both thunks.

I showed this bug to RG and he explained that lexical scoping had been a relatively recent addition to the compiler and that few programs made enough use of it for the bug to have surfaced. He also suggested that the fastest, surest way to get it fixed was to look at the compiler and fix it myself.

I had a theory about the source of the problem. There was a table that mapped each variable name to its position in the environment. Clearly, when the local variables were added to the table, it caused ambiguity when looking up the names. I was guessing that the table was implemented as an association list. This would be a simple, obvious mechanism, and later additions to the table would shadow the earlier ones, resulting in the observed bug. So I looked in QCP1.LISP to find out what was going on.
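
In miniature, my guess looked like this (a sketch of the mechanism I imagined, not the actual compiler code):
(define table '())
(set! table (cons (cons 'a 'argument-slot-0) table))
(set! table (cons (cons 'b 'argument-slot-1) table))
(set! table (cons (cons 'a 'local-slot-2) table))
(set! table (cons (cons 'b 'local-slot-3) table))

;; ASSQ scans from the front, so the most recently pushed record wins:
(assq 'a table) ; => (a . local-slot-2), even where the argument A is meant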

This text appears at the beginning of the Lisp machine compiler:
;; "This is insane. What we clearly want to do is not completely
;; clear, and is rooted in NCOMPLR." -- BSG/Dissociated Press.
I was Dante on the threshold of Dis.

The Lisp machine compiler is not re-entrant. If an internal function is encountered while compiling a top-level function, it is put on a queue to be compiled later. These are called breakoff functions. With dynamic scoping, there is no need to keep compiler state from the lexically enclosing function when compiling a breakoff function, but with lexical scoping, the breakoff function has to know where the compiler put the variables in the enclosing function. But the compiler doesn't decide this until well after the breakoff has been enqueued. So when the compiler finally gets around to deciding where the variables are going to be, it looks in the compiler-queue to see if any of the breakoff functions need to know the locations. But this is where the bug comes up.

The compiler has a data structure that represents a variable. The lexical environment is simply a collection of these variables. For the example code, the compiler knows about these variables:
[name: a] ;; argument
[name: b] ;; argument
[name: a] ;; local
[name: b] ;; local
In the example above, the first breakoff function is in this SETQ:
(setq *loser-1*
      (lambda () (format t "~%A = ~S, B = ~S" a b)))
When we get to the point of compiling the reference to variables A and B, we no longer know which A and B we want.

The problem is obvious, but coming up with a fix that doesn't involve a serious amount of rewriting of the compiler is not obvious. The strategy I settled on was a bit nasty. When the breakoff function is enqueued, I record a list of which variables are visible. The positions of the variables in the stack frame are not yet known, but which variables shadow which other variables is known. This information gets stored along with the breakoff function. Later on, when we find the positions of the variables and walk the compiler-queue to tell the pending breakoffs about them, we do a little trick. Since we know which variable records we ought to be able to see, we rename the variables we shouldn't see to unique names. (We can't just remove the non-visible ones because the position in the list defines the offset in the frame.)
[name: a] --> [name: #G:123] ;; argument (shadowed by the local A)
[name: b] --> [name: b]      ;; argument (visible)
[name: a] --> [name: a]      ;; local (visible)
[name: b] --> [name: #G:124] ;; local (not in scope here)
Now we can look up the variables by name without finding the wrong one.
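
In outline, the renaming pass is something like this (a reconstruction in Scheme for exposition; the real code lived in the Lisp machine compiler and was messier, and GENERATE-UNINTERNED-SYMBOL stands in for whatever gensym facility it used):
;; ALL-VARS is the frame layout in slot order; VISIBLE-VARS are the
;; records this breakoff function is allowed to see.  Every slot keeps
;; its position; only the names of the hidden records change.
(define (hide-invisible-variables all-vars visible-vars)
  (map (lambda (var)                 ; var is a (name . slot) record
         (if (memq var visible-vars)
             var
             (cons (generate-uninterned-symbol) (cdr var))))
       all-vars))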

This trick works, but it isn't pretty. I was not very happy with this solution, but the alternatives seemed to require quite a bit more understanding of the existing code. I came back to RG with the fix. To my surprise he approved of the change and told me to check it in.

The first problem was solved, but more bugs soon appeared...

Thursday, March 19, 2009

Talk to Greenblatt

So I called up Gerry Sussman and told him I needed a job. I was hoping, of course, that he would suggest some part-time work at the AI lab. Instead, he suggested that I get in touch with Rick Greenblatt over at Lisp Machine, Inc. A few days later Rick invited me to come up and see him at LMI. He told me that Gerry thought I could be useful and started me off with the microcode tracer.

There's nothing like diving right into something you know absolutely nothing about.

The Lisp machine microcode is a fairly insane chunk of software. There was no real documentation for it. The functionality is pretty straightforward if you are familiar with the processor architecture, and only a handful of hackers had ever written any substantial amount. If you wanted to hack the microcode, you just had to ask Moon or RG to show you and tell you about the coding conventions.

The microcode became a pile of spaghetti over the years, and it was getting harder and harder to figure out what was going on. The microcode had no scoping constructs. Every variable, temporary, bit field, chunk of state, or whatever had to live in a register. That was ok because there were thousands of registers. This is sort of like having thousands of global variables. Some of them even had names, and some of the named ones even had useful names. On occasion, there'd be some documentation that would tell you things like
;SUBROUTINE TO SELECT SHEET POPPED FROM PDL
;SMASHES M-A, M-B, M-C, M-D, M-E, M-Q, M-S, M-1
;ONLY REALLY GUARANTEED TO PRESERVE M-I, M-K, M-ZR

The microcode tracer was an attempt to impose a bit of order on this chaos. The idea was to add declarations to the microcode at the entry to basic blocks and have a program verify that the declarations are correct. An error would be raised if some path through the code clobbered a register that was not explicitly listed. Rick had written a bit of code for the microcode tracer, and he wanted me to expand on it. Rick's code is here: http://code.google.com/p/jrm-code-project/source/browse/trunk/lambda/rg/old-micro-trace.lisp

To tell the truth, I'm not sure how well I understood what I was doing. I was sure that I didn't understand what Rick's code was trying to do, so after puzzling over it for a few days, I started writing my own version. I soon discovered problems...

I like debugging

Ok, I should put some caveats in here. The last time I liked or hated something I ended up as a foe of well-designed data structures. No doubt I'll be the champion of buggy code next.

No, I like a particular aspect of debugging. Call it forensic programming. Most bugs are not very interesting. They stem from simple oversight (“Oh, I didn't think of that...”), the fix is usually fairly obvious, and there is no lesson to learn other than to be thorough. The interesting bugs are the ones that stop you in your tracks, make you scratch your head, and cause you to think “What the ... ?”

To find and fix one of the interesting bugs is to attempt to re-create the state of mind of the author of the program. No one writes a program with the intent of failing. The program is designed to succeed and the author clearly thought he was succeeding. To understand how the program works involves seeing the author's plan for success. You know the plan ultimately fails, but rather than designing a new and different plan for success, you have to pretend and persuade yourself that a reasonable, rational, and fairly intelligent person could plausibly think the existing code was more than sufficient for the problem at hand. At the same time, however, you have to consider numerous failure modes that are serious enough to cause the observed failure, yet subtle enough to not fail on the observed successes. There are a lot of ways for a program to simply not work. There are far fewer ways for a program to mostly work. Presumably, the program worked well enough to fool the author.

Debugging is narrative. You are telling a story. The theme is classic: man versus nature. Our protagonist is the intrepid engineer. There are mountains of data, fast-moving streams of input and output, abstraction barriers that conceal too much or too little, and the constant pressure of time for him to contend with. The antagonist is the devil hiding in the details. He cannot win a direct confrontation. His goal is more subtle. He is Baudelaire's devil, attempting to convince the world he doesn't exist. Act I finishes with the apparent triumph of the programmer. He has not simply reached his destination, he has routed the devil from all his hiding places. But the audience knows there is something wrong.

Act II opens. We know the engineer's intentions, and his designs are plain to us. We turn to the devil. The devil's plan is to precisely engineer the observed failure mode without affecting the outcome of the observed successes. The devil is not an engineer. He writes no code of his own. He doesn't seem evil. He's supportive! He's flattering! What a fantastic design! Your code is near perfect --- no, it is perfect. Not that there aren't limits, of course, there are always limits. But don't worry, you aren't near them. And special cases, too. But those are rare. Negligible, almost.

But the drama lies in the audience seeing the devil for what he is, all the while knowing that the engineer is completely fooled. As with any good drama, the engineer and the devil are opponents worthy of each other. The design is in fact admirable, and the devil has his work cut out for him. Or perhaps it is good comedy, where the simultaneous presence of not one, not two, but all three stooges acting in concert is necessary to cause an increasingly improbable chain of events to lead to disaster.

Act III is the denouement. The bug is revealed and Poirot walks us through the deductive process of his little grey cells. It can end here with a simple, obvious bug fix or workaround, or we can leave the possibility open for a sequel where we need to completely redesign the original work.

There's a nice, fat, easy target for the internetsia to skewer. Have at it.

Wednesday, March 18, 2009

Fast-forward to 1985

I had a relapse in those health problems and was back at home during the summer. It was boring. Very boring. I really wanted to hack, but the only machine I had was my TRS-80. So I wrote a Scheme interpreter in Z80 assembly code.

I have to recommend writing an interpreter (or a compiler and runtime system) as a great way to learn a whole bunch of practical programming. It's especially interesting if the language you implement is a mismatch for the underlying system. The Z80 is an 8/16 bit processor, and I hacked my TRS-80 to install a full 64K of RAM. I decided on a tagged pointer architecture with 8-bit tags and 16-bit pointers. This gave me a 24-bit word size.
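
The layout amounts to something like this (modeled here in Scheme with integer arithmetic; the real representation was three raw bytes):
(define (make-word tag pointer)   ; 8-bit tag, 16-bit pointer
  (+ (* tag 65536) pointer))

(define (word-tag word)     (quotient word 65536))
(define (word-pointer word) (remainder word 65536))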

The other unusual thing I did was with the number system. I didn't want to implement floating-point, but I also didn't want integers only, so I used fixed-point arithmetic with six digits on each side of the decimal point. I didn't want to convert from binary fractions to decimal fractions, so I did the math in BCD.
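
The arithmetic works out to scaling every value by a million. Sketched with ordinary integers (my interpreter kept the digits in packed BCD instead):
(define scale 1000000) ; six digits to the right of the decimal point

(define (fx+ a b) (+ a b))                  ; addition and subtraction
(define (fx- a b) (- a b))                  ;   need no adjustment
(define (fx* a b) (quotient (* a b) scale)) ; rescale after multiplying
(define (fx/ a b) (quotient (* a scale) b)) ; pre-scale before dividing

;; 1.5 is stored as 1500000, so (fx* 1500000 2500000) => 3750000, i.e. 3.75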

One of the weird instructions on the Z80 (RRD, along with its mirror RLD) rotates the contents of the A register and the contents of a memory location by four bits. I never found a use for it until I did the BCD math, and then it was obvious: it rotates a single BCD digit out of a packed BCD representation in memory so you can do multi-precision adds and subtracts.
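
Modeled in Scheme (my model of the instruction, not Z80 code): A's high nibble stays put, and the remaining three nibbles rotate right by one position, pulling a digit out of memory into A.
;; One step of RRD on 8-bit values: returns the new A and the new
;; memory byte as two values.
(define (rrd a byte)
  (values (+ (* (quotient a 16) 16) (remainder byte 16))   ; new A
          (+ (* (remainder a 16) 16) (quotient byte 16)))) ; new byte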

I soon ran out of memory with the assembler. There was no way to assemble separate modules, so I made a table that contained the external entry points for the module and duplicated that across the other modules. The project grew until I had seven separate modules. The development cycle was like this:
  • After editing a module, save the assembly source code to a source cassette. Be sure to write at least three copies.
  • Compile the module to machine code. Dump the code to an object cassette. Again, three copies.
  • Reboot, load the debugger.
  • Attempt to load all the object modules. Sometimes this would work ok, other times it would hang. If it hung, you had to reboot and start loading again.
  • Run some code. When it crashed, debug the machine code. (The source code isn't available at this point, so you have to take notes and try to figure out what was supposed to be happening.)
  • Reboot and load the editor.
  • Load the source code for the buggy module and try to find the bug. Once you think you have it fixed, go back to the beginning.
Extra step:
  • If you needed to add a cross-module link, you had to edit the link table. The table was kept in a notebook, and a copy went into each module. If you changed the link table, you had to update it in all seven modules, so it was usually a good idea to make these updates in batches.
The round-trip time near the end of the project was about 40 minutes.

During these 40 minute cycles I had a lot of time to think while mindlessly shuffling cassette tapes. I eventually decided that this was largely a waste of time and was becoming less fun with every line of code I added. I also thought maybe I should get a job. I wasn't sure I could get one, but I figured that maybe someone would be willing to throw a little part-time work my way. I eventually worked up the courage to call Gerry Sussman and see if he knew of anyone that would find my summer project the least bit interesting.

Tuesday, March 17, 2009

ILC 09 is a go

It looked a little iffy for a while, but it seems that I'll be at ILC '09. Hope to see a lot of old friends and meet new ones there.

I'm currently thinking about abstract syntax trees and programming. I think I'll have a few things to say in a day or two.

Thursday, March 12, 2009

There's not much more to say about '001. By the end of the course I was definitely decided on becoming a computer hacker. (I decided not to pursue electrical engineering once I got to signals and systems. That's some nasty math.) I was a real fan of Scheme, but my exposure to other Lisps was fairly minimal. Anyway.... once I get over this flu I'll try to write more frequently, but the topics will be random.

Sunday, March 8, 2009

A few speed bumps

It's a bit hard to recall the process by which I changed my mind about Lisp. I remember the first couple of '001 lectures because they directly contradicted what I thought I knew. I didn't become an immediate lover of Lisp by any means, but my antipathy was much weaker. There are a few incidents that do stand out in my memory. The first one was LET expressions.

I remembered from my first encounter with Lisp that there were these things called ‘lambda-expressions’. A lambda expression allowed you to make a temporary name for some quantity. Suppose you had some code that had a common subexpression like this:
   (/ (+ (* x x) 2) (* x x))
and you wanted to pull out the common subexpression, you'd do this:
   ((lambda (temp) (/ (+ temp 2) temp)) (* x x))
I admit that that is pretty weird, but for some reason I was able to remember it.

Of course assigning names to certain subexpressions is something you do all the time in any computer language. While it is nice to know that you can do this with a simple function call, an embedded lambda expression like this is not the clearest way to express it. The values of the bound variables are too far removed from their names.

Lisp's LET expression is a very handy bit of syntax that fixes that problem. It looks like this:
     (let ((temp (* x x)))
       (/ (+ temp 2) temp))
But I found this to be a bit confusing. There seem to be too many parentheses here. And there is something irregular going on, because if you bind two variables, it can look like this:
    (let ((count 22) (tail elements))
      (if (null? tail)
           ....
Which seems to have too few parentheses at first glance. Or maybe three variables looks like this:
    (let ((ok '())
          (count 1)
          (tail (cdr elements)))
      (if (null? tail) 
          ...
Which seems to have both problems.

It wasn't obvious to me what was going on. My brain kept telling me that when I saw a pair of open parentheses I should expect a pair of close parentheses at the other end. That's simply not true. I learned to parse LET expressions sooner than I learned to write them. I'd write out the lambda expression first and then carefully transform it into the appropriate LET expression.

It took me a bit of practice (not much, maybe a few days worth) before I was comfortable with LET expressions. Even these days I will on occasion use non-standard code formatting to reduce ambiguity. (There is a balance between parsimony of expression and ambiguity. A larger Hamming distance between legitimate programs will help protect you from typos, but it will also make your program text quite a bit more verbose.)

It can be tricky getting the right number of parentheses. The editor can help balance them, but it doesn't help if you have correctly balanced an incorrect number of parens. Sometimes this leads to a syntax error, but sometimes it leads to a syntactically correct program that bombs out. The usual error is something like ‘The object 3 is not applicable.’ If you wrap an extra set of parentheses around a function that you intend to pass as a first-class object, you'll end up invoking the function and passing the result instead. Then somewhere later on, when you think you are invoking a function, you'll get that error. The problem isn't just the presence of extra parentheses; it is the fact that the extra parentheses may indicate a perfectly valid program (one with an extra level of function invocation) that isn't the one you thought you wrote.
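
For example (my illustration, not from the course):
(define (square x) (* x x))
(define (apply-twice f x) (f (f x)))

(apply-twice square 3)     ; => 81
(apply-twice (square 3) 3) ; error: The object 9 is not applicable
;; The extra parentheses around SQUARE call it immediately, so
;; APPLY-TWICE receives the number 9 where it expected a procedure.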

The second speed bump was when we got to compound data. I had seen linked lists in the prior course, and I'd written down box and pointer diagrams before, so I had a tiny head start. Despite this, it was not particularly easy. It's hard to remember when to use CONS, LIST, or APPEND, or whether you want a quoted list of literals or a list of quoted literals. Sometimes I admit I just flailed around and tried the different combinations until it worked.

A particular problem stands out in my mind. We were doing arithmetic on univariate polynomials with integer coefficients. Each polynomial had a ‘term list’ and each term had a coefficient and an ‘order’. So the polynomial x^3 + 3x^2 - 7 would be represented like this: ((1 3) (3 2) (-7 0))

The problem set already had a skeleton for the division program:
(define (div-terms L1 L2)
  (if (empty-termlist? L1)
      (list (the-empty-termlist) (the-empty-termlist))
      (let ((t1 (first-term L1))
            (t2 (first-term L2)))
        (if (> (order t2) (order t1))
            (list (the-empty-termlist) L1)
            (let ((new-c (div (coeff t1) (coeff t2)))
                  (new-o (- (order t1) (order t2))))
              (let ((rest-of-result
                       <compute rest of result recursively>
                       ))
                 <form complete result>
                 ))))))
This isn't a particularly hard problem, but it has a subtle pitfall. The function really wants to return two values, a quotient and a remainder. Instead, it returns a single value, a list of a quotient and remainder. The caller will need to extract those two components in order to use them. Now since this function is recursive, it is itself a caller and therefore has to destructure its own answer. So where we see this code:
              (let ((rest-of-result
                       <compute rest of result recursively>
                       ))
                 <form complete result>
                 ))))))
We'll have to do something like this:
   (let ((rest-of-result
            <compute rest of result recursively>
            ))
     (let ((rest-quotient  (first rest-of-result))
           (rest-remainder (second rest-of-result)))
       (list 
           <form complete quotient>
           <form complete remainder>
           ))))))))
Sticking a single term on to the front of a term-list is done by calling CONS. Creating a term from a coefficient and order is done by calling MAKE-TERM which is essentially an alias for LIST.

An experienced coder would already notice a bunch of red flags by now. This is starting to get confusing and it would be a good idea to be real careful. Unfortunately, at that time I was an inexperienced programmer. I started in a confused state and didn't know how much more confusing it could get, and I was already being as careful as I could be.

So what are the potential pitfalls? The most difficult one to deal with is incorrectly destructuring the recursive result. If instead of using FIRST and SECOND you used CAR and CDR, you'd get the quotient correctly, but rather than the remainder you'd get a list containing the remainder. At each recursive step this list would get deeper and deeper and you end up with some very deeply nested list structure.
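
The pitfall in miniature (my example):
(define result (list 'the-quotient 'the-remainder))

(second result) ; => the-remainder
(cdr result)    ; => (the-remainder), one level of list structure too many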

The second pitfall is to CONS the quotient and remainder together rather than to LIST them. Both produce list structure, but the other places where answers are returned use LIST, and the problem statement says that a list is returned. If you CONS the quotient and remainder together, you'll have one of two problems. If you have correctly used FIRST and SECOND to destructure the recursive call, then the cons cell you generated will end up being destructured ‘too far’. Instead of a remainder that is a list of terms, you'll end up with a remainder that is a single term, dropping the other terms. You'll probably notice the missing terms when you finally get a result, but you may not know where they disappeared.

On the other hand, if you incorrectly use CONS to construct the recursive result, and incorrectly used CAR and CDR to destructure it on the recursive call, nearly everything will ‘work’. Of course you will not have satisfied the problem statement that the result is a list of quotient and remainder, but it will be made from ‘list structure’. But the innermost remainder will have been produced by one of the other clauses that return answers. Since these use LIST, the innermost remainder will have one extra level of list structure.

Have your eyes glazed over yet? This is a mess, and I can see why it is a stumbling block. There are some obvious ways to fix it within Lisp (using structures, self-descriptive data, and predicates to ensure the answers are well-formed) or outside Lisp (using a language with static annotations to get the compiler to prove that the answer is well-formed), but that's not my point. List structure can be confusing, and you have to trade the convenience of just consing up a data structure versus the complexity of debugging the code.
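
A sketch of the first kind of fix (mine, not from the problem set): tag the compound answer and check the tag in the accessors, so a malformed result fails loudly at the point of misuse.
(define (make-div-result quotient remainder)
  (list 'div-result quotient remainder))

(define (div-result? object)
  (and (pair? object) (eq? (car object) 'div-result)))

(define (div-result-quotient result)
  (if (div-result? result)
      (cadr result)
      (error "Not a div-result:" result)))

(define (div-result-remainder result)
  (if (div-result? result)
      (caddr result)
      (error "Not a div-result:" result)))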

It seems to me that this particular problem set could have been made an awful lot easier by first having the students write some defensive code that checked its arguments and such, but I'm not sure whether the point of the exercise is to discover this yourself, to force you to think a bit harder about what you are doing (it really only takes an extra few minutes to reason things out at the level of detail you need), or just to get you used to the kind of nested list structure you are likely to find in legacy Lisp programs. Or maybe I'm singling out this particular problem set because it was tricky for me.

Saturday, March 7, 2009

Writing blog entries is hard. I'm trying to remember how it was that I ended up being a Lisp hacker when I started out disliking the language and somewhat uninterested in computers. I'm also trying to understand why many, if not most, of my classmates found '001 difficult, and Lisp to be confusing, annoying, and exasperating. And I'm trying to spin this into a somewhat coherent story.

I think I'm going to relax the coherence, though. I'm ending up with a pile of blog fragments that I can't seem to organize, but I don't want to lose them, either. Fortunately, this is a blog and no one expects particularly high editorial standards, so I'll just post some of these.


There were several great lectures in 6.001, but that first one was the most memorable because it showed me that real computer hackers weren't writing accounts receivable and inventory reporting programs but were having an awful lot of fun making machines do amazing things. I figured out that the more I learned, the more fun it would be. Each lecture re-confirmed that. I was hooked. I couldn't wait for the homework assignments because then I could get the computer to do amazing things, too. This had the added benefit that the lab was generally empty so there would be no wait for machines and I could spend extra time experimenting after finishing the problem sets.

(The recitations and tutorials, however, were a different story. Recitations were smaller classes of about 30 or so people where a graduate student would go over the material covered in the lectures and in the homework in much more detail. The tutorials were small sessions where a graduate student would give 4 or 5 students very detailed instruction and help on homework. The recitations were uninteresting, but it gave me a chance to make sure I understood the lectures and didn't miss any important points. The tutorials were deathly dull and I stopped going to those.)

One thing I enjoyed about the lectures was that I'd often find myself reconsidering math and physics from a very different viewpoint. The lectures and the text would frequently start with a very naive and unsophisticated model of a problem and then spend time refining it. On more than one occasion I'd get new insights into things I thought I already knew.

A couple of examples come from the second lecture. We were going to teach the computer how to take square roots. I had learned two ways to (manually) take square roots in high school. The first was a horribly complex process that was somewhat like long division, but was quite a bit harder. The second was to use a table of square roots and interpolate. I assumed that the table was laboriously generated the long way and that we were going to implement the long way on the computer. Instead we used Newton's method.

I remembered Newton's method from calculus, but it didn't occur to me that it would be a practical technique. Part of it was that Newton's method didn't give you the answer to the problem, but rather an approximation to the answer. I didn't think approximations would be acceptable. But the main point wasn't that we were using a particular method to solve a particular problem; it was that if we didn't know a good way to get the answer to a problem, we could try taking a random guess at an answer and get the computer to keep refining this approximation until we got an answer that was good enough.

The second example came from a demonstration of procedural abstraction. The lecturer (it was Hal or Bill Siebert, I can't remember) pointed out that many of the programs we had written had similar structure. I was expecting to be told about the ‘Symmetry Principle: Programs should be symmetric and uniform when performing similar functions, unless there is good reason to the contrary.’ Instead, the lecturer wrote an abstract version of the program where the varying parts were replaced by names. Then the several programs from before were easily changed into simple calls to the abstract version. The ‘symmetry principle’, which was so important to the previous computer course, was suddenly irrelevant. You'd never have programs performing similar functions if you simply abstracted out the similarity.

The lecturer went on to show how the Sigma notation from mathematics was just a way to abstract out repeated addition (no great revelation there, but a nice concise explanation). He then went on to show that once you had an abstraction like this, you could use it as a building block for other abstractions. He wrote an approximation of the definite integral using the summation procedure he just defined. I was impressed. I wasn't as blown away as I was on that first day, but still, in two days we covered the main points of an entire semester of calculus.
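
Reconstructing from memory (SICP has a version very much like this; I'm not claiming these were the lecture's exact definitions):
(define (sum term a next b)
  (if (> a b)
      0
      (+ (term a)
         (sum term (next a) next b))))

(define (integral f a b dx)
  (define (add-dx x) (+ x dx))
  (* (sum f (+ a (/ dx 2)) add-dx b)
     dx))

(integral (lambda (x) (* x x x)) 0 1 .001) ; => about .25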

But it also helped me understand something that I didn't understand when I took calculus. My calculus teacher made a point of always constructing a Riemann sum before writing the integral equation. I didn't understand the point of this extra step because you aren't doing anything much more than just replacing the sigma with the big curly S --- what's the difference? In the code example, however, it was clear that we were computing a sum and the value of the sum depended upon how finely we divided up the elements. The more pieces, the smaller each piece, the more accurate the answer, and the longer it took to get the answer. Going from a Riemann sum to a definite integral involved taking the limit as dx goes to zero. This is an abstract process in mathematics, but a concrete one given our computer program. It was hard to see what could go wrong when replacing the letter sigma with the big curly S, but it is pretty easy to see that the program might not converge if the function being integrated is suitably perverse. What seemed like a pointless rewrite in the calculus course became a condition that had to be satisfied in order to run the program.

Thursday, March 5, 2009

Not Lisp again....

It was 1983 and I was taking my first real computer course at MIT. Hal Abelson and Bill Siebert were co-lecturers. Hal was delivering the bad news:
“If you already know how to program, you may be at a disadvantage because you will have to unlearn some bad habits.”
Wonderful. I knew how to program (after all, I knew Macro-11, Z80 assembly and BASIC), and I could guess what the bad habits were. Violating the un-cuteness principle was probably one of them. Nothing fun is going to be allowed. (I rarely take notes at lectures. I'm too busy keeping an internal running commentary.)
“In this course we will be using the programming language Lisp...”
Argh! Not that again! What is it with Lisp? Ok, maybe at Harvard they do that sort of thing, but this was MIT! Don't they hack computers here?

I was wondering what I had gotten myself into. I held on to two small hopes. First, that the little amount of Lisp I had learned at Harvard would give me a head start over the other 250 students in the room. Second, that I only had to take a couple of computer classes to fulfill that part of the EECS requirement. I could stomach a bit of Lisp for a while, toe the company line, and be completely un-cute long enough to pass the course.

Hal put the archetypical Lisp program on the blackboard:
(define (fact x)
  (if (zero? x)
      1
      (* x (fact (- x 1)))))
He walked the class through the substitution model:
(fact 5)
(* 5 (fact 4))
(* 5 (* 4 (fact 3)))
(* 5 (* 4 (* 3 (fact 2))))
(* 5 (* 4 (* 3 (* 2 (fact 1)))))
(* 5 (* 4 (* 3 (* 2 (* 1 (fact 0))))))
(* 5 (* 4 (* 3 (* 2 (* 1 1)))))
(* 5 (* 4 (* 3 (* 2 1))))
(* 5 (* 4 (* 3 2)))
(* 5 (* 4 6))
(* 5 24)
120
“This is a recursive process. At each step in the recursion we break the problem into smaller parts, solve each part separately, then combine the answers. In this example, each recursive call to fact uses a smaller number. Eventually we get to fact 0 which we know is 1. At this point we can start the multiplication.”
And then he said “This isn't very efficient.”

I thought that was an odd thing to say about a Lisp program. At Harvard, the obvious inefficiency of Lisp was downplayed. Here the professor was pointing it out and was going to do something about it.
“Look how the size of the problem depends upon the value of X. When X is five, there are five pending multiplications that have to be kept track of. If X were ten, then there would be ten pending multiplications. The computer is going to have to use up resources to keep track of these.
“What we want to do is to keep track of the product of the numbers we've seen so far. Like this:
  (define (fact x)

    (define (iter n accum)
      (if (zero? n)
          accum
          (iter (- n 1) (* n accum))))

    (iter x 1))

  (fact 5)
  (iter 5 1)
  (iter 4 5)
  (iter 3 20)
  (iter 2 60)
  (iter 1 120)
  (iter 0 120)
  120
“This is an iterative process. Instead of breaking the problem into parts that we solve separately, we convert the problem into a simpler problem that has the same solution. The solution to (iter 5 1) is the same as the solution to (iter 4 5), which is the same as the solution to (iter 3 20) and so forth.
“Notice how in this case we aren't building a chain of pending multiplications. At each iteration, the value of n is being multiplied by the accumulated product, and this becomes the new accumulated product in the next iteration. When n reaches zero, we have in the accumulator the product of all the numbers from x down to one.”
This was pretty obvious to me. However, I thought Hal was omitting an important point. Clearly each recursive call to iter had to be matched by a corresponding (implicit) return. Otherwise, how would the program know how to get back to the caller? Granted, there were no pending multiplications, but that doesn't mean we're not using up stack frames.

But then Hal said, “With an iterative process, the computer doesn't need to use up resources for each iteration. If you know a bit about programming, you might have expected that each iterative call, like each recursive call, would consume a small amount of memory and eventually run out. Our system is smart enough to notice the iteration and avoid doing that.”

Now that caught my attention. I was impressed first by the fact that whoever designed this particular Lisp system cared about efficiency. I was impressed second by the fact that he was able to program the computer to automatically figure it out. I was impressed third that however it was that it worked, it had to be really fast at it (or the cure would be worse than the problem).

Frankly, I had my doubts. I thought that maybe the computer was able to figure this out some of the time, or maybe it was able to substantially reduce, but not totally eliminate, stack allocation. Or maybe this was some sleight of hand: the resources are allocated somewhere else. I tried to figure out how I'd do this in assembly code, but I was baffled. I made a note to myself to check into this.

Hal went on to introduce first-class procedures. I didn't see the rationale for these. I mean, you can't call a function that hasn't been written, and if it has been written then why not just call it directly and avoid the run-around?

Hal said “You're all familiar with the derivative operator.” and he wrote it on the blackboard:
                                 f(x + dx) - f(x)
       D f(x) = f'(x) =  lim    ------------------
                        dx→0            dx
“We can program this as follows:
    (define dx .0001)

    (define (deriv f)
   
       (define (f-prime x)
         (/ (- (f (+ x dx)) (f x))
            dx))

       f-prime)
“Notice how our formula for the derivative translates directly into the Lisp code.
“Let's take the derivative of the cube function:
    (define (cube x) (* x x x))

    (define cube-deriv (deriv cube))
“And this new function should act like the derivative of the cube function.”
He put up a slide on the overhead projector that showed the above code and the output from the computer:
(cube-deriv 2)
;Value: 12.000600010022566

(cube-deriv 3)
;Value: 27.00090001006572

(cube-deriv 4)
;Value: 48.00120000993502
Sure enough, that's reasonably close to 3x^2.

I was floored. Here we take the simple, straightforward definition of the derivative of a function. Type it in with essentially no modification (we're a little cavalier about dropping the limit and just defining dx to be a ‘reasonably small number’) and suddenly the computer knew how to do calculus! This was serious magic. I had never seen anything like it before.

Hal went on to explain how the substitution model of evaluation worked in this example and I carefully watched. It seemed simple. It couldn't be that simple, though, could it? Something must be missing. I had to find out.

As soon as the lecture ended I ran to the Scheme lab. Hewlett-Packard had donated several dozen HP 9836 Chipmunks to MIT. Each one was equipped with a blindingly fast 8-MHz 68000 processor. On top of each workstation sat a fairly sizable expansion box that held an additional 16 megabytes of RAM. A student had sole use of an entire machine while working on problem sets. No time-sharing! It was an incredibly lavish setup.

The lab was empty (go figure, everyone had just gotten out of the lecture). I carefully typed in the examples from the lecture and tried them out. I tried the recursive factorial and managed to blow out the stack with a fairly big number, so I knew there was indeed a stack limit and how the machine behaved when you hit it. I removed the multiply from the iterative factorial to avoid making huge numbers and gave it some ridiculously large argument. The machine churned and churned and eventually returned an answer. I gave it a bigger number. A lot more churning, but no sign of stack overflow. I guess that really worked.
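
The experiment was roughly this (reconstructed; I no longer have the exact code):
(define (count-down n)
  (if (zero? n)
      'done
      (count-down (- n 1))))

(count-down 100000000) ; lots of churning, but no stack overflow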

Then I typed in the derivative example from the lecture. Sure enough it worked just like Hal said. I tried passing in a few other functions. They worked, too. I did the homework exercises for fun. They just worked.

This was computing on a whole new level. I thought I knew how to program. I didn't know anything like this. I needed to find out whether there was any more of this magic, and exactly how it worked.

My prior objections to Lisp were suddenly far less relevant.
  • Slower than assembly? Maybe for table lookups, but who cares about something as mundane as that? I want to do magic. If I have to look up something in a table, maybe I'll use assembly code.
  • Harder to understand? No way. Look at the derivative example. Five lines of code to express the basis of differential calculus. Five lines of assembly code can barely call a subroutine. (As the proud owner of an HP calculator, I had no problem whatsoever with prefix notation.)
  • Less efficient? I really didn't know, but it was clear that someone had put some serious thought into it and found some interesting solutions.
  • Less capable? Dude, it's doing calculus!
Well, the initial lecture in computer science was a lot more fun and interesting than I expected, but of course I knew they were trying to ‘sell’ comp. sci. as a career. I was pretty sure that this example was a hook and that the tedium would soon follow, perhaps even in the next lecture.

Wednesday, March 4, 2009

Computers? Ugh.

When I went to MIT I intended to major in Course 16. I wanted to design space ships and be an astronaut! I had some curiosity about the computers at MIT, and I took the computer tour, but I didn't find it that interesting. The 9th floor machine room at Tech Square had a raised floor and rows of big disk drives and cabinets with flashing lights, but if you've seen one PDP, you've pretty much seen them all.

We stopped by the SIPB office which seemed to be some sort of computer club. There were nerds there. Ok, I'm a nerd and there's no doubt about that, but compared to these guys I was an amateur. These nerds looked strange and acted stranger. I clearly recall one homunculus that had the physical grace of a special-needs student and an ego that had to be turned sideways to fit through the door. I pictured myself a year or two in the future. I'd be sitting in the SIPB lounge with these people talking about being a professional nerd, and this would be the high point of my day. That wasn't a happy thought.

I discovered something important my freshman year. One of my friends was a serious aero-astro junkie. He lived and breathed the stuff. He had models of nearly every airplane that ever flew in World War II and could tell you who designed it, the manufacturer, what sort of engine it had, how many were constructed, how many were destroyed, and he knew weird trivia about them. He'd say things like “oh, that one is XYZ-22a, they only built 3 of them. They didn't have the engine ready for the first one, so they retrofitted a Rolls-Royce J11 into it....” Not only did I not know this sort of stuff, I didn't care.

The first class in course 16 was Unified Engineering. Unified has a well-deserved reputation for being a true killer course. The course was so complex that they gave it two names and two entries in the course calendar, ‘Unified I and II’, but it really was only one course. It was nasty. My friend, who took Unified his freshman year, was doing these problem sets that involved air flow, fluid dynamics, non-linear dynamics, avionics, and all sorts of really hairy math. I was not looking forward to that.

I gave it a bunch of thought and came to the conclusion that course 16 was for people like my friend and not for me. I could probably learn it, but I'd never have the passion he had for it. Every little detail about flying things was exciting to him. I decided to major in something else.

Physics seemed a good possibility. I really liked physics in high school, and I got a perfect score on the AP physics exam. Undergrads at the 'tute are required to take at least 1 term of physics. You can take the basic freshman physics to fulfill the requirement, or you can take advanced freshman physics if you intend to follow that career. I took the latter.

If you want to make God laugh, tell him your plans. I was blindsided by health problems my freshman year and nearly ended up dropping out of college. I couldn't complete the physics course. In hindsight, I probably should have taken a term or two off, but at the time I didn't see that as an option. A career as a physicist seemed off the table as well.

Well, I only ruled out a couple of courses, and there are a lot of them. I started thinking about electronics. I was always sort of interested in electronics. I had built a few things in high school and had a breadboard and soldering iron and I had done some mods to my TRS-80 to enable the expanded character set. I knew the resistor color code and a bit about diodes, transistors, and SCRs.

I wasn't too thrilled at the idea of the computer science part of course 6, though. The thought of databases and accounts receivable and VisiCalc left me cold. But I figured that I could plow through it. The computer course at Harvard wasn't as much fun as I'd hoped, but it wasn't very difficult.

In 1983 I signed up for 6.001 (Structure and Interpretation of Computer Programs) and 6.002 (Electrical Engineering).

Tuesday, March 3, 2009

I hate Lisp

It was 1981 and I was taking my first real computer course at Harvard. I was rapidly discovering that *real* programming sucked. Big time.

My parents had scraped up enough money to get me a TRS-80 Model 1 in high school. My friend Andy showed me how to write Z80 assembly code. It was so much more powerful than BASIC. It gave you direct access to the hardware and it was lightning fast. I had a lot of fun with it.

When I went to college, I decided to try computers and astronomy (those being my big interests at the time). The computer course was taught on a PDP-11 and we started off with Macro-11 assembly code. I remember being amazed that you could program in assembly code on a time-shared machine and not bring the entire thing crashing down.

The textbook was “An Introduction to Computer Programming and Data Structures Using MACRO-11” by Harry Lewis. It seemed ok, but I didn't care for its high-handed ‘principles of programming’ that were sprinkled throughout the book. For example,
Symmetry Principle: Programs should be symmetric and uniform when performing similar functions, unless there is good reason to the contrary.
Extreme Value Principle: Programs should work correctly for all data on which they may operate.
I especially hated this one:
Un-Cuteness Principle: Don't be clever or cute in an attempt to save a few memory locations or a few microseconds.
Isn't that at least half the fun of assembly language programming?

The first few weeks of the course involved programming such tasks as finding elements in a table, dereferencing pointers, and sorting. Whee! We learned how to edit programs on a line printer(!) using TECO(!!). Eventually someone took pity on me and showed me how to use vi.

In the middle of the course we started learning about a new language: Lisp. We started learning how Lisp was implemented in Macro-11. It was idiotic. Lisp apparently had no notion of ‘memory’ at all, let alone data structures like strings or tables. Everything seemed to be assembled out of long, long chains of pointers, and the Macro-11 programs we wrote spent most of their time rummaging around these pointer chains trying to find *some* data *somewhere*. We wrote routines in Macro-11 that did MEMQ and ASSQ and GETPROP (for some weird reason Lisp had these ‘symbols’ that had ‘properties’, but it sure was a chore to get at the properties).

Furthermore, it seemed that Lisp encouraged the use of really inefficient coding practices. It was often the case that our subroutines would call other subroutines or even call themselves. I had no conceptual problem with a subroutine calling itself (except that it was a blatant violation of the all important ‘Un-Cuteness Principle’), but every function call ate up a chunk of stack, and that put a limit on how many function calls you could do.

It was obvious to me that Lisp was invented by some bozo who clearly didn't understand how computers actually worked. Why would someone look up a value in a property list when you could write a simple routine in Macro-11 that would find the value in a table? The assembly code was much faster, the table was much more compact, and you could avoid these endless chains of pointers. Assuming you were dumb enough to put your data in a property list, you really couldn't use a very big property list without worrying about the stack depth. This just wasn't an issue with a linear table in assembly code.

I got this impression of Lisp: far slower than assembly, far harder to understand, far less efficient, and far less capable. I hated it and was glad to be rid of it when I finished the course.