Saturday, March 21, 2009

On the airplane heading east for ILC 09. I like having traveled, but I don't like the travel itself. I'm looking forward to ILC, but not to my presentation on Monday. I really dislike public speaking, and every time I do it I swear that it will be the last time. I figure, though, that at least a few people will be interested in what I'm saying and I can answer some questions. Fortunately, Gerry Sussman and Alexy Radul are presenting a short version of their propagator talk right after me. I can probably donate some of my time to them.

So back to LMI and my microcode tracer.

I quickly discovered some problems. The first was an issue with lexical scoping. The Lisp machine compiler would ‘flatten’ the lexical environment in a procedure so a variable could be located by a simple vector ref. But I found that when I had two separate variables with the same name, they would end up getting confused with each other. I wrote some demo code that exhibited the bug.

(defvar *loser-1*)
(defvar *loser-2*)

(defun foo-loser (a b)
  (let ((a 'local-a))
    (setq *loser-1*
          (lambda () (format t "~%A = ~S, B = ~S" a b))))
  (let ((b 'local-b))
    (setq *loser-2*
          (lambda () (format t "~%A = ~S, B = ~S" a b)))))
The foo-loser program has two arguments named a and b, and two locals also named a and b. The global variables *loser-1* and *loser-2* are each set to a thunk that prints the value of a and b that the thunk has closed over. *loser-1* ought to be closed over the local variable a and the argument b, while *loser-2* ought to be closed over argument a and local variable b. The expected behavior is this:
> (foo-loser 'argument-a 'argument-b)

> (funcall *loser-1*) 

> (funcall *loser-2*)
The actual behavior was this:
> (funcall *loser-1*) 

> (funcall *loser-2*)
The local variables bound by the LET expressions are visible to both thunks.

I showed this bug to RG and he explained that lexical scoping had been a relatively recent addition to the compiler and that few programs made enough use of it for the bug to have surfaced. He also suggested that the fastest, surest way to get it fixed was look at the compiler and fix it myself.

I had a theory about the source of the problem. There was a table that mapped the variable name to its position in the environment. Clearly when the local variables were added to the table, it caused ambiguity when looking up the names. I was guessing that the table was implemented by an association list. This would be a simple, obvious mechanism and later additions to the table would shadow the earlier ones resulting in the observed bug. So looked in QCP1.LISP to find out what was going on.

This text appears at the beginning of the Lisp machine compiler:
;; "This is insane. What we clearly want to do is not completely
;; clear, and is rooted in NCOMPLR." -- BSG/Dissociated Press.
I was Dante on the threshold of Dis.

The Lisp machine compiler is not re-entrant. If an internal function is encountered while compiling a top-level function, it is put on a queue to be compiled later. These are called breakoff functions. With dynamic scoping, there is no need to keep compiler state from the lexically enclosing function when compiling a breakoff function, but with lexical scoping, the breakoff function has to know where the compiler put the variables in the enclosing function. But the compiler doesn't decide this until well after the breakoff has been enqueued. So when the compiler finally gets around to deciding where the variables are going to be, it looks in the compiler-queue to see if any of the breakoff functions need to know the locations. But this is where the bug comes up.

The compiler has a data structures that represents the variables. The lexical environment is simply a collection of these variables. For the example code, the compiler knows about these variables:
[name: a] ;; argument
[name: b] ;; argument
[name: a] ;; local
[name: b] ;; local
In the example above, the first breakoff function is in this SETQ:
(setq *loser-1*
      (lambda () (format t "~%A = ~S, B = ~S" a b)))
When we get to the point of compiling the reference to variables A and B, we no longer know which A and B we want.

The problem is obvious, but coming up with a fix that doesn't involve a serious amount of rewriting of the compiler is not obvious. The strategy I settled on was a bit nasty. When the breakoff function is enqueued, I record a list of which variables are visible. The positions of the variables in the stack frame are not yet known, but which variables shadow which other variables is known. This information gets stored along with the breakout function. Later on, when we find the positions of the variables and walk the compiler-queue to tell the pending breakoffs about them, we do a little trick. Since we know the actual variable records we ought to be able to see, we rename the variables we shouldn't see to a unique name. (We can't just remove the non-visible ones because the location in the list defines the offset in the frame.)
[name: a] --> [name: #G:123] ;; argument (shadowed)
[name: b] [name: b] ;; argument (visible)
[name: a] [name: a] ;; local (visible)
[name: b] --> [name: #G:124] ;; not in scope (not visible)
Now when we can look up the variables by name without finding the wrong one.

This trick works, but it isn't pretty. I was not very happy with this solution, but the alternatives seemed to require quite a bit more understanding of the existing code. I came back to RG with the fix. To my surprise he approved of the change and told me to check it in.

The first problem was solved, but more bugs soon appeared...