Friday, April 10, 2009

Some Lisp Machine minutia

Bit 59 in the Lisp Machine microcode is the halt bit. If the microcode instruction register ever has bit 59 turned on, the clocks stop and the entire processor freezes. Sprinkled throughout the microcode are sanity checks that call the ILLOP subroutine if they fail. The ILLOP subroutine tries to write a ‘crash record’ in a pre-defined location and then halts the processor. If things are running correctly, this should never happen. Well, hardly ever.

The System Diagnostic Unit (SDU) has a watchdog timer. Every now and then, the processor resets the watchdog timer in the SDU. If the watchdog timer ever times out, the SDU assumes that the Lisp processor has stopped. The SDU will blink the screen to let you know the machine crashed. If you wish, you can tell the SDU to turn the clocks back on. Since ILLOP is a microcode subroutine, it will return to the instruction immediately following the failed sanity check. This allows you to put debugging ILLOPs in the code to act as very primitive breakpoints.

Down at the bottom of the Lisp Machine screen, there is a status bar called the ‘wholine’. Under the wholine there are three little lines that are 1 pixel high and 16 pixels long. These are the ‘run lights’. The rightmost one is on unless the processor is in an input wait state, the middle one comes on when the processor is waiting for the disk, and the leftmost one comes on when garbage collection is occurring. The GC runlight has two parts: one indicates scavenging, the other transporting.

When the machine is working, the runlights flicker on and off. When it crashes, everything freezes for a second or two. There is a tense pause while you wait for some sign of life from the runlights or for the SDU to blink the screen.


In my previous post I had decided to turn on the garbage collector so I wouldn't run out of memory when trying to trace the microcode. Once I had the GC turned on, I tried to run my code. The machine froze and the screen blinked.

I got very, very familiar with this.

An LMI Lambda 2x2 had two Lisp Machine processors on a single NuBus. If a processor halted, it was possible to read the processor state from other devices on the bus. When you were debugging the microcode, the best setup was a 2x2: you could decode the frozen processor's state from the other processor and match up machine addresses with symbols, etc.

The Lisp Machine had a hardware ring buffer that recorded the value of the microcode program address register on every cycle. You could get a backtrace of the last thousand instructions that led to the crash. This is an incredibly cool and useful feature.
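
The real thing was done in hardware, of course, but the idea is simple enough to sketch in Common Lisp (this is just an illustration; the names are mine, not from the system):

;; A toy model of the microcode-PC ring buffer: record the program
;; address every "cycle" into a fixed-size vector, then read the most
;; recent entries back in order to get a backtrace.
(defconstant +trace-size+ 1024)

(defvar *pc-trace* (make-array +trace-size+ :initial-element nil))
(defvar *trace-index* 0)

(defun record-pc (pc)
  "Record the current microcode PC; called once per cycle."
  (setf (aref *pc-trace* (mod *trace-index* +trace-size+)) pc)
  (incf *trace-index*))

(defun recent-pcs (&optional (n +trace-size+))
  "Return the most recently recorded PCs, oldest first."
  (loop for i from (max 0 (- *trace-index* n)) below *trace-index*
        collect (aref *pc-trace* (mod i +trace-size+))))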

When I ran my microcode tracer and crashed the machine, the microcode trace revealed that we had encountered an illegal stack environment while attempting to transport a lexical environment from within the garbage collector loop. RG sent me to talk to Ken Sinclair. The Symbolics 3600 had come out a couple of years before and had the first commercial generational garbage collector. LMI was trying to play catch-up, and Ken Sinclair was in the process of writing a generational GC for the LMI Lambda. Ken had actually completed the garbage collector, but the rest of the company was slow in adopting it.

There was an unforeseen consequence of the practice of running the machine with the garbage collector turned off. The microcode had gotten sloppy and people were playing a bit fast and loose with the memory model. Untagged addresses would be squirreled away and dereferenced later with no thought given to whether the GC might move the object referred to. Bogus objects — properly tagged words with invalid addresses that pointed at uninitialized memory or into the middle of an object of a different type — were left in registers or on the stack, where they would cause the GC to corrupt memory. These sorts of problems were everywhere in the microcode. Of course it was the garbage collector that discovered them, and following the ancient tradition, the messenger was blamed for the message.

The stack-allocated lexical closure microcode was written in haste by RMS, and repented at leisure by yours truly. With Ken Sinclair's help from the GC side, and a lot of poring over microcode, I managed to fix the stack closure bug.

I don't imagine you really want the gory details, but I'll give them anyway. Here are the comments at the beginning of uc-stack-closure.lisp.

Here is how lexical closures work 
A lexical closure consists of a piece of code and an environment to 
resolve free variables in.  The code is implemented by a FEF and the 
environment is implemented as a list of vectors.  Each car in the 
environment is a lexical environment frame which is a vector of 
variable bindings. As there is no way to extend the number of variables 
in a lexical environment, a vector saves space, but more 
importantly, since the lexically visible variables can be known at 
compile time, the compiler can generate code that simply "arefs" a 
given frame and finds the correct value of a lexical variable.  A 
lexical environment is a list of these vectors.  
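
In modern Common Lisp terms the representation looks roughly like this toy model (the real frames are tagged words in Lambda memory; all the names below are mine):

;; A lexical environment modeled as a list of vectors ("frames").
;; Because the compiler knows at compile time which frame and which
;; slot a variable occupies, a free-variable reference compiles down
;; to what amounts to an AREF.
(defun lexical-ref (env depth slot)
  "Fetch the lexical variable in the DEPTH-th frame at position SLOT."
  (aref (nth depth env) slot))

;; Example: an environment of two frames; the outer frame binds two
;; variables with values 1 and 2.
(defvar *example-env*
  (list (vector 10)        ; innermost frame, one binding
        (vector 1 2)))     ; next outer frame, two bindings

;; (lexical-ref *example-env* 1 0)  => 1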

A closure (any kind, stack or regular) is implemented like this: 
  (dtp-{stack-}closure         ) 
                             | 
                /------------/ pointer to closure 
               V 
  (cdr-next dtp-fef-pointer  --)---> points to a fef containing the code 
  (cdr-nil  dtp-list           ) 
                             | 
                /------------/ pointer to lexical environment 
               V 
  (cdr-normal dtp-list       --)---> points to lexical frame 
  (cdr-error  dtp-list       --)---> points to next environment  

This would be too easy if it weren't for efficiency hacks.  The 
following things are noted: 

1) Some closures are only used in downward funargs and will never be 
   needed after this dynamic frame is exited.  The extent of these 
   closures is dynamic and follows stack discipline, so they can be 
   consed on the stack giving the garbage collector a break. 
   (Actually, this was done so that you wouldn't have to use a 
   losing garbage collector.)  It is not possible to tell if a 
   funarg is to be used in a downward only direction if it is passed 
   to a procedure external to the one currently running. 

2) The uppermost lexical frame of the closure when it is created is 
   the current dynamic frame.  The args and locals of the frame are 
   the ones that are seen by the code.  They cannot be copied out of 
   the frame. 

3) Not all the args and locals of a dynamic frame need appear in a 
   lexical frame.  Which args and locals are needed can be 
   determined at compile time.  This will save space and allow the 
   garbage collector to reclaim inaccessible objects that would be 
   otherwise saved if we kept the whole dynamic environment. 

4) Nested lexical contexts can be "flattened" if the code that 
   creates them is only run once.  See page 87 - 89 (base 10) of the 
   common lisp book for examples of flattenable and unflattenable 
   contexts.  A corollary to this is the fact that a binding which 
   is lexically visible to different closures and which should be 
   distinct in each closure can be shared among them if the code 
   never mutates the variable. 
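
Observation 1, in today's Common Lisp, is roughly what a DYNAMIC-EXTENT declaration lets a compiler exploit. A minimal sketch, not LMI code:

;; The local function ADD is only ever passed downward (to MAPC), so
;; its closure can never be referenced after SUM-LIST returns.  The
;; DYNAMIC-EXTENT declaration is the programmer's promise of exactly
;; that, which lets a compiler stack-allocate the closure.
(defun sum-list (items)
  (let ((total 0))
    (flet ((add (x) (incf total x)))
      (declare (dynamic-extent #'add))
      (mapc #'add items))
    total))

;; (sum-list '(1 2 3))  => 6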

The above is taken advantage of by the below. 
Efficiency hacks: 
1) We take an optimistic approach and assume all funargs are 
   downward.  Lexical frames and closures are initially allocated on 
   the stack.  All pointers made to the closure are labeled 
   dtp-stack-closure.  If a dtp-stack-closure is ever moved anywhere 
   but to a frame closer to the top of the stack, it becomes 
   necessary to copy the closure and the lexical frame 
   out of the stack and into the heap.  All closures that 
   are in the heap are labeled dtp-closure. 

2) The lexical frame when it is created contains 
   external-value-cell-pointers (EVCP's) to the actual locations in 
   the stack of the args and locals.  This makes it possible to 
   smash the args and locals from downward funargs. 

3) The lexical frame is created only with those bindings needed by 
   the closure.  This is determined by looking at the FEF of the 
   current procedure (not the one you are closing!).  Two slots 
   before the unboxed instructions is a list which is a template for 
   making lexical frames.  It is a list of fixnums in which the low 
   twelve bits specifies which argument or local appears and the 
   sign bit indicates whether it is an arg or local.  This list is 
   arranged in "reverse" order, i.e. the first argument is the last 
   on the list and the last local is the first element of the list. 
   The list is stored in reverse order because the microcode just 
   happens to make a pointer to the box just after the lexical 
   frame.  It then constructs the lexical frame by decrementing the 
   pointer and cdring down the map in the FEF. 

4) The contexts are in fact flattened by the compiler.  The compiler 
   makes sure variable references go to the right slot, so there are 
   no name conflicts.  In order to take advantage of sharing, we 
   assume that all lexical frames closed in the current dynamic 
   frame can be shared and only cons up one initially.  The compiler 
   issues commands STACK-CLOSURE-DISCONNECT to force a 
   splitting of shared frames.  This copies the current frame into 
   the heap.  Two frames which were EQ are now EQUAL (i.e. identical 
   copies instead of being identical).  Then, the compiler does a 
   STACK-CLOSURE-UNSHARE giving it an argument which specifies which 
   lexical slot to unshare.  Remember that the lexical frame 
   initially contains EVCP's to the args and locals. 
   STACK-CLOSURE-UNSHARE "snaps" the invisible pointer and copies 
   the arg into the lexical frame.  The frame will still share the 
   other args and locals by virtue of the remaining EVCP'S 

   When it finally comes time to exit the stack frame, if there are 
   any outstanding closures in the heap pointing to a lexical frame 
   which is stack consed in the current stack frame, we copy the 
   stack frame to the heap and snap all the EVCP's in it.  We then 
   go to each closure sharing any of the args or locals and 
   make their EVCP's point to the copy we just constructed.  Now we 
   can exit the frame.  Note that in order to find each closure in 
   the heap, we keep around a list of all closures disconnected from 
   this frame.  
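
Hack 2 is easier to see with a toy model: an EVCP is just an indirection, so the lexical frame and the dynamic frame see the same cell. Here a shared "box" structure stands in for the forwarded stack slot (a sketch in portable Common Lisp, not how the microcode represents anything):

;; The lexical frame starts out holding indirections to the live slots
;; of the dynamic frame (modeled here as shared boxes), so assignments
;; made through a downward funarg are visible to the caller and vice
;; versa.
(defstruct box value)                 ; stands in for an EVCP-forwarded cell

(defun demo-shared-slot ()
  (let* ((arg   (make-box :value 1))  ; a slot in the dynamic frame
         (frame (vector arg))         ; the lexical frame shares the box
         (bump  (lambda () (incf (box-value (aref frame 0))))))
    (funcall bump)                    ; the funarg mutates the shared slot
    (box-value arg)))                 ; => 2; the caller sees the change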

How a frame with closures is set up in the first place:  

Frame begins here  (dtp-fix) ;bits controlling return 
                  (dtp-fix) 
                (DTP-FEF-POINTER  --)---> to code for the frame. 
Arguments        (cdr-next ......) 
 cdr codes are   (cdr-next ......) 
 set right               : <more args> 
                         : 
 last arg        (cdr-nil  ......) 
Locals           (...............) <--- A-LOCALP if this is the current function 
 random boxed    (...............) 
 objects                 : <more locals> 
                         : 
   Stack closures are allocated in the area for locals and take up 
   four local slots.  They are not really locals, they just live here. 
   The <pointer to next cell> is the pointer to the lexical 
   environment chain which just happens to be in the next cell. 
stack closure   (cdr-next   dtp-fef-pointer --)---> points to closure code 
                (cdr-nil    dtp-list  <points to next cell>) 
                (cdr-normal dtp-list  <points to lexical frame>) 
                (cdr-error  dtp-list  <points to next higher context>) 
                        :  <more stack closures> 
                        : 
   The lexical frame also takes up locals.  It is constructed from the 
   map found two slots before the instructions in the FEF.  We work 
   here on the assumption that we will not need more than one lexical 
   frame (in the case where all variables are or can be shared). 
                (dtp-list <pointer to beginning of frame>) 
lexical frame   (cdr-next dtp-external-value-cell-pointer <pointer to local or arg>) 
                (cdr-next dtp-external-value-cell-pointer <pointer to local or arg>) 
                        : <more lexical slots> 
                        : 
next-to-last-local      (dtp-list <pointer to lexical frame>) 
last-local       Contains a list of all copies made of the lexical 
                 frame so we can set them up right when we exit this 
                 dynamic frame and deallocate storage for the 
                 variables. 

The top of the stack is here.  

Notes on the above diagram. 

1) The word just before the lexical frame is used by 
   COPY-STACK-ENVIRONMENT-INTO-HEAP to locate the beginning of the stack 
   frame. 

2) The lexical frame need not be there.  All the local slots 
   are set to point to nil when the frame is entered.  When it comes 
   time to make a stack closure, the next to last local slot is 
   checked to see if a lexical frame has been made.  If it has not, 
   (i.e. it is nil) the FEF is looked at to find a list of args and 
   locals to forward the slots in the lexical frame to.  If the list 
   in the FEF is nil, this means that the current frame is empty. 
   In this case, the next to last local is set to T indicating an 
   empty frame. 

3) In the comments above I used the words "lexical frame" to 
   indicate what it is that holds the bindings in an environment. 
   In the code below, it is called a "stack closure vector".  I 
   apologize to those of you who tried to figure out what was going 
   on by examining my comments vs. the code before reading this. 
   "Lexical frame" is the use; "stack closure vector" is the 
   implementation. 

4) A-LEXICAL-ENVIRONMENT will point to the context outside of this 
   frame so all closures created in this frame will contain a copy 
   of A-LEXICAL-ENVIRONMENT.   

When we copy a stack closure into the heap, we forward the copy on 
the stack.  We can't use HEADER-FORWARD because putting that in a 
structure will confuse other things.  What we do is put 
external-value-cell-forwards in the fef pointer and the environment 
pointer in the stack closure to point to the corresponding cells in 
the heap copy.  This does not really forward the closure, but it 
will do the trick because the stack closure never moves until it is 
deallocated and anything that remains around after that sees only 
the heap allocated version.  The lexical frame (the stack closure 
vector) is not forwarded.  

The last word in the frame is a pointer to every copy of the lexical 
frame in the world.  If the stack frame is exited, we copy the 
lexical frame and go to each of the other copies and change their 
EVCP's to point to our new copy.  

This is the old comment. 
; Copy a DTP-STACK-CLOSURE when it is stored anywhere but 
; into the same stack it points at, and farther down than where it points. 
; There is no way to forward the stack closure to the copy, 
; because only header-forward works to forward a list's car and cdr, 
; and putting that inside a structure will confuse other things. 
; So we stick an external-value-cell-pointer into the stack closure 
; pointing at the copy.  This does not forward it as far as the 
; low levels of the system is concerned!  But as long as the 
; stack closure still exists, that's ok; the evcp forwards only the car 
; of the stack closure, but forwards it to the car of the copy, 
; which contains the correct value. 
Wow. I vaguely remember writing that.

5 comments:

Paul Steckler said...

So, do you have a working LMI box?

-- Paul

Joe Marshall said...

No, but I'm not sure where I'd put it if I did. It was about the size of a fridge.

Base said...

Disclosure: I totally have no idea about Lisp, and am just trying to learn. Please excuse my ignorance...

Obviously hardware has come a long way since these machines roamed the earth. Did they have any specific advantages over hardware that would still apply today? Is there a reason for someone to create a new Lisp machine using today's hardware that would somehow totally kick ass?

Thanks much in advance!

Joe Marshall said...

The only reason to have hardware is in order to run software. Compared to x86, the amount of Lisp software is minuscule. There wouldn't be a market for Lisp machine hardware.

Base said...

I kinda figured that but wasn't sure if there was some additional instruction set or something that would make Lisp run more effectively on machines somehow optimized for Lisp.

I am very much a newbie to Lisp and a self-taught programmer. I have been reading a lot about Lisp and that, even if you do not use it as a language that you program in, learning it just makes you a better programmer, regardless of the language(s) that you use. So I am making an effort to learn Lisp in order to formalize learning. I have to say that it has really been exciting to gain this insight and I am really committed to continuing this process. Postings like this really help people trying to learn, so thank you so much for your time. It is truly appreciated!