Friday, May 15, 2009

Oldspace and the Transporter

When the garbage collector decides to free a region, it marks the region as being ‘oldspace’. Live objects will be copied out of oldspace into newspace. Once all the live objects have been relocated, the oldspace regions are returned to the memory system. (Since there is no longer anything of interest in oldspace, the virtual memory associated with it is unmapped and any physical pages can be freed.)
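In rough C terms, the bookkeeping might look like this. This is a minimal sketch with an invented region structure and space labels; the Lambda's actual region tables were more elaborate:

#include <stddef.h>
#include <sys/mman.h>

typedef enum { SPACE_NEW, SPACE_OLD, SPACE_FREE } space_t;

typedef struct region {
    void   *base;    /* first byte of the region's storage */
    size_t  nbytes;  /* size of the region in bytes */
    space_t space;   /* which space the region currently belongs to */
} region_t;

/* Condemn a region: from now on it is oldspace, and the collector will
   evacuate the live objects out of it into newspace. */
static void condemn(region_t *r) {
    r->space = SPACE_OLD;
}

/* Reclaim a region once every live object has been transported out:
   unmap the virtual memory so any physical pages can be freed. */
static void reclaim(region_t *r) {
    munmap(r->base, r->nbytes);
    r->space = SPACE_FREE;
}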

A key component of the garbage collector is the transporter. The transporter is responsible for copying objects from oldspace to newspace. This is a fairly straightforward operation, but there are a few complexities. First, the transporter must copy the entire object to newspace, and it must copy only that object. If it doesn't copy the entire object, information will be lost and newspace will likely be corrupted. If it copies more than it should, it may drag a partial copy of the next object in oldspace into newspace. Again, this will likely corrupt memory. The constraint of copying all of the object and only the object is an obvious one, but many GC bugs can be traced to a violation of it. An incorrect header can cause the transporter to miscalculate the size of the object. An unboxed word might be mistakenly interpreted as a pointer and cause random words to be copied.
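Here is the invariant in sketch form, using an invented header layout (low byte a type tag, the rest a word count); the real Lisp machine header encoding differed, but the point is the same: the size comes from the header, and the copy is exactly that many words:

#include <stdint.h>
#include <string.h>

typedef uintptr_t word_t;                /* one tagged machine word */

/* Invented header layout: low byte is a type tag, the remaining bits give
   the number of words of storage that follow the header. */
#define HEADER_LENGTH(w) ((w) >> 8)

/* Size of the whole object in words, header included.  A corrupted header
   is exactly what makes the transporter copy too little or too much. */
static size_t object_size(const word_t *obj) {
    return 1 + HEADER_LENGTH(obj[0]);
}

/* Copy the entire object, and only that object, to the newspace free pointer. */
static word_t *copy_object(const word_t *old, word_t *free_ptr) {
    size_t nwords = object_size(old);
    memcpy(free_ptr, old, nwords * sizeof(word_t));
    return free_ptr;
}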

Once the transporter has copied an object to newspace, it writes a forwarding pointer in oldspace that contains the address of the new object. Any later reference to the old object follows the forwarding pointer to that single copy, which preserves the sharing in the graph structure of the heap.
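A sketch of the sharing-preserving transport, continuing the invented layout above (and assuming header tags keep their low bit clear, so a set low bit can mark a header that has been replaced by a forwarding pointer); object_size and copy_object are from the previous sketch:

#include <stddef.h>
#include <stdint.h>

typedef uintptr_t word_t;

/* Invented encoding: a set low bit means the header has been replaced by a
   forwarding pointer to the (word-aligned) newspace copy. */
#define FORWARD_MARK ((word_t)1)

static int is_forwarded(const word_t *old) {
    return (old[0] & FORWARD_MARK) != 0;
}

static word_t *forwarded_to(const word_t *old) {
    return (word_t *)(old[0] & ~FORWARD_MARK);
}

/* Transport an object while preserving sharing: if it has already been
   copied, return the existing newspace copy; otherwise copy it and leave a
   forwarding pointer behind in oldspace. */
static word_t *transport(word_t *old, word_t **free_ptr) {
    if (is_forwarded(old))
        return forwarded_to(old);
    size_t nwords = object_size(old);
    word_t *new_obj = copy_object(old, *free_ptr);
    *free_ptr += nwords;
    old[0] = (word_t)new_obj | FORWARD_MARK;
    return new_obj;
}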

The transporter is made more complicated by the presence of ‘locatives’. A locative is a first-class pointer to a storage location. To give an example, I could create a locative that points to the third element of a vector. Reading the locative would return the same value as reading the third element of the vector. Writing the locative would modify the third element of the vector. In the Lisp machine, a locative is simply a pointer with a special tag. Most tagged pointers contain the address of the first word of storage allocated for an object. A locative is allowed to point anywhere within the storage allocated for an object.
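Stripped of its tag, a locative is essentially an interior pointer. A rough C analogy, assuming (for illustration only) a vector laid out as one header word followed by its elements:

#include <stddef.h>
#include <stdint.h>

typedef uintptr_t word_t;

/* A locative to the storage cell holding one element of a vector; it points
   into the middle of the object rather than at its first word. */
static word_t *vector_element_locative(word_t *vector, size_t index) {
    return &vector[1 + index];
}

/* Reading or writing through the locative reads or writes that cell,
   exactly as if you had indexed the vector directly. */
static word_t locative_ref(word_t *loc)               { return *loc; }
static void   locative_set(word_t *loc, word_t value) { *loc = value; }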

To support locatives, you must be able to take an arbitrary heap address and discover the first word of the object that contains that address. There are different techniques for this. Here's how the LMI Lambda did it. You can always discover the object boundaries in a region by starting at the very beginning of the region (which must be the start of an object) and scanning forward object by object until you find the containing object. Of course this is terribly slow, not only because you are making a linear scan, but because you may have to fetch the relevant pages from the disk in order to walk the objects. So the Lambda kept track of the address of the first object on each page and stored this value in the ‘structure handles’. To find the object that contains an address, you find the page the address is on, locate the first object on that page, and then scan forward. When you relocate a locative, you have to displace it to the corresponding offset within the relocated object.
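Here is the search in sketch form, with an invented page size and a per-page table standing in for the structure handles (objects that span a page boundary are glossed over):

#include <stddef.h>
#include <stdint.h>

typedef uintptr_t word_t;

#define PAGE_SHIFT 9        /* invented page size of 512 bytes */

/* Stand-in for the structure handles: for each page, the address of the
   first object header on that page, maintained by the allocator. */
extern word_t *first_object_on_page[];

extern size_t object_size(const word_t *obj);  /* header-driven size, as above */

/* Find the first word of the object containing an arbitrary heap address:
   start at the first object on that address's page and scan forward,
   object by object, until one spans the target. */
static word_t *containing_object(word_t *addr) {
    size_t page = (uintptr_t)addr >> PAGE_SHIFT;
    word_t *obj = first_object_on_page[page];
    while (obj + object_size(obj) <= addr)
        obj += object_size(obj);
    return obj;
}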

Another thing that complicates the transporter is forwarding pointers. There are various kinds of forwarding pointers used in the Lisp machine. Some forwarding pointers are handled exactly like locatives; the transporter doesn't have to do anything special (beyond the usual locative relocation). Other forwarding pointers might be ‘snapped’ by the transporter. A good example is the DTP-ONE-Q-FORWARD type. This is a special pointer that allows you to deliberately alias memory locations. When the transporter encounters one of these, it dereferences the forwarding address to find the actual storage and transports that object instead. The forwarding pointer is no longer necessary after the GC.
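In rough C terms, snapping looks something like this; the tag layout and the transport_word helper are invented stand-ins, not the actual microcode interface:

#include <stdint.h>

typedef uintptr_t word_t;

/* Invented tag layout: low byte is the data type, the high bits hold a
   pointer.  The Lambda's actual encoding of DTP-ONE-Q-FORWARD differed. */
#define TAG(w)             ((w) & 0xff)
#define POINTER_FIELD(w)   ((word_t *)((w) >> 8))
#define DTP_ONE_Q_FORWARD  0x31

extern word_t transport_word(word_t w);   /* the ordinary transport path */

/* If the word just read is a one-Q forward, follow it to the real storage
   cell and transport what is there instead; the alias itself is never
   copied into newspace, so it disappears after the GC. */
static word_t snap_and_transport(word_t w) {
    while (TAG(w) == DTP_ONE_Q_FORWARD)
        w = *POINTER_FIELD(w);
    return transport_word(w);
}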

The transporter also has some special code to handle CDR-coded lists. CDR-coding allows you to compress list structure by eliding CDR pointers in the common case of a linear list. It is important for the transporter to maintain the CDR-coding or the list would double in size when it was transported. I was surprised to find that the transporter did not perform cdr-coded compression if it discovered an uncompressed linear list. The rationale was that this would only happen rarely, and the amount of hair it would add to the code was significant (consider what happens if somewhere else in memory someone had taken a locative to the CDR of a list you were about to encode away).
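A sketch of what cdr-coding buys, with invented code values and bit positions: a compressed cell needs no cdr word at all, which is exactly what the transporter has to preserve:

#include <stddef.h>
#include <stdint.h>

typedef uintptr_t word_t;

/* Invented cdr-code values and placement: two bits in the top of the car
   word say how to find the cdr. */
enum { CDR_NORMAL = 0, CDR_NEXT = 1, CDR_NIL = 2 };
#define CDR_CODE(w) (((w) >> (sizeof(word_t) * 8 - 2)) & 0x3)

/* Walk one link of a cdr-coded list.  In the compressed case the cdr
   pointer is elided entirely and the next cell is simply the next word;
   copying cells without their cdr codes would turn each of these back
   into a two-word cons, doubling the list. */
static word_t *list_cdr(word_t *cell) {
    switch (CDR_CODE(cell[0])) {
    case CDR_NEXT: return cell + 1;          /* compressed: cdr is the next word */
    case CDR_NIL:  return NULL;              /* end of the list */
    default:       return (word_t *)cell[1]; /* normal cons: explicit cdr word */
    }
}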

The Lambda had a ‘read barrier’ that enabled it to do incremental garbage collection in parallel with user processes. Henry Baker came up with the idea that allows this to work. The machine maintains the illusion that there is no oldspace by automatically transporting objects to newspace when references to them are read from memory (the read barrier prevents oldspace pointers from ever reaching the registers). Since no user code can manipulate an object without first bringing a reference to it into a register, the user process is guaranteed never to see an oldspace pointer. The user process goes about its business completely unaware that a chunk of memory is in the process of being reclaimed.

In theory, the use of a read barrier would make it possible to guarantee a real-time response during garbage collection, but the theory depends on the transporter always taking a small, bounded amount of time. In actuality, the amount of time the transporter takes depends upon the size of the object being transported. Arrays naturally take time proportional to their size. Furthermore, it may be necessary to take page faults when reading the object out of oldspace and writing it into newspace. In the worst case, the current region in newspace is full and the allocator will have to find and initialize a new region. So although the Lambda was generally able to run during garbage collection, it didn't have a hard real-time guarantee.

The read barrier had a clever hardware assist. The Lambda had a dispatch microinstruction that performed a computed jump through a dispatch table. You often see this sequence of instructions in the microcode:
((vma-start-read) add m-fef a-1) ;Read the location.
(check-page-read)                ;Call the page fault handler if needed.
(dispatch transport md)          ;Check the read barrier on the word just read.
The check-page-read instruction is a conditional call to the page fault handler. The vma register holds the virtual address, and when the read finishes, the md register (for ‘memory data’) holds the contents. The (dispatch transport md) instruction is the one that checks the read barrier. The Lambda barrel shifter is used to rotate the tag bits into place for the dispatch, and the lowest bit of the dispatch table address is taken from the memory maps. The dispatch table itself has entries that specify whether to call the transporter subroutine or to simply drop through.

This sounds more complicated than it is. Basically, once you've read a memory location, you either leave it alone or you call the transporter. You only call the transporter if the word you read is a pointer and the address it holds is in a block of oldspace.
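In software terms, the dispatch boils down to something like this; tag_is_pointer, points_into_oldspace, and transport_and_forward are invented names standing in for the tag decode, the memory-map oldspace bit, and the transporter call:

#include <stdbool.h>
#include <stdint.h>

typedef uintptr_t word_t;

extern bool tag_is_pointer(word_t w);          /* does the tag denote a boxed pointer? */
extern bool points_into_oldspace(word_t w);    /* the question the memory maps answer */
extern word_t transport_and_forward(word_t w); /* copy the object, return its new address */

/* The read barrier, software edition: every word read from the heap goes
   through this check before it can land in a register.  Immediate data and
   pointers into newspace drop straight through; a pointer into oldspace
   causes the object to be transported and the new address returned instead. */
static word_t read_barrier(word_t md) {
    if (tag_is_pointer(md) && points_into_oldspace(md))
        return transport_and_forward(md);
    return md;
}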

Conventional wisdom has it that a read barrier implemented in software would be too expensive. Many garbage collected systems eschew the read barrier and simply pause the user process while the garbage collector runs. With generational scavenging, the pauses can be amazingly small, so it hardly seems worth implementing a read barrier. On the other hand, processors these days are so fast that even reading a memory location takes a noticeable amount of time. An oldspace check on read might not add significant overhead, especially if it were cleverly coded.

The read barrier on the LMI Lambda was only partially implemented in hardware. The hardware used the contents of the MD register to address the first-level memory map in order to read the oldspace bit. The microcode still had to issue a dispatch instruction (which included the necessary shift and mask). Modern processors are so much quicker that they can emulate the Lambda hardware faster than the original hardware itself.
