To understand the Lambda garbage collector, you have to know a bit about the memory system. Memory in the Lambda is contained in ‘regions’. A region is a contiguous chunk of virtual address space that is aligned on a 64-page block boundary and contains an integral number of 64-page blocks. The reason for the alignment and size restrictions is that the memory-mapping hardware uses the top 12 bits of the virtual address (bits [24:13] inclusive) as an index into the level-one memory map. There are usually several regions in use at any time.
Every region has a free pointer. Addresses within the region but below the free pointer are in use; addresses within the region but at or above the free pointer are not in use.
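The arithmetic can be sketched in a few lines. This is a toy model of the description above, not the Lambda microcode; the function names are mine:

```python
# Bits [24:13] of the virtual address (12 bits) index a 4096-entry
# level-one memory map, so each entry covers one 64-page block.
LEVEL_ONE_ENTRIES = 1 << 12

def level_one_index(address):
    """Top 12 bits of a 25-bit virtual address, bits [24:13] inclusive."""
    return (address >> 13) & (LEVEL_ONE_ENTRIES - 1)

def in_use(region_base, free_pointer, address):
    """Within a region, addresses below the free pointer are in use;
    addresses at or above it are free."""
    return region_base <= address < free_pointer
```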
Now let me digress a little. Assuming you're not using one of those broken reference-count hacks that don't actually work, the standard garbage collection model you see everywhere is this:
- Garbage collection is triggered when you run out of memory.
- The garbage collector starts with a root set of objects, then either recursively walks the graph of objects, marking the ones in use, or scans the root set and relocates the objects in use to a new place in memory.
- The marking or moving continues until the transitive closure of the root set has been marked or moved.
- The collector frees the memory that is no longer in use (the complement of the memory in use).
- Some or all of the freed memory is now available for allocation. In particular, the allocation that triggered the garbage collection can now proceed.
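The "walk the transitive closure" step in the model above is just a worklist traversal. A minimal sketch, generic rather than Lambda-specific:

```python
def mark_live(roots, children):
    """Mark everything reachable from the root set.

    `children(obj)` returns the objects referenced by `obj`.
    Returns the set of live objects: the transitive closure of the roots.
    """
    live = set()
    worklist = list(roots)
    while worklist:
        obj = worklist.pop()
        if obj in live:
            continue                      # already marked
        live.add(obj)
        worklist.extend(children(obj))    # trace the references
    return live
```

Anything not in the returned set is garbage; a mark-sweep collector would free it, a copying collector would simply never relocate it.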
- The fundamental unit of allocation and deallocation for the memory system is the region.
- To allocate an object within a region, you check the free pointer to determine whether there is sufficient room remaining.
- If there is room, you simply move the free pointer and initialize the storage.
- If the region is full, you look for a different appropriate region. (Some regions are reserved for different uses.) If you find one, allocate there.
- If there are no appropriate regions with room, you request the creation of a new region from the memory system.
- If this succeeds, you begin allocation within this new region.
- If this fails, then you are out of memory and the machine halts. (An error is signalled before you are absolutely out of memory, so you won't be suddenly surprised, but if you continue to ignore the error and not release any storage at all, the machine will eventually halt because it is unable to fulfill your request.)
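The allocation path above amounts to bump allocation with region fallback. A sketch under the same assumptions; the class and helper names are mine:

```python
class Region:
    """A contiguous chunk of address space with a free pointer (sketch)."""
    def __init__(self, base, size):
        self.base = base
        self.limit = base + size
        self.free_pointer = base          # nothing allocated yet

    def try_allocate(self, n_words):
        """Bump the free pointer if there is room; None if the region is full."""
        if self.free_pointer + n_words > self.limit:
            return None
        address = self.free_pointer
        self.free_pointer += n_words
        return address

def allocate(regions, n_words, make_region):
    """Try each appropriate region; create a new one if all are full."""
    for region in regions:
        address = region.try_allocate(n_words)
        if address is not None:
            return address
    new_region = make_region()            # ask the memory system for a region
    if new_region is None:
        raise MemoryError("cannot create a region: out of memory")
    regions.append(new_region)
    return new_region.try_allocate(n_words)
```

Note that nothing in this path marks, scans, or frees anything: allocation only ever moves free pointers forward or adds regions.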
- The garbage collector works by freeing entire regions of memory. The basic plan is this:
- Select one or more of the allocated regions and mark them as ‘condemned’. No new objects can be allocated there.
- Evacuate (relocate) the live objects from the condemned regions.
- When there are no more live objects in the region, return the region to the memory system.
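The condemn-and-evacuate plan can be sketched abstractly. This is a toy model of the idea, not the actual collector; the callback names are mine:

```python
def evacuate(roots, references, in_condemned, copy_out):
    """Relocate live objects out of condemned regions (sketch).

    `references(obj)` yields the objects `obj` points to;
    `in_condemned(obj)` tests whether `obj` lives in a condemned region;
    `copy_out(obj)` copies `obj` into a non-condemned region and
    returns the new copy. Returns a forwarding table old -> new.
    """
    forwarded = {}
    seen = set()
    worklist = list(roots)
    while worklist:
        obj = worklist.pop()
        if obj in seen:
            continue
        seen.add(obj)
        if in_condemned(obj):
            forwarded[obj] = copy_out(obj)
        worklist.extend(references(obj))
    # Every live object in the condemned regions now has a copy
    # elsewhere, so those regions can go back to the memory system.
    return forwarded
```

Dead objects in a condemned region are never touched: reclaiming them costs nothing beyond returning the region.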
The garbage collector runs as a separate process on par with the other processes. Its job is to free memory and return it to the operating system. It monitors memory usage and initiates a garbage collection cycle when it determines that memory needs to be freed. The decision when to initiate garbage collection is based on the current rate of allocation, the estimated rate of recovery, the estimated machine resources used during collection, and user parameters.
In other words, allocation is decoupled from garbage collection. The decision of when (or even if) to garbage collect is not made by the allocator. You can turn the garbage collector off at any time, or turn it back on at any time. Of course you run the risk of running out of memory if you leave it off for too long.
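One way to model that scheduling decision is as a pacing predicate. The parameters and the headroom factor are illustrative assumptions, not the Lambda's actual policy:

```python
def should_collect(free_words, words_allocated_per_step,
                   gc_steps_to_finish, headroom=1.5):
    """Trigger a collection cycle when, at the current allocation rate,
    free memory would not outlast a full collection (illustrative).

    `words_allocated_per_step` is the mutator's allocation rate per unit
    of collector work; `gc_steps_to_finish` estimates how much collector
    work a full cycle takes. `headroom` > 1 starts the cycle early
    rather than cutting it close.
    """
    needed_during_gc = words_allocated_per_step * gc_steps_to_finish
    return free_words < headroom * needed_during_gc
```

Turning the collector off just means never consulting the predicate; the allocator keeps working until it genuinely runs out of regions.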
Next: Oldspace and the transporter
3 comments:
So you had a hardware read barrier that worked at the object level?
Isn't that very expensive to reproduce in software?
Aren't the atomicity issues very tricky when you deal with true multiprocessors as opposed to PCLSRed time-sharing?
Yes, there was a hardware read barrier. I'm not sure what you mean by `object' level, though. I'll try to explain in more detail in the next post and maybe that will answer your question.
Conventional wisdom is that a read barrier is expensive in software, but I'm not sure if it is so expensive to be prohibitive. I'd like to see some measurements. I recall a paper that managed to avoid a read barrier by modifying the write barrier, but I think there was a problem that it made EQ be more complex than a simple pointer comparison.
I'm sure the atomicity issues for a true multiprocessor are quite tricky. Of course a read barrier is properly part of the memory system abstraction, not the process abstraction, so it would probably be easier to reason about it at that level. I think Azul has a read barrier on their multiprocessor Java hardware. They may have some papers on what they did.
Azul does, and they say moving it to x86_64 results in a 20% (as I recall) performance penalty (of course, that would be with Java code). Look into the currently moribund (but there's source) Managed Runtime Initiative and their papers on their garbage collectors; version N-1 (last time I checked) is described in the new edition of the garbage collection book.