Tuesday, August 30, 2011

Measuring GC times

MIT/GNU Scheme has a command line parameter, --heap, that allows one to specify the size of heap in 1024 word blocks. On a 32-bit machine, the heap size is limited because MIT/GNU Scheme stores the type tag in the upper bits of the word. On a 64-bit machine the word size is wider, so it is possible to specify extremely large heaps without worrying that the address will overflow into the type tag.

For kicks, I decided to vary the value of the heap size and see what effect that had on the task of MIT/GNU Scheme compiling itself.
HeapGC CountCumulative GC Time
in ms
8192 (64 MiB)764 19120
10000623 17910
16384 (128 MiB)377 10980
32768 (256 MiB)188 7350
50000124 5560
65536 (512 MiB)94 5000
10000063 4410
131072 (1 GiB)48 3910
15000042 4160
20000032 3800
25000026 3560
262144 (2 GiB)25 3050
30000022 3360
35000019 3260
393216 (3 GiB)18 2740

Collecting this data was tedious, so I spent a fair amount of time thinking about a model that could explain the results. At first glance, everything looks about right: the more memory, the fewer GCs and the less total time is spent garbage collecting. The problem is that the curve predicted by my model doesn't fit the empirical data at all. Of course this means that my model is either wrong or incomplete.

I have since figured out the problem, but I thought I'd let people puzzle over it themselves for a bit, first.


JBL said...

Looks like the time spent per GC cycle increases with the heap size. So, while GC is less frequent, the amount of memory that needs to be freed or the number of objects/elements that need to be freed increases with heap size.

John Cowan said...

Chicken Scheme used to use a strategy like this to compute the best size for the nursery at configuration time. The curve was typically bimodal: slow with both small and large nurseries, best somewhere in the middle. Eventually, Felix gave up on this and decided to use a fixed nursery size of 64K for all configurations.

The main heap is split into two semispaces which can be as large as you want, because Chicken uses low-end tagging: the low order bit is 1 for a fixnum, and the two low order bits are 10 for a character, #t, #f, (), the end-of-file object, the undefined-value object, and the unspecified object. That leaves all pointers with 00 in the low end.

Chris said...

So what's the model? I can't think about this without that information.

Jrm said...

Chris asked: So what's the model?

MIT/GNU Scheme uses a variant of the Minsky - Fenichel - Yochelson algorithm. As in Fenichel and Yochelson, when the heap is exhausted the garbage collector stops the world and copies the non-garbage, but rather than divide the address space into semi-spaces, the collector saves the non-garbage to ‘off-line’ storage. In this case the off-line storage is not a magnetic drum, but another region of virtual memory. Once oldspace is evacuated, the off-line storage is copied back into the heap.

One of the main benefits of the MFY algorithm is that the time (and storage) needed to collect is proportional to the amount of live storage, not the amount of garbage or the total size of the heap. JBL noted from my numbers that “the time spent per GC cycle increases with the heap size.” I had assumed that the GC time could be bounded by a (large) constant that was independent of heap size. This is clearly not the case and deserves further investigation...

Luís said...

One possible factor is locality. With a larger heap, perhaps the working set is fragmented enough to slow tracing down? Just a wild guess.