Tuesday, August 30, 2011

Measuring GC times

MIT/GNU Scheme has a command-line parameter, --heap, that allows one to specify the size of the heap in 1024-word blocks. On a 32-bit machine, the heap size is limited because MIT/GNU Scheme stores the type tag in the upper bits of the word. On a 64-bit machine the word is wider, so it is possible to specify extremely large heaps without worrying that an address will overflow into the type tag.
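
To see why tagging in the upper bits pinches a 32-bit heap, here is a small illustrative Python sketch; the 6-bit tag width below is an assumption chosen for the example, not a statement of MIT/GNU Scheme's actual layout.

# Illustrative only: if the top `tag_bits` of a word hold the type code,
# only the remaining low bits are available to address heap words.

def max_heap_words(word_bits, tag_bits):
    return 1 << (word_bits - tag_bits)

# Assuming (purely for the example) a 6-bit type tag:
print(max_heap_words(32, 6))   # 67108864 words -- about 256 MiB of 4-byte words
print(max_heap_words(64, 6))   # 2**58 words -- far more than any machine provides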

For kicks, I decided to vary the heap size and see what effect that had on the task of MIT/GNU Scheme compiling itself.
Heap (1024-word blocks)    GC Count    Cumulative GC Time (ms)
   6000                       1056        25800
   8192 (64 MiB)               764        19120
  10000                        623        17910
  16384 (128 MiB)              377        10980
  32768 (256 MiB)              188         7350
  50000                        124         5560
  65536 (512 MiB)               94         5000
 100000                         63         4410
 131072 (1 GiB)                 48         3910
 150000                         42         4160
 200000                         32         3800
 250000                         26         3560
 262144 (2 GiB)                 25         3050
 300000                         22         3360
 350000                         19         3260
 393216 (3 GiB)                 18         2740

Collecting this data was tedious, so I spent a fair amount of time thinking about a model that could explain the results. At first glance, everything looks about right: the more memory, the fewer GCs, and the less total time spent garbage collecting. The problem is that the curve predicted by my model doesn't fit the empirical data at all. Of course this means that my model is either wrong or incomplete.
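
For concreteness, here is a sketch of one simple model along these lines, assuming a roughly constant cost per collection and a collection count inversely proportional to the free heap; the parameters are made-up illustration values, and the model itself is an assumption, not necessarily the one referred to above.

# Sketch of a "constant cost per collection" model: with total allocation A
# words, live data L words, and a heap of H words, a copying collector runs
# roughly A / (H - L) times, and if each collection costs a constant c
# seconds, cumulative GC time is about
#
#     T(H) = c * A / (H - L)
#
# which falls off roughly as 1/H once H is much larger than L.

def predicted_gc_time(heap_words, total_alloc_words, live_words, cost_per_gc_s):
    gc_count = total_alloc_words / (heap_words - live_words)
    return gc_count * cost_per_gc_s

# Made-up parameters, purely to show the shape of the predicted curve:
for heap_words in (6_000_000, 32_000_000, 131_000_000, 393_000_000):
    t = predicted_gc_time(heap_words,
                          total_alloc_words=5_000_000_000,
                          live_words=2_000_000,
                          cost_per_gc_s=0.025)
    print(f"{heap_words:>12} words: {t:6.1f} s of GC predicted")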

I have since figured out the problem, but I thought I'd let people puzzle over it themselves for a bit, first.

5 comments:

JBL said...

Looks like the time spent per GC cycle increases with the heap size. So, while GC is less frequent, the amount of memory, or the number of objects, that needs to be freed increases with heap size.

John Cowan said...

Chicken Scheme used to use a strategy like this to compute the best size for the nursery at configuration time. The curve was typically bimodal: slow with both small and large nurseries, best somewhere in the middle. Eventually, Felix gave up on this and decided to use a fixed nursery size of 64K for all configurations.

The main heap is split into two semispaces which can be as large as you want, because Chicken uses low-end tagging: the low order bit is 1 for a fixnum, and the two low order bits are 10 for a character, #t, #f, (), the end-of-file object, the undefined-value object, and the unspecified object. That leaves all pointers with 00 in the low end.
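
For illustration, here is what those tag tests look like if a machine word is treated as a plain integer; this is a sketch of the layout as described in the comment, not Chicken's actual code.

# Low-end tagging as described above:
#   ...1  -> fixnum (low bit is 1)
#   ..10  -> immediate constant (character, #t, #f, (), eof object, etc.)
#   ..00  -> pointer (every word-aligned address is representable)

def is_fixnum(word):
    return (word & 0b1) == 0b1

def is_immediate(word):
    return (word & 0b11) == 0b10

def is_pointer(word):
    return (word & 0b11) == 0b00

def fixnum_value(word):
    # Shift out the tag bit; handles the non-negative case in this sketch.
    return word >> 1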

Unknown said...

So what's the model? I can't think about this without that information.

Joe Marshall said...

Chris asked: So what's the model?

MIT/GNU Scheme uses a variant of the Minsky-Fenichel-Yochelson algorithm. As in Fenichel and Yochelson, when the heap is exhausted the garbage collector stops the world and copies the non-garbage, but rather than divide the address space into semispaces, the collector saves the non-garbage to ‘off-line’ storage. In this case the off-line storage is not a magnetic drum, but another region of virtual memory. Once oldspace is evacuated, the off-line storage is copied back into the heap.
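
As a rough illustration of that copy-out-then-copy-back shape, here is a toy Python sketch in which the heap is a flat list of cells, the ‘off-line’ region is just another list, and forwarding is kept in a dictionary; the object layout and tagging are simplifications invented for the example, not MIT/GNU Scheme's collector.

# An object is [size, field, field, ...]; a field is either an immediate
# integer or ('ptr', index-of-object-header).

def collect(heap, roots):
    offline = []                 # the 'off-line' region
    forwarded = {}               # old header index -> new header index

    def evacuate(old):
        """Copy one object to off-line storage, returning its new address."""
        if old in forwarded:
            return forwarded[old]
        size = heap[old]
        new = len(offline)
        forwarded[old] = new
        offline.extend(heap[old:old + 1 + size])   # fields fixed up in the scan
        return new

    # Evacuate the roots, then scan the off-line copy Cheney-style:
    new_roots = [('ptr', evacuate(addr)) for (_, addr) in roots]
    scan = 0
    while scan < len(offline):
        size = offline[scan]
        for i in range(scan + 1, scan + 1 + size):
            field = offline[i]
            if isinstance(field, tuple) and field[0] == 'ptr':
                offline[i] = ('ptr', evacuate(field[1]))
        scan += 1 + size

    # Copy the survivors back into the bottom of the heap; the rest is free.
    heap[:len(offline)] = offline
    return new_roots, len(offline)     # new roots and the new free pointer

# Example: three one-field objects at 0, 2, 4; only the first two survive.
heap = [1, ('ptr', 2),    # object A, pointing at B
        1, 42,            # object B, immediate field
        1, 99]            # garbage object
roots, free = collect(heap, [('ptr', 0)])
# roots == [('ptr', 0)], free == 4, heap[:4] == [1, ('ptr', 2), 1, 42]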


One of the main benefits of the MFY algorithm is that the time (and storage) needed to collect is proportional to the amount of live storage, not the amount of garbage or the total size of the heap. JBL noted from my numbers that “the time spent per GC cycle increases with the heap size.” I had assumed that the GC time could be bounded by a (large) constant that was independent of heap size. This is clearly not the case and deserves further investigation...
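
To put numbers on that, dividing the cumulative time by the GC count in the table above gives the cost per collection: it climbs from roughly 24 ms per GC at a 6000-block heap to about 150 ms per GC at a 393216-block heap, which is exactly what breaks the constant-cost assumption. A small Python snippet over the measured data:

# Per-collection cost from the table above: if GC time really were a
# heap-independent constant, the derived column would be roughly flat.

measurements = [    # (heap in 1024-word blocks, GC count, total GC ms)
    (6000, 1056, 25800),  (8192, 764, 19120),   (10000, 623, 17910),
    (16384, 377, 10980),  (32768, 188, 7350),   (50000, 124, 5560),
    (65536, 94, 5000),    (100000, 63, 4410),   (131072, 48, 3910),
    (150000, 42, 4160),   (200000, 32, 3800),   (250000, 26, 3560),
    (262144, 25, 3050),   (300000, 22, 3360),   (350000, 19, 3260),
    (393216, 18, 2740),
]

for heap, count, total_ms in measurements:
    print(f"{heap:>7} blocks  {total_ms / count:6.1f} ms per GC")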

Luís said...

One possible factor is locality. With a larger heap, perhaps the working set is fragmented enough to slow tracing down? Just a wild guess.