Thursday, September 22, 2011

More on GC

I had been thinking of improving the garbage collector in MIT/GNU Scheme. I carefully measured the current GC performance so I could quantify any improvements. This uncovered a bug (see earlier post), but it also uncovered something more interesting. Here is a plot of the number of garbage collections as a function of heap size for the problem of compiling all the Scheme code in MIT/GNU Scheme:

[Plot: number of garbage collections vs. heap size when compiling MIT/GNU Scheme]

This graph looks an awful lot like the graph of minimal collections. In fact, if we plot the minimum number of garbage collections necessary for about 6,100,000 blocks, we get this:

[Plot: minimum number of garbage collections vs. heap size for ~6,100,000 blocks]
That appears to be a really good fit. This isn't too surprising, because a GC that collects only when it is almost out of memory ought not to be doing much more work than the minimum necessary.
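For the curious, the "minimum number of garbage collections" curve is, roughly, a back-of-the-envelope calculation like the following (a sketch only, assuming each collection reclaims nearly the entire heap and ignoring the live data that survives):

    ;; If the program allocates TOTAL blocks over its lifetime and the heap
    ;; holds HEAP blocks, and each collection frees essentially the whole
    ;; heap, then we only need to collect each time the heap fills up.
    (define (minimal-collections total heap)
      (if (<= total heap)
          0                                  ; everything fits; never collect
          (ceiling (/ (- total heap) heap))))

    ;; For ~6,100,000 blocks this is a hyperbola in HEAP, which is where
    ;; the "knee" comes from:
    ;;   (minimal-collections 6100000 40000)    => 152
    ;;   (minimal-collections 6100000 400000)   => 15
    ;;   (minimal-collections 6100000 6100000)  => 0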

What makes this interesting is this: We can clearly see the “knee” of the curve, and we have enough memory to be “well beyond the knee” when we compile all of Scheme. This means that the total number of garbage collections will be quite low. If the garbage collector is fast enough, the total GC time will also be quite low. For example, when given 40,000 blocks of memory, MIT/GNU Scheme will compile itself in about 3 minutes, of which only 2.6 seconds in total are spent in garbage collection. And as I noted previously, with sufficient memory, there need not be any garbage collection at all.
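To put the overhead in perspective, a quick calculation (the 152 collections here come from the sketch above for a 40,000-block heap, so treat the per-collection figure as an estimate):

    (define total-seconds 182.6)   ; ~3 minutes of compile time, GC included
    (define gc-seconds 2.6)        ; total time spent in the collector

    (/ gc-seconds total-seconds)   ; => ~.014, i.e. about 1.4% of the run
    (/ gc-seconds 152)             ; => ~.017, i.e. roughly 17 ms per collection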

Notice that the details of the collector are unimportant. MIT/GNU Scheme uses a variation of the Minsky-Fenichel-Yochelson-Cheney-Arnborg algorithm, but it could just as easily use a mark-sweep collector. The point is that when you are only doing a few dozen collections, it doesn't matter much what algorithm you use, because changing the algorithm will have little impact on the total run time. (Suppose I had the world's greatest garbage collector, one that just blows away the current one. Then I could reduce the total time from 182.6 seconds to exactly 180 seconds. That is less than a 2% improvement and not worth pursuing.) (A reference count GC, however, would be a problem, because its work is proportional to the garbage rather than to the live storage.)

So my original plan is pointless. There is no advantage to improving the garbage collector if it already takes only a small amount of time, and it looks like you can achieve this simply by throwing memory at the process. So long as you have sufficient memory, you can make the cost of a collection be close to (or even equal to) zero.

At least it looks that way...

4 comments:

JBF said...

I realize this is going to be a long comment touching lots of subjects, but I'll give it a try.

- I've heard claims that the "current" Java benchmark everyone optimizes for has a GC overhead of around 2%. While I can't back the claim at all, it seems to fit with your numbers. This was, by the way, with a parallel (one thread per hardware thread) implementation of a non-concurrent mark-and-sweep collector.

- So far you haven't touched what I would argue is the big issue with garbage collection: pause times.

Basically all collectors today are either stop-and-copy or mark-and-sweep, or a combination of the two (stop-and-copy for the young generation and mark-and-sweep for the old generation seems to be a local maximum).

Stop-and-copy is good because the work is O(live objects) _and_ it gives you compaction for free, but you usually waste 50% of your heap and you can't run it in parallel with your mutators.

Mark-and-sweep is good because you can chop the work up into smaller pieces and much of it can be done with the mutators running. The work, however, is O(live set) for the mark phase plus O(heap size) for the sweep phase, and compaction is a pain to implement with bounded pause times.

Space overhead is a lot lower than 50%, but you need "mark bits" in the object headers (or in separate bitsets for better cache behavior) and you often use cards for write barriers.
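Roughly, the per-cycle costs look like this (just a sketch; the constants are invented and only the asymptotic shape matters):

    ;; Cost of one collection cycle in arbitrary "work units".
    (define (stop-and-copy-cost live-words)
      (* 10 live-words))                      ; copying touches only live data

    (define (mark-sweep-cost live-words heap-words)
      (+ (* 10 live-words)                    ; mark phase: O(live set)
         heap-words))                         ; sweep phase: O(heap size)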

What we haven't seen yet is a GC that handles large heaps (100+ GB), has bounded pause times (hundreds of milliseconds), and has reasonable overhead in both time (single-digit percentages?) and space.

Jason Wilson said...

Of course, with an unlimited memory budget you can make the GC time irrelevant, but that isn't the only cost of "garbage". You'd need to quantify the costs of not doing compaction (more cache misses and more TLB misses - maybe - sometimes the GC rearranges objects into a less beneficial order than the original allocation order).

Since no one wants to "waste" memory, people typically use less heap than they really should. For example, a production process I was working on was allocated only 512 MB of Java memory (heap and whatever else Java allocates memory for). I complained that my smart phone had that much memory, and people agreed to raise it all the way up to 1 GB. I don't have the time spent in GC on hand to share.

Joe Marshall said...

JBF said, "So far you haven't touched what I would argue is the big issue with garbage collection: pause times."
I don't have enough to say about pause times except to note that if you don't GC at all, you don't pause, either.


Jason Wilson said, "that isn't the only cost of 'garbage'. You'd need to quantify the costs of not doing compaction."

That seems difficult. Before I add a cost metric to the model, I want to see if the crude model has any utility.

Keith Rarick said...

"So long as you have sufficient memory, you can make the cost of a collection be close to (or even equal to) zero."

At least you can make the cost of all collections, amortized over the total running time, arbitrarily small. Appel argues exactly that point in http://www.cs.princeton.edu/~appel/papers/45.ps
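A sketch of the usual form of that argument (not the paper's exact analysis): with a two-space copying collector, each collection does work proportional to the live data, and between collections you get to allocate about half the heap minus the live data, so the amortized cost per allocated word shrinks as the heap grows.

    ;; C is the (made-up) copying cost per live word, LIVE the amount of
    ;; live data, HEAP the total heap size (two semispaces).
    (define (amortized-cost-per-word c live heap)
      (/ (* c live)                ; work done at each collection
         (- (/ heap 2) live)))     ; words allocated between collections

    ;;   (amortized-cost-per-word 10 1000000 4000000)   => 10
    ;;   (amortized-cost-per-word 10 1000000 40000000)  => 10/19, about .5
    ;; As HEAP grows, the amortized cost per allocated word heads toward zero.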