Comments on Abstract Heresies: More on GC

Keith Rarick (2011-09-25):

"So long as you have sufficient memory, you can make the cost of a collection be close to (or even equal to) zero."

At least you can make the cost of all collections, amortized over the total running time, arbitrarily small. Appel argues exactly that point in http://www.cs.princeton.edu/~appel/papers/45.ps

Joe Marshall (2011-09-23):

JBF said: "So far you haven't touched what I would argue is the big issue with garbage collections, that is pause times."

I don't have enough to say about pause times except to note that if you don't GC at all, you don't pause, either.

Jason Wilson said: "that isn't the only cost of 'garbage'. You'd need to quantify the costs of not doing compaction."

That seems difficult.
Before I add a cost metric to the model, I want to see if the crude model has any utility.

Jason Wilson (2011-09-22):

Of course with an unlimited memory budget you can make the GC time irrelevant, but that isn't the only cost of "garbage". You'd need to quantify the costs of not doing compaction (more cache misses and more TLB misses, maybe; sometimes the GC rearranges objects in a less beneficial order than the original allocation order).

Since no one wants to "waste" memory, people typically use less heap than they really should. For example, a production process I was working on was allocated only 512 MB of Java memory (heap and whatever else Java allocates memory for). I complained that my smart phone had that much memory, and people agreed to raise it all the way up to 1 GB. I don't have the time spent in GC on hand to share.

JBF (2011-09-22):

I realize this is going to be a long comment touching lots of subjects, but I'll give it a try.

- I've heard claims that the "current" Java benchmark everyone optimizes for has a GC overhead of around 2%. While I can't back the claim at all, it seems to fit with your numbers.
This was, by the way, with a parallel (one thread per hardware thread) implementation of non-concurrent mark-and-sweep.

- So far you haven't touched what I would argue is the big issue with garbage collection, that is, pause times.

Basically all collectors today are either stop-and-copy or mark-and-sweep, or a combination (stop-and-copy for the young generation and mark-and-sweep for the old generation seems to be a local maximum).

Stop-and-copy is good because the work is O(live objects) _and_ it gives you compaction for free, but you usually waste 50% of your heap and you can't run it in parallel with your mutators.

Mark-and-sweep is good because you can chop the work up into smaller pieces and much of it can be done with the mutators running. The work is, however, O(live set) for the mark plus O(heap size) for the sweep, and compaction is a pain to implement with bounded pause times.

The space overhead is a lot lower than 50%, but you need "mark bits" in the object headers (or in separate bitsets for better cache behavior), and you often use cards for write barriers.

What we haven't seen yet is a GC that handles large heaps (100+ GB), has bounded pause times (hundreds of milliseconds), and has a reasonable overhead in time (single-digit percent?) and space.
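[Editorial note] JBF's asymptotic point — the mark phase touches only live objects while the sweep phase examines the whole heap — can be seen in a toy tracing collector. This is a minimal sketch, not any real GC; all names (`Obj`, `Heap`, `collect`) are invented for illustration:

```python
# Toy mark-and-sweep collector: marking is O(live set), sweeping is O(heap size).
class Obj:
    def __init__(self, name, refs=None):
        self.name = name
        self.refs = refs or []   # outgoing pointers
        self.marked = False      # "mark bit" in the object header

class Heap:
    def __init__(self):
        self.objects = []        # the whole heap, live and dead
        self.roots = []          # root set (stacks, globals, ...)

    def alloc(self, name, refs=None):
        o = Obj(name, refs)
        self.objects.append(o)
        return o

    def collect(self):
        # Mark phase: trace from the roots -- visits only live objects.
        marked = 0
        stack = list(self.roots)
        while stack:
            o = stack.pop()
            if not o.marked:
                o.marked = True
                marked += 1
                stack.extend(o.refs)
        # Sweep phase: examine every object in the heap, live or dead.
        swept = len(self.objects)
        self.objects = [o for o in self.objects if o.marked]
        for o in self.objects:
            o.marked = False     # reset mark bits for the next cycle
        return marked, swept

heap = Heap()
a = heap.alloc("a")
b = heap.alloc("b", refs=[a])
for i in range(8):               # garbage: unreachable from the roots
    heap.alloc(f"junk{i}")
heap.roots.append(b)

marked, swept = heap.collect()
print(marked, swept, len(heap.objects))   # prints: 2 10 2
```

The counters make the asymmetry concrete: marking did work proportional to the two live objects, while sweeping had to examine all ten slots in the heap. That O(heap size) sweep term is one reason real mark-and-sweep collectors keep mark bits in separate bitmaps and sweep lazily or in chunks.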
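[Editorial note] The amortization argument Keith attributes to Appel can be illustrated with back-of-the-envelope arithmetic for a semispace copying collector, under the assumed model that each collection costs c per live byte copied and the mutator allocates (H/2 − L) bytes between collections:

```python
# Amortized GC cost per allocated byte for a semispace copying collector:
# each collection copies the live data L at cost c per byte, and reclaims
# enough space for (H/2 - L) bytes of new allocation.  The ratio
# c*L / (H/2 - L) therefore shrinks toward zero as the heap H grows.
def amortized_cost_per_byte(heap_bytes, live_bytes, c=1.0):
    semispace = heap_bytes / 2
    assert live_bytes < semispace, "heap too small for the live set"
    return c * live_bytes / (semispace - live_bytes)

live = 100 * 2**20                       # say, 100 MB of live data
for factor in (4, 8, 16, 64):            # heap as a multiple of the live set
    h = factor * live
    print(factor, round(amortized_cost_per_byte(h, live), 3))
# prints:
# 4 1.0
# 8 0.333
# 16 0.143
# 64 0.032
```

With a heap four times the live set, you pay one unit of copying per byte allocated; at sixty-four times, about 0.03 units. That is the sense in which sufficient memory makes the amortized cost of collection arbitrarily small, though it says nothing about individual pause times.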