Friday, September 23, 2011

Knee surgery

As an anonymous person pointed out, the knee in the curve is an illusion. Here is the same chart displayed with varying scales for the x and y axes:
You can see that the position of the “knee” depends upon the scale of the X and Y axes.

The whole chart is a problem. Not only does it show a knee, but the rest of the chart has the data all squished up along either the X or Y axis. It is going to be hard to compare different charts if they all look like this. I want the whole chart to be scale invariant.

(Can you guess where this is going?)

The curved line in the chart represents the solution to x × y = 6100000. This line appears to come pretty close to the actual garbage collection count when you use MIT/GNU Scheme to compile itself. Since the product is a constant, we have a hyperbolic curve. To transform this into something more reasonable, we want to take the logarithm of the curve in both the X and Y direction. This will change our line from x × y = constant to x + y = constant (because logarithms transform multiplication into addition).
This chart is certainly more interesting! First, notice how the measured garbage collection counts fall in almost a straight line. The curve that approximates the count is a straight line, and you can see that the actual garbage collection count deviates from this line slightly at the ends. Second, notice that the lines 10 × x, x, and x/10 become parallel. This is convenient, too. Finally, notice that the knee is gone. This sort of chart won't fool us with a fake knee.

Now that we have something of an interesting baseline, it would be instructive to plot other processes that generate garbage and see how they look. (upcoming post)

A bit of a problem

Once again, here is a plot of the actual garbage collection counts as a function of heap size:
and we mention that we want to be “beyond the knee” in the curve. So where is that “knee”?

The green line is the plot of f(x) = 6100000/x, or, to put it another way, the plot of x × y = 6100000. The knee in the curve ought to be where x = y = √6100000 ≈ 2470. But look at the chart! Just by eyeballing it, we can see that 2470 is nowhere near what anyone would call the knee.

Thursday, September 22, 2011

More on GC

I had been thinking of improving the garbage collector in MIT/GNU Scheme. I carefully measured the current GC performance so I could quantify any improvements. This uncovered a bug (see earlier post), but it also uncovered something more interesting. Here is a plot of the number of garbage collections as a function of heap size for the problem of compiling all the Scheme code in MIT/GNU Scheme:
This graph looks an awful lot like the graph of minimal collections. In fact, if we plot the minimum number of garbage collections necessary for about 6100000 blocks, we get this:
That appears to be a really good fit. This isn't too surprising, because a GC that collects only when it is almost out of memory ought not to be doing much more work than the minimum necessary.

What makes this interesting is this: We can clearly see the “knee” of the curve, and we have enough memory to be “well beyond the knee” when we compile all of Scheme. This means that the total number of garbage collections will be quite low. If the garbage collector is fast enough, the total GC time will also be quite low. For example, when given 40000 blocks of memory, MIT/GNU Scheme will compile itself in about 3 minutes of which only 2.6 seconds total are spent in garbage collection. And as I noted previously, with sufficient memory, there need not be any garbage collection at all.

Notice that the details of the collector are unimportant. MIT/GNU Scheme uses a variation of the Minsky-Fenichel-Yochelson-Cheney-Arnborg algorithm, but it could just as easily use a mark-sweep collector. The point is that when you are only doing a few dozen collections, it doesn't matter much what algorithm you use because changing the algorithm will make little impact on the total run time. (Suppose I had the world's greatest garbage collector that just blows away the current one. Then I could reduce the total time from 182.6 seconds to exactly 180 seconds. This is less than a 5% improvement and not worth pursuing.) (A reference count GC, however, would be a problem because it traces garbage rather than live storage.)

So my original plan is pointless. There is no advantage to improving the garbage collector if it doesn't take more than a small amount of time, and it looks like you can achieve this by simply throwing memory at the process. So long as you have sufficient memory, you can make the cost of a collection be close to (or even equal to) zero.

At least it looks that way...

Monday, September 19, 2011

Thoughts about GC

Suppose I have a program that, when run, allocates a total of 10 GiB (10×230 bytes). If it happens that I have more than 10 GiB of heap, then I don't need to worry about running out. If my heap is smaller than 10 GiB, I'm going to have a problem.

It is often possible to re-use storage by overwriting the previously written information. If the result computed by a program (including the side effects as part of the ‘result’) is completely invariant with respect to the value of a location in memory, then that memory location could be re-used. Depending upon the program and the runtime system, re-use can be the responsibility of the programmer, or there can be an automatic means of storage management (e.g. a garbage collector). Regardless of how re-usable storage is identified and made available to the program, we can use the pigeonhole principle to determine the minimum number of times that we must re-use some part of memory. For example, if my program uses 10×230 and I have 10×230−1 bytes of heap, then at least one byte somewhere will have to be re-used.

So let us suppose that our program which allocates 10 GiB is to be run with a 2 GiB heap and our runtime system provides a garbage collector so we don't have to manage our storage manually. Suppose further that our garbage collector is not continuous, but has some sort of identifiable ‘cycle’ during which it locates re-usable memory. On each cycle the garbage collector can reclaim at most 2 GiB, so we know the runtime system must perform a minimum of at least five garbage collection cycles. It may perform many, many more than five, but it cannot perform fewer because it would have to recycle more than 100% of the heap at some point.

Another way to state this is that the number of garbage collections times the size of the heap must be equal to or greater than the total amount of storage allocated by a program. Here is a graph that shows the minimum number of collections needed for a program that allocates 50 bytes of storage.
The discrete jumps in the graph are because we aren't considering partial allocations (less than 1 byte). When working with realistic heap sizes, we can approximate this graph by dividing the total allocation by the heap size. The green line in this chart plots 50/x.

More to come...