Wednesday, October 26, 2011

Garbage collection gives us the illusion of nearly unlimited memory for temporary storage. To pull off this illusion, we must be able to satisfy every allocation request (at least up to the point where we throw up our hands and raise an “out of memory” condition). In other words, there must be free or unused memory available whenever we need some. Since the point of automatic memory management is to free the programmer from having to think about it, we expect to be able to allocate temporary storage at any time we wish during program execution.

A data structure can be considered “in use” if modifications to it can change the future behavior or result of a program. Since we cannot in general statically determine the future behavior, we compute the transitive closure of the data structures under the data access primitives. This is a superset of the data “in use”. That is, if you can't access the data, it cannot be “in use”.

In order for garbage collection to succeed as a storage management strategy, there must be at all times (anytime we want to allocate) some storage that is either unallocated or unreachable. The size of the heap must always exceed the size of the reachable data. Or we could say that the amount of reachable data must at all times be less than the size of the heap. (By ‘size of the heap’ I mean the total amount of storage both in use and free. For example, I could have a five megabyte heap, but only have 3 words in use.)

Any allocated data structure has but one of two ultimate fates: it remains reachable for the remainder of the run of the program or it becomes unreachable at some point (and we may re-use the storage). Assuming the heap starts empty, the size of reachable storage at any point in time is the difference between the total amount of all allocation and the total amount of all storage that has become unreachable up to that point. For example, if my program allocates sixty megabytes over its lifetime and it abandons fifty-nine megabytes, then the heap must contain the remaining megabyte.

We now consider the differential case, which is to say, the rates of allocation, collection, and heap growth or shrinkage. If the rate of allocation exceeds the rate of collection, the amount of storage in use in the heap must grow to meet the demand. Conversely, if the amount of reachable storage in the heap is decreasing, then we are abandoning storage at a faster rate than we are allocating it. But there are hard boundaries on the size of the heap. We can never have a negative amount of storage in use, and we have the requirement (see above) that the heap must be larger than the reachable storage at all times. The heap size is limited, so the rate of increase in reachable storage must be quite small over the long term if we are not to run out of heap.

The rate of storage allocation must be nearly equal to the rate of deallocation (or abandonment) almost all the time.

(This is only slightly different from the “steady-state” assumption that the rates are essentially equal. We're simply admitting the possibility that the reachable storage grows indefinitely and that the program would eventually crash if it doesn't finish first.)