Sunday, January 24, 2016

Locality and GC

I was curious about how long it took to CDR down a list. I wrote a little program that adds up the elements in a list
and tried it out.

(define test-list (iota (expt 10 6)))

(define (sum-list list)
  (let loop ((total 0)
	     (tail  list))
    (if (pair? tail)
	(loop (+ total (car tail)) (cdr tail))
	(begin (if (not (null? tail))
		   (error:not-list list 'sum-list))
	       total))))

(define (test-sum-list n)
   (do ((i 0 (+ i 1))
        (answer '() (sum-list test-list)))
       ((not (< i n)) answer)))

1 ]=> (show-time (lambda () (test-sum-list 1000)))

;process time: 4280 (4280 RUN + 0 GC); real time: 4281
;Value: 499999500000
But after garbage collection, the performance dropped a bit.
1 ]=> (gc-flip)

;GC #18 15:18:59: took:   0.13   (0%) CPU,   0.13   (0%) real; free: 246114893
;Value: 246114893

1 ]=> (show-time (lambda () (test-sum-list 1000)))

;process time: 4490 (4490 RUN + 0 GC); real time: 4483
;Value: 499999500000
Now it's taking 4.49 ms.

The garbage collector in MIT Scheme is a very simple, stop the world, copying collector. The GC algorithm traverses the heap in breadth-first order. An undesired effect of this is data structures being spread about the heap. When I first built the test-list, the cons cells were close together. Here are the first few addresses:
1 ]=> (do ((tail test-list (cdr tail))
                (i 0 (+ i 1)))
               ((not (< i 10)) #f)
             (display (object-datum tail))
             (newline))
253936080
253936064
253936048
253936032
253936016
253936000
253935984
253935968
253935952
253935936
;Value: #f
But after garbage collection...
1 ]=> (gc-flip)

;GC #19 15:33:28: took:   0.15   (1%) CPU,   0.16   (0%) real; free: 246114593
;Value: 246114593

1 ]=> (do ((tail test-list (cdr tail))
              (i 0 (+ i 1)))
             ((not (< i 10)) #f)
           (display (object-datum tail))
           (newline))
194190168
194326376
194416512
194499936
194555336
194584992
194609544
194631760
194652360
194669208
;Value: #f
All the cons cells are on different pages.

This is annoying, so I hacked the GC so that it transports chains of cons-cells eagerly. If we do the same experiment, before the flip,

1 ]=> (show-time (lambda () (test-sum-list 1000)))

;process time: 4280 (4280 RUN + 0 GC); real time: 4280
;Value: 499999500000
And after.
1 ]=> (gc-flip)

;GC #5 15:38:53: took:   0.23   (2%) CPU,   0.24   (0%) real; free: 249429521
;Value: 249429521

1 ]=> (show-time (lambda () (test-sum-list 1000)))

;process time: 4100 (4100 RUN + 0 GC); real time: 4100
;Value: 499999500000
Now we see a performance increase. If we look at the test list,
1 ]=> (do ((tail test-list (cdr tail))
           (i 0 (+ i 1)))
          ((not (< i 10)) #f)
        (display (object-datum tail))
        (newline))
193516792
193516808
193516824
193516840
193516856
193516872
193516888
193516904
193516920
193516936
;Value: #f

You can see that the cons-cells addresses are next to each other.