Friday, October 9, 2009

Establishing a baseline

It turned out to be harder to establish a baseline than I expected. I decided to use the sboyer benchmark because it is a ‘classic’ and it does a bunch of lispy stuff. If I were writing a paper or something I'd use a whole suite of benchmarks, but I'm just doing this casually.

One problem is finding a benchmark that runs long enough to be interesting, but not so long as to be tedious. When I'm on the shuttle bus running my laptop in power-saver mode with jrmscheme running under the debugger, sboyer can take almost twelve minutes to complete. But when I'm at home running the same laptop in ‘performance’ mode with an optimized compile of jrmscheme outside of the debugger, it takes about six seconds. Benchmarking under the debugger is pointless, of course, but when I'm running in debug mode I have a lot of extra code instrumenting the interpreter, and that is useful for seeing exactly which optimizations are doing what at the low level.

I know that computers these days exhibit variability in benchmark timings, but the amount of variation was higher than I hoped for. In two back-to-back runs I got timings of 18.81 seconds in the first and 14.03 in the second (no, it wasn't a ‘warm cache’ effect; the third run took 17.1 seconds). This means that I have to perform a fair number of runs to characterize the performance.
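To get a feel for how many runs that takes, a quick sketch of the arithmetic on the three timings above (this is just my own illustration, not anything from the benchmark harness itself):

```python
import statistics

# The three back-to-back sboyer timings (seconds) mentioned above.
timings = [18.81, 14.03, 17.1]

mean = statistics.mean(timings)     # average of the runs
stdev = statistics.stdev(timings)   # sample standard deviation
cv = stdev / mean                   # coefficient of variation

print(f"mean  {mean:.2f} s")
print(f"stdev {stdev:.2f} s")
print(f"cv    {cv:.1%}")
```

With a spread like that (the coefficient of variation comes out around 15%), a single run tells you almost nothing; averaging over many runs, or reporting the minimum of many runs, is the usual way to get a stable number.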

This got me thinking, so I'm going to cut this post short. I'll have more to say later...