Wednesday, October 21, 2009

Now that flat environments are working fairly good, the bottleneck has shifted. The top three items on the ‘top of stack’ histogram are the primitive procedures CAR, PAIR?, and NULL?. Let's look at the code for PrimitiveCombination1:
 
public override bool EvalStep (out object answer, ref Control expression, ref Environment environment)
{
    // Evaluate the argument.
    Control unev0 = this.arg0;
    Environment env = environment;
    object ev0;
    while (unev0.EvalStep (out ev0, ref unev0, ref env)) { };
    if (ev0 == Interpreter.UnwindStack) {
        ((UnwinderState) env).AddFrame (new PrimitiveCombination1Frame0 (this, environment));
        answer = Interpreter.UnwindStack;
        environment = env;
        return false;
    }

    // Call the primitive.
    if (this.method (out answer, ev0)) {
        TailCallInterpreter tci = answer as TailCallInterpreter;
        if (tci != null) {
            answer = null; // dispose of the evidence
            // set up the interpreter for a tail call
            expression = tci.Expression;
            environment = tci.Environment;
            return true;
        }
        else
            throw new NotImplementedException ();
    }
    else return false;
}
The method for CAR is this:
public static bool PrimitiveCar (out object answer, object arg0)
{
    answer = ((Cons) arg0).Car;
    return false;
}
There's quite a bit of noise in that code, so here is the main code path:
bool EvalStep (out object answer, ref Control expression, ref Environment environment)
{
    // Evaluate the argument.
    Control unev0 = this.arg0;
    Environment env = environment;
    object ev0;
    while (unev0.EvalStep (out ev0, ref unev0, ref env)) { };
    if (ev0 == Interpreter.UnwindStack) { ... }

    // Call the primitive.
    if (this.method (out answer, ev0)) { ... }
    else return false;
}
The while statement is the tail-recursion trampoline. The immediately following conditional is there to support first-class continuations. The method is expected to stuff its result in answer and return false, unless it needs to make a tail-recursive call, in which case it returns true.

This is pretty general, and it has to be if we are going to support primitives like call-with-current-continuation, but the number one primitive procedure is CAR, and we can handle that one a bit more efficiently. The first thing we need to do is inline the call to CAR:
bool EvalStep (out object answer, ref Control expression, ref Environment environment)
{
    // Evaluate the argument.
    Control unev0 = this.arg0;
    Environment env = environment;
    object ev0;
    while (unev0.EvalStep (out ev0, ref unev0, ref env)) { };
    if (ev0 == Interpreter.UnwindStack) { ... }

    // Attempt to cast.
    Cons theCell = ev0 as Cons;
    if (theCell == null) {
        ... enter error handler ...
        }
    else {
        answer = theCell.Car;
        return false;
    } 
}
This avoids several operations. We no longer push ev0 as an argument just to pop it off in the primitive, and we no longer return a flag for a conditional branch. This is about half of the work.

The other half is in evaluating the argument to CAR. The debug version shows that the argument to CAR is usually bound in an argument position in the enclosing lambda. If that is the case, then there is no need for the tail-recursion trampoline or the continuation handling code. We can just fetch the argument.
bool EvalStep (out object answer, ref Control expression, ref Environment environment)
{
    // Attempt to cast.
    Cons theCell = environment.ArgumentValue(this.argumentOffset) as Cons;
    if (theCell == null) {
        ... enter error handler ...
        }
    else {
        answer = theCell.Car;
        return false;
    } 
}
If the primitive cannot throw an error (for example, PAIR?), it is even simpler:
public override bool EvalStep (out object answer, ref Control expression, ref Environment environment)
{
    answer = environment.ArgumentValue (this.offset) is Cons;
    return false;
}
This code takes almost a negligable amount of time.

Unfortunately, specializing on the primitive procedure and the argument type like this requires a lot of code. Each one-argument primitive can be specialized in at least six different ways, and each way is its own separate class. C# does not have macros and templates don't quite work for this sort of thing. The alternative is some code-generation mechanism, but I've been too lazy to automate that (I've been expanding these things by hand). On the other hand, the unspecialized mechanism is not unreasonable if it isn't called to often, so by specializing only a handful of the top primitives we get a lot of performance improvement for very little work.

MIT-Scheme has reflective operations for manipulating its own SCode. In order to maintain compatiblity with these, the specialized primitives inherit from PrimitiveCombination1. The code for EvalStep is overridden, but we retain the rest of the class. This allows the Scheme level to reflect on this code as if it were unoptimized code. A good example of this is the pretty printer. When it encounters an optimized PrimitiveCarA (primitive CAR of an argument), it treats it just like a PrimitiveCombination1 with an operator of CAR and an argument of some Variable.

Just by optimizing CAR, CDR, PAIR?, and NULL?, the median time for sboyer drops to 2.62 seconds.