Thursday, October 9, 2008

Now that we have static environments, we can take advantage of the known locations of lexical variables. For instance, if the variable is bound in the immediately enclosing lambda, we can simply fetch it because it cannot be shadowed. If a variable is completely free in a fragment of code, that is, if it is a `global' or `top-level' variable, we can grab the value cell at load time and simply dereference it to evaluate the variable.

Before loading or evaluating a piece of SCode, we make a pre-pass over it to `compile' the variable lookup. (We could do this at lambda-construction time, but then we'd have to make multiple passes over deeply nested lambda expressions. Nonetheless, multiple passes are much simpler to understand. When contructing a lambda, we'd walk the body and replace all the lambda-bound variables that are free in the body with variables that `knew' their lexical depth. And it is unlikely that the lexical depth will exceed a very small number, so the multiple passes wouldn't add up. On the other hand, you would still need a final pass just before evaluation, and if you were too simple-minded about multiple passes you'd end up with an exponential growth factor.)

As I was saying, we make a pre-pass over the code to `compile' the variable lookup. We create a LexicalMap that holds a model of the runtime environment, and as we walk the code, we simulate the variable lookup against the lexical map. As we simulate the lookup, we note how deep we have to search, whether there will be incremental bindings in any of the frames, and where we ultimately find the variable (if at all). If we find the variable, we look at the path we used to get at it and create an instance of a specialized variable class that performs that particular style of lookup at runtime. If we don't find the variable, we create an instance of a specialized variable class that performs a deep search.

Wednesday, October 8, 2008

Look! Up in the environment!

Variable lookup is right up there with continuation management as an important performance bottleneck. In one benchmark, I measured 2.9 billion evaluations of which 991 million (34%) of them were variable lookups. A good chunk of compiler optimizations revolves around getting the variables to where they need to be as quickly as possible and keeping them in easy to find places.

My interpreter uses the standard `environment model' of evaluation where a linked chain of lexical environment chains are maintained during evaluation. Every variable lookup goes through a `deep search' of the frames looking for a binding with a matching name. Needless to say, this is not blazingly fast. There are two parts to solving this. First is to reduce the need for the deep search, second is to optimize the fast path.

MIT Scheme supports first-class environments. The special form the-environment returns the lexical environment at the point of invocation. The returned environment can be used as an argument to eval and that's where the trouble begins. If eval is called, the user is going to expect that the code will run in a context with the same bindings visible as where the environment was captured. This means we cannot move or collapse the environment frames because it may cause changes visible to the user. Even worse, if the user evals a define expression, or loads a file, it may inject bindings that were not previously visible. This means we cannot even cache the locations of the variables. The flexibility of first-class environments comes with a heavy cost. On the other hand, this flexibility is hardly ever used, so we end up paying this cost to no advantage.

So how do we fix this? First, because (the-environment) is the only documented way to obtain a first-class environment, we can walk the SCode and determine whether a first-class environment will be needed. I do this when I construct an SCode lambda expression. If the body contains a call to (the-environment), I construct a StandardLambda object, but if it does not, I construct a StaticLambda object. A StandardLambda, when evaluated, constructs a StandardClosure. A StandardClosure, when applied, constructs a StandardEnvironment. A StandardEnvironment must be layed out exactly as specified by the source code and variable lookups must do a deep search because the user might inject bindings via define or load.

A StaticEnvironment however, has an important property: because the user cannot use load or eval with it (he can't name the object to pass it as an argument, so this does not create a user-visible restriction), we know that there will not be incremental definitions that could hide the bindings in the environment. We know the lexical depth and offset of all variables bound in a StaticEnvironment.

So the first task is to identify and separate StandardLambda and StaticLambda instances.

Monday, October 6, 2008

Weird bug

I found (to some extent) the bug that has been vexing me for the past couple of weeks. It is obscure and I can't think of a good reason that things would work this way, but I do have a reliable way to avoid the bug.

The symptoms were this: After booting Scheme and getting to the REPL, forms typed in the REPL would be incorrectly rewritten with uninterned symbols appearing in body of the form. Naturally, these would not match the bindings of the interned symbols appearing in the binding lists, so unbound variables would be raised.

I first changed the way uninterned symbols printed so I could easily identify them. This showed a second symptom: the forms typed to the REPL would not be rewritten the same each time. Retyping the form to the REPL would give a different answer on the second rewrite.

Despite these problems, enough of Scheme is working to load and initialize the world and get to the prompt. This takes some 6 million evaluations.

A little poking around led me to the guts of the syntaxer. This is the component of MIT Scheme that builds the abstract syntax tree from s-expressions. It has two renaming phases: first, it renames all identifiers to uninterned symbols. This has the effect of alpha-renaming the bound variables so that macro expansion cannot inadvertently capture bindings. After everything is expanded, the second renaming phase renames the indentifiers back to their original names provided that it does not introduce a naming conflict. If there is a naming conflict, a non-conflicting identifier is generated for the name.

The renaming information is kept in a set of hash tables called the rename-database. What appeared to be happening was that during the second phase the code either thought that there was a conflict in renaming the variables in the body of some lambda expressions, or it could not find the entry in the rename-database that contained the original name. Either explanation could account for it. I suspected a problem with hash tables.

Now the first baffling thing is this: most of MIT Scheme is written in Scheme itself, so the hash table implementation is really only dependent on some very trivial arithmetic. Furthermore, since the hash code is computed by calling a primitive procedure that I wrote, I was able to ensure that all symbols would hash to the same bucket, thus turning hash-table lookup into a linear search for symbols. This depends only on checking EQ-ness of symbols.

The CLR has a notion of interned and uninterned immutable strings. These are semantically equivalent to symbols and I used them as such. I thought that perhaps some uninterned strings were being inadvertently interned or that perhaps some strings I thought were interned were not. I added a two-character prefix to my uninterned strings whenever I created them and I added code that checked that if the two characters were present, the symbol was uninterned, and if absent the symbol was interned. Furthermore, I checked that no interned symbol had the prefix and no uninterned symbol was missing the prefix. I performed this check on every call to EQ and on every call to a Scheme primitive that manipulated symbols. I even put a test in the variable lookup code to check the state of any symbol that may have been placed in a variable.

This code showed clearly that I was correctly managing the uninterned and interned symbol sets. No symbol accidentally migrated.

The addition of the prefix characters helped when stepping through the code. It was easy to see which symbols were uninterned and which were interned by simply looking for the prefix. Unfortunately, it is hard to distinguish between two different uninterned symbols with the same name (and there cannot be two interned symbols with the same name). I could see a few cases where two uninterned symbols with the same name were being compared and the answer indicated they were different symbols.

So to make it easier for me to tell which uninterned symbols were which, I changed the Scheme primitive that creates uninterned symbols to add a unique prefix to each one.

The bug went away.

`Aha!' you say, it *was* a case of accidentally confusing two symbols. Well, I *thought* that, too, but it is weirder than that. If giving each uninterned symbol a unique name solves the problem, then we seem to have this kind of bug:

Two symbols that are NOT identical (occupy different locations in memory) are determined to be identical because they contain the same sequence of characters.

Now I suspected a bug of this sort fairly early on in the debugging process, so I was a bit confused why I didn't find it earlier. (After all, it seems obvious)

Here is an obvious fact: the number of uninterned symbols is strictly equal to or less than the number of calls made to the primitive that creates them. There is exactly one routine in my interpreter that creates uninterned symbols, so each one must come from a call to that routine. Of course I had put a breakpoint at that routine so I could see who was generating these symbols.

The call to make-uninterned-symbol happens exactly *once* per identifier. This, too, make sense: you only need one uninterned name per identifier.

So if we are comparing two symbols that are NOT identical, where did the second one come from?

Ah, you say, maybe it was left over from a previous expansion. As it so happens, I instrumented EQ quite heavily because I wanted to be sure I knew what was what. When I use the code that creates each symbol uniquely, I can see what the uninterned symbol is compared against. It is compared against itself and the original interned symbol. Under no circumstances did I see a comparison with any uninterned symbol *other* than itself. That makes sense because I was working with very simple lambda expressions that had only one bound variable.

So if we are creating only *one* uninterned symbol per binding, and only checking against that one uninterned symbol, how could we confuse them?

The explanation I'm stuck with is this: the CLR may not be guaranteeing that uninterned symbols have reference semantics. That is, the `identity' of an uninterned string (as defined as the location in memory where the bits are) might not be preserved, but can even *split* (two strings that were once EQ become not EQ). I can't think of why an implementation would do this, but it does explain what I am seeing. (I'm thinking there might be an explanation involving cross-generational pools of uninterned strings, but I dunno.)

The workaround is trivial: I no longer use uninterned strings as `naked' object but wrap them in a trivial class object. Since class instances are defined to have reference semantics, there is no problem with using object-equality on them.

I suppose I can try to write a program that demonstrates this issue, but I'm getting tired of debugging it and I have a workaround. But this is one weird bug.

Wednesday, October 1, 2008

A curious bug

I've been hacking away on my Scheme system, but I'll give a full update later on. I've encountered a curious bug, though. When I get to the REPL and type (let ((x 3)) x), I get an unbound variable error. It seems that the syntaxer (the part of MIT Scheme that expands macros and converts the s-expressions into s-code) is renaming the bound variable x.

What is curious about this bug is the specific nature of it. Since my version of MIT-Scheme is still incomplete, I often run into missing primitives or as-yet-unimplemented behavior. This just drops me into the C# debugger. But in order to cold-load and initialize Scheme from an empty image, I needed to get a lot of stuff working correctly. Scheme does some 6 million evaluations during the cold load, so I know that large portions of the language work just fine. The unimplemented portions just crash, so it is somewhat baffling that I've encountered something that is so obviously incorrect, yet correct enough to allow booting and even to attempt evaluation.

My guess right now is that there is something slightly wrong with hash tables. It's just a hunch, though. I have tried some simple tests from the REPL, and they look ok. Unfortunately, there is a lot of evaluation going on between hitting the return key and actually running `eval', so I haven't been able to find the problem by tracing the interpretation.