Monday, December 30, 2024

Specialized Environments

A Lisp program conceptually does a lookup every time it evaluates a symbol. Since evaluating symbols is the single most common operation, you want it to be as fast as possible. In compiled code, symbol values might be stored in registers, on the stack, or in a vector, but in interpreted code they are usually stored in a structure called an environment. In compiled code, the symbol value can often be retrieved by a move instruction, but in interpreted code there may be a function call and a vector-ref needed to get the value. This is a significant performance hit.

We can customize the layout of an environment to speed up symbol lookup. In the general case, an environment is a fairly heavyweight data structure. Because you can eval a define form at any time, you need an environment that supports incremental definition and you need to deep search each time you do a lookup because someone might have defined a shadowing variable.

But 99% of the time, you don’t need the general case. If no one captures a first-class environment, then you don’t need to support incremental definitions because no one can call eval to define a new variable. If no one mutates a variable, then you can store a copy of the variable’s value directly in the environment instead of indirectly through a ValueCell object. Most environments only close over a couple of variables, so you can make the variables fields of the environment object instead of storing them in a vector of values. You statically know the number of variables, so you can store them in a fixed-size class instance.

An Environment is an abstract base class.

A GlobalEnvironment is a singleton class that represents bindings in a hash table.

A StandardEnvironment is a concrete subclass of Environment that supports assignment and incremental definitions. The bindings are kept in a vector of ValueCell objects and the incremental bindings are kept in a hash table. This is a general purpose environment.

class StandardEnvironment : public Environment {
  Vector<ValueCell*> bindings;
  HashTable<ValueCell*> incrementalBindings;
};

A variable lookup in a StandardEnvironment involves fetching the vector of value bindings from the environment, looking up the variable in the vector of bindings, and if it is not found, looking it up in the hash table of incremental bindings, and finally fetching the value from the cell. Several levels of indirection are involved.

A StaticEnvironment supports assignment, but not incremental definitions. Bindings are kept in a vector of ValueCell objects.

class StaticEnvironment : public Environment {
  Vector<ValueCell*> bindings;

  Object* GetArg0() { return bindings[0].Value; }

};

Looking up a value in a StaticEnvironment is slightly quicker because there is no table of incremental bindings to search.

An ImmutableEnvironment holds bindings in a vector and does not support assignment or incremental definitions. Binding values are kept directly instead of indirectly through ValueCell objects.

class ImmutableEnvironment : public Environment {
  Vector<Object*> bindings;

  Object* GetArg0() { return bindings[0]; }

};

Looking up a variable in an ImmutableEnvironment is quicker because there are no ValueCell objects to dereference.

A SimpleEnvironment is an abstract subclass of ImmutableEnvironment that does not support assignment or incremental definition. Bindings are kept in class fields.

A SimpleEnvironment0 is a concrete subclass of SimpleEnvironment that holds no bindings — these are created when you invoke a thunk. A SimpleEnvironment1 is a concrete subclass of SimpleEnvironment that holds one binding, and so on up to SimpleEnvironment3.

  class SimpleEnvironment2 : public SimpleEnvironment {
    Object* var0;
    Object* var1;

  Object* GetArg0() { return var0; }

  };

Looking up a variable in an SimpleEnvironment is quicker because you don’t have to indirect through a vector of bindings. A method as simple as GetArg0(), which simply returns a field in the instance, can often be inlined.

Environments are created by applying closures. There are subtypes of closures that correspond to the different kinds of environments. For example, a SimpleClosure2 is a closure that creates a SimpleEnvironment2.

Closures are created by evaluating lambda expressions. There are subtypes of lambda expressions that correspond to the different kinds of closures. For example, a SimpleLambda3, when evaluated, will create a SimpleClosure3.

When S-code lambda object are created, they are analyzed to see if a first-class environment is captured, and if not, whether any of the variables are mutated. The appropriate subclass of lambda object is created based on this analysis.

Almost all environments end up being SimpleEnvironments and are quite a bit faster than the general case.

No comments: