Thursday, October 8, 2009

Incremental definitions

Kbob asked me: Incremental definitions. Just to be clear, you mean symbols defined inside a lambda?

(lambda (a b)
  (define c ...)
  ...)
c is an incremental definition?

That's a really good question. In Scheme, internal definitions are transformed into a letrec form, so the example above would be turned into:
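(lambda (a b)
  (letrec ((c <compute value for c>))
    <body>))

The letrec in turn expands into a let with an assignment:

(lambda (a b)
  (let ((c <unbound>))
    (let ((temp <compute value for c>))
      (set! c temp))
    <body>))

And let is just syntactic sugar for applying a lambda:

(lambda (a b)
  ((lambda (c)
     ((lambda (temp) (set! c temp)) <compute value for c>)
     <body>)
   <unbound>))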
So internal defines will end up as lambda variables.
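
As a concrete check, the three forms can be written out with real expressions. This is only a sketch: the procedure names and the arithmetic body are made up for illustration, and #f merely stands in for the <unbound> marker, which has no portable spelling.

```scheme
;; Internal define, as a user would write it.
(define (with-define a b)
  (define c (* a b))
  (+ a b c))

;; The same procedure with the define rewritten as letrec.
(define (with-letrec a b)
  (letrec ((c (* a b)))
    (+ a b c)))

;; The same procedure with letrec rewritten as nested lambdas.
;; #f is a placeholder for <unbound>; it is overwritten before use.
(define (with-lambda a b)
  ((lambda (c)
     ((lambda (temp) (set! c temp)) (* a b))
     (+ a b c))
   #f))
```

Calling any of the three with 2 and 3 gives 11, and in each case c ends up as an ordinary lambda-bound variable.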

We only create environments when we evaluate lambda expressions, and all the necessary variables should be in the parameter list. The only way to add a binding to an environment that wasn't in the parameter list when the environment was created is to evaluate code that wasn't there when the environment was created. There is really only one way to do this, and that is to use eval. Although eval itself is not commonly used, the read-eval-print loop and load are. Both of these need to use eval (or an equivalent).
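
For instance, a define passed to eval adds a binding that no parameter list ever mentioned. A small sketch, assuming an implementation that provides interaction-environment (optional in R5RS, part of (scheme repl) in R7RS) and that permits definitions through eval; the variable name is invented for the example:

```scheme
;; Grab the environment the REPL evaluates in.
(define env (interaction-environment))

;; This binding did not exist when env was created:
;; it is an incremental definition.
(eval '(define incremental-example 42) env)

;; Later code evaluated in the same environment can see it.
(define incremental-result
  (eval '(+ incremental-example 1) env))
```

Where the assumptions hold, incremental-result is 43. This is essentially what the REPL and load do on every top-level define.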

There are five different strategies for dealing with incrementals:
  1. Disallow eval - Use a “closed-world” model in which code cannot be evaluated and loaded at runtime. There can be no incrementals in this model. A REPL would have to be implemented as a meta-circular evaluator.
  2. Restrict eval - Do not permit define expressions to be evaluated. A REPL and load would be a problem, but this could be an option in a limited debugger.
  3. Restrict access to environments - There are certain distinguished standard environments that can be used with eval. These can be specially constructed to support incremental definitions. If there is no mechanism for gaining access to the environments created by applying a closure, then “normal” environments would not need incrementals.
  4. Support the-environment - Early versions of Scheme had the special form the-environment that would return the current environment to the user as a first-class object. The returned environment (and all the intermediate environments up to the global environment) would have to support incremental definitions, but otherwise they would not be necessary. Fortunately, it is simple to examine the code at eval time to see if there is a call to the-environment within it. If there is not, then there is no need for incrementals.
  5. Go wild - Have a primitive procedure that can extract the environment from an arbitrary closure object and allow this environment to be passed to eval. All environments must support incremental definitions because there is no way to predict if they would be necessary.
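
The eval-time check mentioned in option 4 could look roughly like the following. This is a hypothetical sketch over raw S-expressions, not MIT Scheme's own code (which works on SCode); the procedure name is made up. It is deliberately conservative: any occurrence of the symbol counts, even inside quoted data, so it can report a use that isn't really one, but it won't miss one in ordinary list-structured code.

```scheme
;; Returns #t if the symbol the-environment appears anywhere
;; in the list structure of expr, #f otherwise.
(define (mentions-the-environment? expr)
  (cond ((pair? expr)
         (or (mentions-the-environment? (car expr))
             (mentions-the-environment? (cdr expr))))
        (else (eq? expr 'the-environment))))

;; Code that captures its environment needs incrementals...
(define needs-incrementals
  (mentions-the-environment? '(lambda (a b) (the-environment))))

;; ...and code that never mentions it does not.
(define no-incrementals
  (mentions-the-environment? '(lambda (a b) (+ a b))))
```

Here needs-incrementals is #t and no-incrementals is #f. A false positive only costs some optimization, which is why erring in that direction is the safe choice.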

Back in the day, MIT Scheme chose option 5. The primitive procedure closure-environment would extract the environment object from a closure, and you could call eval with that object. The special form the-environment was also supported. Unfortunately, this means that all environments must be constructed in such a way that they can be manipulated by the interpreter. Furthermore, it means that all variable lookup must be done by deep searching the environment chain.
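
To see why incrementals force a deep search, here is a toy model of such environments. It is a sketch under my own assumptions, not the MIT Scheme representation: a frame is a pair of an association list and a parent frame, #f marks the root, and all the names are invented.

```scheme
;; A frame is (bindings . parent); bindings is an alist of (name . value).
(define (make-frame names values parent)
  (cons (map cons names values) parent))

(define (frame-bindings frame) (car frame))
(define (frame-parent frame) (cdr frame))

;; Deep search: walk outward frame by frame until the name is found.
(define (lookup name frame)
  (cond ((not frame) (error "Unbound variable:" name))
        ((assq name (frame-bindings frame)) => cdr)
        (else (lookup name (frame-parent frame)))))

;; Incremental definition: rebind if present, otherwise grow the frame.
(define (define-incremental! name value frame)
  (let ((entry (assq name (frame-bindings frame))))
    (if entry
        (set-cdr! entry value)
        (set-car! frame
                  (cons (cons name value) (frame-bindings frame))))))

(define global (make-frame '(x) '(1) #f))
(define inner (make-frame '(a b) '(10 20) global))

(define x-before (lookup 'x inner))   ; found in global
(define-incremental! 'x 99 inner)     ; new binding shadows global x
(define x-after (lookup 'x inner))    ; now found in inner
```

Here x-before is 1 and x-after is 99. Because a frame can gain a shadowing binding at any time, the position of x cannot be fixed when the code is first analyzed, and every lookup has to walk the chain.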

By the time the MIT Scheme compiler was written, however, it was realized that arbitrary evaluation in any environment had more disadvantages than advantages, so the MIT Scheme compiler uses option 4. If you write code that uses the-environment, the compiler will invoke the interpreter on that code. (This is so the compiler doesn't have to know anything about the interpreter implementation except the entry point. You don't want to have to maintain two separate compatible copies of the environment code.) If you don't use the-environment, the compiler is free to do what it wants. The closures created by the compiler cannot be destructured with closure-environment, but the compiler does emit debugging information to allow you to inspect what is left of the environment (if anything) once the compiler has optimized the code. The MIT-Scheme debugger uses option 2 to somewhat simulate the effect of evaluating code within the debugger.

One of the fun things about Lisp and Scheme is exploring the basement. MIT-Scheme has ‘subprimitives’ that directly manipulate the underlying memory. If you don't know what you're doing, you can easily corrupt memory and crash the system, but the system uses these to bootstrap itself. In the cold load sequence for MIT Scheme there is this interesting function:
(define (*make-environment parent names . values)
  ((ucode-primitive system-list-to-vector)
   (ucode-type environment)
   (cons ((ucode-primitive system-pair-cons)
          (ucode-type procedure)
          ((ucode-primitive system-pair-cons) (ucode-type lambda)
This creates a first-class environment structure with names and values by constructing tagged pointers to raw data. It is constructed to appear as if it were created by invoking a lambda expression with an unspecific body. This is used to construct the initial top-level environments for the REPL. In packag.scm, you'll find this:
(define null-environment
  ((ucode-primitive object-set-type)
   ((ucode-primitive object-type) #f)
   (fix:xor ((ucode-primitive object-datum) #F) 1)))
This creates a magic object that is recognized as the root of an environment chain.

My version of MIT-Scheme (call it jrm-scheme) interprets the MIT-Scheme SCode without modification, so it boots and runs with the code above. By default, I have to build environment structure that is compatible with the MIT-Scheme interpreter because the Scheme code sometimes examines the structure reflectively. The point, though, wasn't to make a slavish re-implementation but to explore the implementation possibilities under real-world constraints. So the next few posts are going to discuss how I implemented environments.


kbob said...

Thanks for the clarification.

I'm finding the whole exposition very interesting. I'm in the middle of designing my own Scheme from scratch, and while I'm not currently thinking about environment implementations, it's still good stuff.

My current focus is macro expansion. jrm-scheme probably inherits a working expander from mit-scheme, but I'm trying to implement my own.

icosahedron said...

Welcome to the club, kbob. I too am working on my own Scheme, one specializing in embedded systems.

This is the best explanation of first class environments and what their disadvantages are I've seen to date.