I really liked first-class environments when I was first exposed to them. It was cool to be able to reflect the underlying interpreter data structures to user space. At one point I attempted to port Gary Drescher's Object Lisp to Scheme. In order to handle the dynamic object scope, I used first-class environments and injected the object bindings into the lexical environment when the object was in use. (There are a lot of reasons why this doesn't work.) Over time, however, I came to realise that there was more difficulty and less power with first-class environments than I had originally thought. At this point I believe that first-class environments are useless at best, and dangerous at worst.
Before I get too far, I need to precisely describe what a first-class environment is. In Scheme, all variables are associated with a unique binding. The variable is either ‘free’, in which case it is a reference to a ‘top-level’ or ‘global’ binding, or it is ‘lexical’ in which case there is an enclosing lambda expression that is lexically superior to the reference names it as an argument. A lexical variable becomes ‘bound’ when the (closure containing the) lambda expression is applied. The binding exists as long as any code within the body of the lambda expression could refer to the variable. The ‘environment’ is the collection of bindings necessary to evaluate a given piece of code. (You should all be familiar with this.)
No doubt you are familiar with the ‘chained environment model’. In this model, a closure is made over the current environment every time a lambda expression is evaluated. When the closure is applied, the environment at the time of closing is used as a base, and a new ‘frame’ is created with the new bindings. When a variable is to be evaluated, the frames are searched from most-recent to least-recent to discover the binding. We're already in a little bit of trouble. While this model accurately describes how to determine the correct binding of a variable, it is just a model. The implementation is allowed to do whatever it wants provided that it always returns the same answer as the model would. There are several different implementations of this model. The usual implementation in a toy interpreter is to represent the environment as a simple association list. Bindings are pushed on to the list at application time and the list is searched on each reference. A more sophisticated interpreter may keep a linked list of vector-like ‘frame’ objects. It is simple to keep track of the ‘lexical address’ of a variable. This consists of the count of the number of frames back and the position in the frame where the binding is stored. When the variable is evaluated, no search is necessary. The environment chain is followed until the correct frame is found, and then the binding is at a known offset. Some implementations take advantage of the fact that most environments can be temporarily allocated on the stack in a contiguous region. These implementations can simply compute a static index back from the current stack pointer in a number of cases. Some implementations use a ‘flat’ environment. A flat environment is a vector of the addresses of the bindings needed by the lambda body. When the lambda is closed over, the bindings are copied. Finally, some implementations carefully analyze the body of the lambda expression and decide among one of several environment representations that might work best for the particular body.
The model does not specify what happens to those bindings that are not used within the body of the lambda. For example, in this code:
(let ((greeting "Hello!")) (display greeting) (for-each (lambda (e) (display e) (newline)) (list "How" "are" "you?")))The binding for
greeting
could be used by the lambda
expression passed to for-each
, but it isn't. The model tells us that display
and newline
refer to the top-level definitions, and that e
is immediately bound by the lambda, but it does not tell us what happens to the binding of greeting
after the greeting is displayed. Some implementations retain the binding, others drop it, still others do one or the other at different times.Returning to the question of what a ‘first-class environments’ is, there is the question of whether you should be able to extract one from an arbitrary closure. There are three potential answers:
- Yes, you should always be allowed to extract the environment from any closure.
- No, you must indicate beforehand which environments are to be first-class. (The
the-environment
form in early Scheme implementations and in MIT Scheme is an example.) - Maybe, it depends on the implementation.
- All bindings, in use or not, are captured.
- Only the bindings in use are captured, the others may not be.
- Only the bindings explicitly listed by the user are captured.
- All, some, or none are captured depending on the implementation.
- The actual bindings (mutable and live) are returned.
- A read-only reference to the bindings are returned. Values may be seen to change over time, but they cannot be modified via this interface.
- A snapshot to the bindings are returned. Changes to the actual bindings are not seen.
- The user specifies which variables are live, mutable, or snapshot.
- Implementation dependent.
define
expression in the returned environment.
- Evaluating a
define
establishes a new, shadowing binding if a previous binding did not exist. It acts like an assignment if a previous binding did exist. define
is not to be used. It is an error to try it. (Optionally an error is signalled, etc.)- Implementation dependent.
When someone suggests ‘first-class environments’, I assume they want options 1, 1, 1, and 1, that is, they can grab any environment at any time, all lexical bindings are present, used or not, the bindings are live and mutable, and you can insert new, shadowing bindings. Many people have told me not to make that assumption, so I'll talk about the other variations as well. In this variation, though, the user simply cannot reason about his code. There are no abstraction barriers because any piece of code can, at any time, crack open a closure and change the meaning of any variable whatsoever. Something as simple as
(lambda (x) (+ x 1))
cannot be assumed to do addition if someone injects a shadowing binding for +
. Obviously you cannot compile this to an add
instruction if you don't assume it will still be addition at runtime.Thomas Lord suggested “When you write your code, avoid capturing environments and then you are all set.”
He is apparently suggesting option 2 for the first question: explicit marking of environments you wish to capture. This is a considerably weaker proposal because allows the user to statically analyze any code that doesn't use a first-class environment, and it allows the implementation freedom in choosing environment representations in any code that doesn't use first-class environments. I have few objections to that, but let's examine question 2 under this proposal.
The second question is whether all bindings are visible, or only those bindings that the user explicitly specifies. The latter would take a form something like this:
(the-environment foo bar <more variables here> ...)
. I have a small objection to this. It is poor practice to expose the internal names of your variables (see previous posts). I don't think it useful for Scheme standardization because it is trivially implemented as a macro. (The third option of some bindings being available, some not, is not worth considering. It would be impossible to write portable code that used them because there are no guarantees they exist.)So allow me to summarize my objection to first-class environments:
- If first-class environments can be arbitrarily extracted from any closure, you can no longer depend on lexical scoping. You throw the baby out in favor of the bathwater.
- If first-class environments can only be obtained through use of an explicit special form, and you explicitly enumerate the variables captured, you don't need a change to the standard, you need a SRFI with a macro.
- If first-class environments can only be obtained through use of an explicit special form, but all visible variables captured, you still don't need a change to the standard, you need a SRFI with a more complicated macro.
It isn't clear to me how hygienic macros would work with first-class environments. Recall that if a hygienic macro introduces bindings, the appropriate renaming is performed during transcription to avoid accidental capture. But if we wish to capture one of these bindings, we'll need to be able to refer to it in some way. Code outside the macro would be unhygienic if it could refer to the variable, so that's a problem. Code inside the macro would work fine (it would be renamed appropriately to keep a hold of the reference), but then you don't need a change in the standard to put code inside your macros.
3 comments:
I always find the argument that some language construct is supposedly "dangerous" a bit weird. It's too fuzzy to make such a statement, in my humble opinion. What's more important, I think, is this: Do you want to be able to implement portable runtime debuggers or not? If you want this, you need first-class environments.
What do I need if I want to write a pretty printer for functions? I think I need choices [1,2,2,2], which seems not terribly harmful.
Those choices would let me, for example, serialize functions (along with their bindings) in a way I can later eval them back in, and though they won't be eq to the original ones, have at least a shot at wondering whether they might be, in another weak sense, equal (not, of course, in the more general, non-computable sense). It can also just be useful for debugging of normal code.
Guile provides facilities for this, at least up until the top level environment, which is harder to inspect, but that's actually okay: I don't want to serialize out the meaning of + in most cases.
Is that a target for standardization?
Ten years down the road, I note that Scheme already has global environments (the second argument to eval) which are unfortunately not quite first-class. In R6RS you can create a new global environment and import libraries into it, but the result is immutable; indeed, the only environment guaranteed immutable is plain old (interaction-environment). I have a pre-SRFI that adapts MIT's environment tree to the R6RS world, but works on global environments only.
Post a Comment