In McCarthy's early papers on Lisp, he notes that he needs a modified version of subst which needs to be aware of quoted expressions (and avoid substituting within them). He would also need a subst that was aware of lambda expressions. It would have to avoid substituting within the lambda if the name substituted matches one of the bound variables. To be useful for evaluation, it will have to deal with accidental variable capture when substituting within a lambda.
The root problem is that expressions are actually structured objects, but we are working with the list representation of those objects. Instead of substituting by operating on objects, we substitute on the list representation. We have to arrange for the syntactic substitution on the list representation to preserve the semantics of substitution on the objects they represent.
In the substitution model, we take a symbolic expression and replace some of the atoms in the expression with other expressions. We first need a way to discriminate between the different kinds of expressions. An expression is either an atomic symbol, or a list of expressions called an application. There are no other kinds of expressions.
(defun expression-dispatch (expression if-symbol if-application) (cond ((symbolp expression) (funcall if-symbol expression)) ((consp expression) (funcall if-application expression)) (t (error "~s is not an expression." expression))))Substitution is straightforward:
(defun xsubst (table expression) (expression-dispatch expression (lambda (symbol) (funcall table symbol #'identity (constantly symbol))) (lambda (subexpressions) (map 'list (lambda (subexpression) (xsubst table subexpression)) subexpressions)))) * (let ((table (table/extend (table/empty) 'x '(* a 42)))) (xsubst table '(+ x y))) (+ (* A 42) Y)We need a table of multiple substitutions so that we can substitute in parallel:
* (let ((table (table/extend (table/extend (table/empty) 'x 'y) 'y 'x))) (xsubst table '(+ x y))) (+ Y X)
So far, so good. Let's add lambda expressions. First, we need to add a new expression kind:
(defun expression-dispatch (expression if-symbol if-lambda if-application) (cond ((symbolp expression) (funcall if-symbol expression)) ((consp expression) (cond ((eq (car expression) 'lambda) (funcall if-lambda (cadr expression) (caddr expression))) (t (funcall if-application expression)))) (t (error "~s is not an expression." expression))))
Substitution within a lambda expression is a bit tricky. First, you don't want to substitute a symbol if it is one of the bound variables of the lambda expression. Second, substituting a symbol may introduce more symbols. We don't want the new symbols to be accidentally captured by the bound variables in the lambda. If any new symbol has the same name as a bound variable, we have to rename the bound variable (and all its occurrances) to a fresh name so that it doesn't capture the new symbol being introduced. We'll need a helper function
(defun free-variables (expression) (expression-dispatch expression (lambda (symbol) (list symbol)) (lambda (bound-variables body) (set-difference (free-variables body) bound-variables)) (lambda (subexpressions) (fold-left #'union '() (map 'list #'free-variables subexpressions)))))Now when we substitute within a lambda, we first find each free variable in the lambda, look it up in the substitution table, and collect the free variables of the substituted value:
(map 'list (lambda (var) (funcall table var #'free-variables (constantly '()))) (free-variables expression))This gives us the new free variables for each substitution. The union of all of these is the set of all the new free variables
(fold-left #'union '() (map 'list (lambda (var) (funcall table var #'free-variables (constantly '()))) (free-variables expression)))We have to rename the bound variables that are in this set:
(intersection bound-variables (fold-left #'union '() (map 'list (lambda (var) (funcall table var #'free-variables (constantly '()))) (free-variables expression))))So we make a little table for renaming:
(defun make-alpha-table (variables) (fold-left (lambda (table variable) (table/extend table variable (gensym (symbol-name variable)))) (table/empty) variables)) (let ((alpha-table (make-alpha-table (intersection bound-variables (fold-left #'union '() (map 'list (lambda (var) (funcall table var #'free-variables (constantly '()))) (free-variables expression))))))) …)We rename the bound variables as necessary:
(make-lambda (map 'list (lambda (symbol) (funcall alpha-table symbol #'identity (constantly symbol))) bound-variables) …)Finally, we redact the bound variables from the substitution table and append the alpha-table to make the substitutions we need for the lambda body
(make-lambda (map 'list (lambda (symbol) (funcall alpha-table symbol #'identity (constantly symbol))) bound-variables) (xsubst (table/append alpha-table (table/redact* table bound-variables)) body))))The entire definition of
xsubst
is now this:(defun xsubst (table expression) (expression-dispatch expression (lambda (symbol) (funcall table symbol #'identity (constantly symbol))) (lambda (bound-variables body) (let ((alpha-table (make-alpha-table (intersection bound-variables (fold-left #'union '() (map 'list (lambda (var) (funcall table var #'free-variables (constantly '()))) (set-difference (free-variables body) bound-variables))))))) (make-lambda (map 'list (lambda (symbol) (funcall alpha-table symbol #'identity (constantly symbol))) bound-variables) (xsubst (table/append alpha-table (table/redact* table bound-variables)) body)))) (lambda (subexpressions) (make-application (map 'list (lambda (subexpression) (xsubst table subexpression)) subexpressions)))))This is certainly more complicated than simple substitution, but we can see it does the right thing here:
* (xsubst (table/extend (table/empty) 'x '(* a y)) '(lambda (y) (+ x y))) (LAMBDA (#:Y234) (+ (* A Y) #:Y234))
It should be obvious how to add quoted forms. This would require adding a new
kind of expression to expression-dispatch
and a new handling clause in
xsubst
that avoids substitution.
I'm not completely happy with how we've added lambda expressions to
the expression syntax. Using the symbol lambda
as a
syntactic marker for lambda expressions causes problems if we also want to use
that symbol as an argument or variable. Initially, it seems
reasonable to be able to name an argument “lambda”.
Within the body of the function, references to the variable lambda
would refer to that argument. But what about references in the
operator position? By defining lambda expressions as three element
lists beginning with the symbol lambda
we've made it
ambiguous with two-argument applications whose operator is
the variable lambda
. We have to resolve this
ambiguity. The current behavior is that we always interpret the
symbol lambda
as a syntactic marker so you simply
cannot use a variable named lambda
as a function.
It's a start for a code walker, isn't it ?
ReplyDelete