Many years ago I worked on a language called REBOL. REBOL was
notable in that it used a variation of Polish notation. Function
names came first, followed by the arguments in left to right order.
Parentheses were generally not needed as the subexpression
boundaries could be deduced from the arguments. It’s a bit
complicated to explain, but pretty easy to code up.
An interpreter environment will be a lists of frames, and each frame
is an association list of variable bindings.
(defun lookup (environment symbol)
(cond ((consp environment)
(let ((probe (assoc symbol (car environment))))
(if probe
(cdr probe)
(lookup (cdr environment) symbol))))
((null environment) (error "Unbound variable."))
(t (error "Bogus environment."))))
(defun extend-environment (environment formals values)
(cons (map ’list #’cons formals values) environment))
define
mutates the topmost frame of the environment.
(defun environment-define! (environment symbol value)
(cond ((consp environment)
(let ((probe (assoc symbol (car environment))))
(if probe
(setf (cdr probe) value)
(setf (car environment) (acons symbol value (car environment))))))
((null environment) (error "No environment."))
(t (error "Bogus environment."))))
We’ll use Lisp procedures to represent REBOL primitives. The
initial environment will have a few built-in primitives:
(defun initial-environment ()
(extend-environment
nil
’(add
lessp
mult
print
sub
sub1
zerop)
(list #’+
#’<
#’*
#’print
#’-
#’1-
#’zerop)))
A closure is a three-tuple
(defclass closure ()
((arguments :initarg :arguments :reader closure-arguments)
(body :initarg :body :reader closure-body)
(environment :initarg :environment :reader closure-environment)))
An applicable object is either a function or a closure.
(deftype applicable () ’(or closure function))
We need to know how many arguments a function takes. We keep a
table of the argument count for the primitives
(defparameter +primitive-arity-table+ (make-hash-table :test #’eq))
(eval-when (:load-toplevel :execute)
(setf (gethash #’* +primitive-arity-table+) 2)
(setf (gethash #’< +primitive-arity-table+) 2)
(setf (gethash #’+ +primitive-arity-table+) 2)
(setf (gethash #’- +primitive-arity-table+) 2)
(setf (gethash #’1- +primitive-arity-table+) 1)
(setf (gethash #’print +primitive-arity-table+) 1)
(setf (gethash #’zerop +primitive-arity-table+) 1)
)
(defun arity (applicable)
(etypecase applicable
(closure (length (closure-arguments applicable)))
(function (or (gethash applicable +primitive-arity-table+)
(error "Unrecognized function.")))))
REBOL-EVAL-ONE
takes a list of REBOL expressions and
returns two values: the value of the leftmost expression in the
list, and the list of remaining expressions.
(defun rebol-eval-one (expr-list environment)
(if (null expr-list)
(values nil nil)
(let ((head (car expr-list)))
(etypecase head
((or number string) (values head (cdr expr-list)))
(symbol
(case head
(define
(let ((name (cadr expr-list)))
(multiple-value-bind (value tail) (rebol-eval-one (cddr expr-list) environment)
(environment-define! environment name value)
(values name tail))))
(if
(multiple-value-bind (pred tail) (rebol-eval-one (cdr expr-list) environment)
(values (rebol-eval-sequence (if (null pred)
(cadr tail)
(car tail))
environment)
(cddr tail))))
(lambda
(values (make-instance ’closure
:arguments (cadr expr-list)
:body (caddr expr-list)
:environment environment)
(cdddr expr-list)))
(otherwise
(let ((value (lookup environment head)))
(if (typep value ’applicable)
(rebol-eval-application value (cdr expr-list) environment)
(values value (cdr expr-list)))))))))))
If the leftmost symbol evaluates to something applicable, we
find out how many arguments are needed, gobble them up, and apply
the applicable:
(defun rebol-eval-n (n expr-list environment)
(if (zerop n)
(values nil expr-list)
(multiple-value-bind (value expr-list*) (rebol-eval-one expr-list environment)
(multiple-value-bind (values* expr-list**) (rebol-eval-n (1- n) expr-list* environment)
(values (cons value values*) expr-list**)))))
(defun rebol-eval-application (function expr-list environment)
(multiple-value-bind (arglist expr-list*) (rebol-eval-n (arity function) expr-list environment)
(values (rebol-apply function arglist) expr-list*)))
(defun rebol-apply (applicable arglist)
(etypecase applicable
(closure (rebol-eval-sequence (closure-body applicable)
(extend-environment (closure-environment applicable)
(closure-arguments applicable)
arglist)))
(function (apply applicable arglist))))
Evaluating a sequence is simply calling rebol-eval-one
over and over until you run out of expressions:
(defun rebol-eval-sequence (expr-list environment)
(multiple-value-bind (value expr-list*) (rebol-eval-one expr-list environment)
(if (null expr-list*)
value
(rebol-eval-sequence expr-list* environment))))
Let’s try it:
(defun testit ()
(rebol-eval-sequence
’(
define fib
lambda (x)
(if lessp x 2
(x)
(add fib sub1 x
fib sub x 2))
define fact
lambda (x)
(if zerop x
(1)
(mult x fact sub1 x))
define fact-iter
lambda (x answer)
(if zerop x
(answer)
(fact-iter sub1 x mult answer x))
print fib 7
print fact 6
print fact-iter 7 1
)
(initial-environment)))
CL-USER> (testit)
13
720
5040
This little interpreter illustrates how basic REBOL evaluation
works. But this interpreter doesn’t support iteration. There
are no iteration special forms and tail calls are not “safe
for space”. Any iteration will run out of stack for a large
enough number of repetition.
We have a few options:
- choose a handful of iteration specail forms
like
do
, repeat
, loop
, for
, while
, until
etc.
- invent some sort of iterators
- make the interpreter tail recursive (safe-for-space).
It seems a no brainer. Making the interpreter tail recursive
doesn’t preclude the other two,. In fact, it makes them
easier to implement.
To effectively support continuation passing style, you need
tail recursion. This alone is a pretty compelling reason to support
it.
But it turns out that this is easier said than done. Are you a
cruel TA? Give your students this interpreter and ask them to make
it tail recursive. The problem is that key recursive calls in the
interpreter are not in tail position. These are easy to identify,
but you’ll find that fixing them is like flattening a lump in a
carpet. You’ll fix tail recursion in one place only to find your
solution breaks tail recursion elsewhere.
If our interpreter is written in continuation passing style, it
will be syntactically tail recursive, but it won’t be
“safe for space” unless the appropriate continuations
are η-reduced. If we look at the continuation passing style
version of rebol-eval-sequence
we’ll see a
problem:
(defun rebol-eval-sequence-cps (expr-list environment cont)
(rebol-eval-one-cps expr-list environment
(lambda (value expr-list*)
(if (null expr-list*)
(funcall cont value)
(rebol-eval-sequence-cps expr-list* environment cont)))))
We cannot η-reduce the continuation. We cannot make this
“safe for space”.
But the continuation contains a conditional, and one arm of the
conditional simply invokes the containing continuation, so we can
η-convert this if we unwrap the conditional. We’ll do
this by passing two continuations to rebol-eval-one-cps
as follows
(defun rebol-eval-sequence-cps (expr-list environment cont)
(rebol-eval-one-cps expr-list environment
;; first continuation
(lambda (value expr-list*)
(rebol-eval-sequence-cps expr-list* environment cont))
;; second continuation, eta converted
cont))
rebol-eval-one-cps
will call the first continuation if
there are any remaining expressions, and it will call the second
continuation if it is evaluating the final expression.
This intepreter, with the dual continuations
to rebol-eval-one-cps
, is safe for space, and it will
interpret tail recursive functions without consuming unbounded stack
or heap.
But we still have a bit of an implementation problem. We’re
allocating an extra continuation per function call. This
doesn’t break tail recursion because we discard one of the
continuations almost immediately. Our continuations are not allocated
and deallocated in strict stack order anymore. We cannot easily
convert ths back into a stack machine implementation.
To solve this problem, I rewrote the interpreter using Henry
Baker’s Cheney on the M.T.A technique where the
interpreter functions were a set of C functions that tail called
each other and never returned. The stack would grow until it
overflowed and then we’d garbage collect it and reset it. The
return addresses pushed by the C function calls were ignored.
Instead, continuation structs were stack allocated. These contained
function pointers to the continuation. Essentially, we would pass
two retun addresses on the stack, each in its own struct. Once the
interpreter figured out which continuation to invoke, it would
invoke the function pointer in the struct and pass a pointer to the
struct as an argument. Thus the continuation struct would act as a
closure.
This technique is pretty portable and not too bad to implement, but
writing continuation passing style code in portable C is tedious.
Even with macros to help, there is a lot of pointer juggling.
One seredipitous advatage of an implementation like this is that
first-class continuations are essentially free. Now I’m not
wedded to the idea of first-class continuations, but they make it
much easier to implement error handling and advanced flow control,
so if you get them for free, in they go.
With it’s Polish notation, tail recursion, and first-class
continuations, REBOL was described as an unholy cross between TCL
and Scheme. “The result of Outsterhout and Sussman meeting in
a dark alley.”
Current versions of REBOL use a simplified interpreter that does
not support tail recursion or first-class continuations.