Tuesday, December 31, 2024

Usual Integrations, Function Calls and Primitive Calls

Near the beginning of most MIT-Scheme files you'll see (declare (usual-integrations)). This instructs the SF program, which translates Scheme to S-code, to replace the “usual” global variables with their (compile time) values. You lose the ability to change the values of these variables, but who redefines CAR or CDR? (And if you do, just remove the declare form or add CAR and CDR to the exception list.)

Now forms such as (null? x), which used to be syntaxed into this S-code:

[funcall
   [global ’null?]
   [argument 0]]

are now syntaxed into this S-code:

[funcall
   [quote [primitive null?]]
   [argument 0]]

Recall that S-code is a tree representation of Lisp code that is walked by the interpreter. The leaves of the tree are all either quote, global, argument, or lexical, and each of these only take one or two machine instructions to eval. The “overhead” of interpretation involves getting to the leaves.

The internal nodes of an S-code tree are the compound forms: function calls and special forms. IF and PROGN (in Scheme BEGIN) are the two most frequently evaluated special forms. When we construct the S-code for a compound form, we pattern match against the subforms and specialize the S-code. Specialized S-code for a compound form is called a “superoperator”. Superoperators eliminate the eval dispatch for the subforms. The machine code for interpreting a superoperator is close to the equivalent compiled code. Let us examine some of the most important superoperators.

Funcall Argument List Elimination

In general, a function call accumulates a list of arguments and jumps to APPLY to dispatch the function. We can eliminate the overhead of doing this. First, we can eliminate the call to APPLY by having applicable objects implement the applicable interface. We use virtual function dispatch to invoke the code that implements the applicable object. Applicable objects can be applied to any number of arguments passed as a list. We can eliminate the overhead of manipulating short argument lists by spreading the arguments and adding call0, call1, call2, and call3 methods to the applicable interface. The number of arguments at any particular call site is fixed, and it is usually a small number. We eliminate the overhead of accumulating the argument list for a function call by specializing the funcall S-code object on the number of arguments.

An S-code funcall is replaced by a funcall0, funcall1, funcall2, funcall3, or funcalln depending on the number of arguments. The argument accumulation loop is unrolled in the numbered cases placing the arguments in local C# variables. The funcallx superoperators directly call the appropriate calln method in the function’s applicable interface.

Primitive Function Call

Function calls to primitive procedures are the foundation of Lisp and Scheme programs. We can look at a Lisp program as if it were a “black box” that makes a series of primitive function calls. If unoptimized, a Lisp program ought to make the same series of primitive function calls with the same values whether it is interpreted or compiled. The primitive function will run at the same speed no matter how it is called, so if compiled code is faster than interpreted, it is issuing the primitive function calls faster, taking less time between them. The time between issuing primitive function calls is the interpretation overhead, and we want to reduce that.

When we construct an S-code funcall (and funcall0, funcall1, etc.), we check if the thing being called is a literal quoted primitive function, and if it is, we replace the S-code with a primitive-funcall (or primitive-funcall0, primitive-funcall1, etc.). A primitive-funcall evaluates its arguments, but it can skip evaluating the function. It can skip the apply dispatch, too, and just jump right to the primitive entry point.

We construct (null? arg) as

[primitive-funcall1
  [quote [primitive null?]]
  [argument 0]]

The MIT-Scheme interpreter and the MIT-Scheme hardware all special case function calls with small numbers of arguments and those which call primitive procedures.

In the next post, we develop this idea beyond what MIT-Scheme already does.

Eliminating Deep Search

In general, when you look up a lexical variable, you have to do a deep search. You start with the innermost scope and work your way out until you find the variable you are looking for. This takes an absurd amount of time for each variable lookup, so we want to avoid actually doing it.

As mentioned in the previous post, we special case the two most common cases: when the variable is in the innermost scope and when the variable is in the global scope. In the other cases, when the variable is in an intermediate scope, we can flatten the scopes by copying the variable values from inner scopes to outer scopes whenever the variable is closed over. This only works because we have put the mutable variables in cells and we copy pointers to the cells.

When an S-code lambda object is created, we walk the code and find the variables that are closed over. We create a closure with a copy of the variable’s value (or a copy of its cell). In essence, we replace:

(lambda (x) (+ x y))

(assuming y is a lexical variable) with

(%make-closure ’(lambda (x) (+ x y)) y)

Once we have done this, we no longer need to deep search for lexical variables. There is only one lexical frame, it contains a copy of the variable, and we know the offset of the variable in the frame. Lexical variable lookup is now just a vector-ref.

The drawback is that we have cell-converted the mutable variables, so calling a function that contains mutable variables will allocate memory. Use of SET! makes your code expensive. Furthermore, %make-closure is linear in the number of things being closed over. But typically, closures only close over one or two items.

So between these specializations (this post and the two previous ones), we have removed the overhead of variable lookup. An expression like (lambda (x) (+ y x)) (where y is a lexical variable) is converted into S-code that looks like:

[%make-closure
  [quote (lambda ’(x)
            [funcall
              [global ’+]
              [lexical 0]
              [argument 0]])]
  [argument n]  ;; wherever ’y is in the outer scope
  ]

Evaluating variables is the bulk of the work in a Lisp program. Now both the interpreter and compiler can evaluate a variable with one or two machine instructions. Interpreted code is still slower than compiled code, but this isn’t a bottleneck anymore.

Now we look for other bottlenecks. Watch this space.

Monday, December 30, 2024

Specialized Variables

In general, when you lookup a variable, you have to do a deep search through the lexical contours. This is because someone could have captured a first class environment and evaled a define in it that shadows a variable.

However, this is a very rare case. In the vast majority of cases, the variable is either a global variable or lexical variable bound in the immediately enclosing environment. We can optimize for these cases.

Symbols refer to three different kinds of variable: if the variable is free, it refers to a global or “top-level” variable, if it is bound, it may be bound in the immediately enclosing environment (i.e., it is an argument), or it may be bound in a environment further up the lexical chain. We can determine which of these cases applies statically when we construct the S-code and select the appropriate subtype of S-code variable.

A global or top-level variable can contain a pointer to the value cell of the variable in the global environment, so variable lookup simply involves a dereference. If the variable is lexically bound, and the environment is a StatcEnvironment (or subtype), then the lexical address of the variable is known statically and cannot change (the variable cannot be shadowed), so we can store the lexical address in the S-code variable. Lookup is faster because it involves constant offsets.

The variable corresponding to argument 0 of the current lambda is by far the most commonly accessed variable, followed by argument 1. We can create special S-code types for each of these two variables in j addition to the three types for global, lexical, and the other arguments. By classifying the variables into one of these five types at syntax time, we can greatly reduce the amount of work needed at runtime to access the variable.

So consider the form (lambda (n) (lambda (x) (+ x n))). n is a lexical variable bound one level deep. x is an argument and + is a global. The S-code representation of this form will have three different variable types. The S-code for n will be a LexicalVariable with a depth of 1 and offset of 0. The eval method for LexicalVariable can walk back one level in the lexical chain and then access the zeroth element of the environment.

On the other hand, the S-code for x will be an ArgumentVariable with an argument number of 0. The eval method for ArgumentVariable can call the GetArg0 method of the environment.

Finally, the S-code for + will be a GlobalVariable with a pointer to the global value cell for +. The eval method for this type of S-code variable can simply dereference the pointer.

Between specializing environments and S-code varibles, we can greatly reduce the amount of time needed to access variables.

Specialized Environments

A Lisp program conceptually does a lookup every time it evaluates a symbol. Since evaluating symbols is the single most common operation, you want it to be as fast as possible. In compiled code, symbol values might be stored in registers, on the stack, or in a vector, but in interpreted code they are usually stored in a structure called an environment. In compiled code, the symbol value can often be retrieved by a move instruction, but in interpreted code there may be a function call and a vector-ref needed to get the value. This is a significant performance hit.

We can customize the layout of an environment to speed up symbol lookup. In the general case, an environment is a fairly heavyweight data structure. Because you can eval a define form at any time, you need an environment that supports incremental definition and you need to deep search each time you do a lookup because someone might have defined a shadowing variable.

But 99% of the time, you don’t need the general case. If no one captures a first-class environment, then you don’t need to support incremental definitions because no one can call eval to define a new variable. If no one mutates a variable, then you can store a copy of the variable’s value directly in the environment instead of indirectly through a ValueCell object. Most environments only close over a couple of variables, so you can make the variables fields of the environment object instead of storing them in a vector of values. You statically know the number of variables, so you can store them in a fixed-size class instance.

An Environment is an abstract base class.

A GlobalEnvironment is a singleton class that represents bindings in a hash table.

A StandardEnvironment is a concrete subclass of Environment that supports assignment and incremental definitions. The bindings are kept in a vector of ValueCell objects and the incremental bindings are kept in a hash table. This is a general purpose environment.

class StandardEnvironment : public Environment {
  Vector<ValueCell*> bindings;
  HashTable<ValueCell*> incrementalBindings;
};

A variable lookup in a StandardEnvironment involves fetching the vector of value bindings from the environment, looking up the variable in the vector of bindings, and if it is not found, looking it up in the hash table of incremental bindings, and finally fetching the value from the cell. Several levels of indirection are involved.

A StaticEnvironment supports assignment, but not incremental definitions. Bindings are kept in a vector of ValueCell objects.

class StaticEnvironment : public Environment {
  Vector<ValueCell*> bindings;

  Object* GetArg0() { return bindings[0].Value; }

};

Looking up a value in a StaticEnvironment is slightly quicker because there is no table of incremental bindings to search.

An ImmutableEnvironment holds bindings in a vector and does not support assignment or incremental definitions. Binding values are kept directly instead of indirectly through ValueCell objects.

class ImmutableEnvironment : public Environment {
  Vector<Object*> bindings;

  Object* GetArg0() { return bindings[0]; }

};

Looking up a variable in an ImmutableEnvironment is quicker because there are no ValueCell objects to dereference.

A SimpleEnvironment is an abstract subclass of ImmutableEnvironment that does not support assignment or incremental definition. Bindings are kept in class fields.

A SimpleEnvironment0 is a concrete subclass of SimpleEnvironment that holds no bindings — these are created when you invoke a thunk. A SimpleEnvironment1 is a concrete subclass of SimpleEnvironment that holds one binding, and so on up to SimpleEnvironment3.

  class SimpleEnvironment2 : public SimpleEnvironment {
    Object* var0;
    Object* var1;

  Object* GetArg0() { return var0; }

  };

Looking up a variable in an SimpleEnvironment is quicker because you don’t have to indirect through a vector of bindings. A method as simple as GetArg0(), which simply returns a field in the instance, can often be inlined.

Environments are created by applying closures. There are subtypes of closures that correspond to the different kinds of environments. For example, a SimpleClosure2 is a closure that creates a SimpleEnvironment2.

Closures are created by evaluating lambda expressions. There are subtypes of lambda expressions that correspond to the different kinds of closures. For example, a SimpleLambda3, when evaluated, will create a SimpleClosure3.

When S-code lambda object are created, they are analyzed to see if a first-class environment is captured, and if not, whether any of the variables are mutated. The appropriate subclass of lambda object is created based on this analysis.

Almost all environments end up being SimpleEnvironments and are quite a bit faster than the general case.

Sunday, December 29, 2024

Interpreting S-code

Lisp programs are not text but rather nested lists of symbols (atoms) and other lists. By default, a symbol represents a variable and a list represents a function call, but some lists are special forms that don’t follow the normal function call semantics.

S-code is an alternative representation for Lisp programs. Instead of nested lists, S-code is a set of nested data structures. Each S-code type corresponds to a Lisp form. There is an almost trivial isomorphism between S-code and the list representation of Lisp. S-code is a bit easier to interpret and compile than the list representation because the data structures representing the code can contain auxiliary information to aid in interpretation of the code. For example, a variable reference in S-code can contain the lexical depth and offset of the variable.

Converting from nested lists to S-code is a process called “syntaxing”. The nested lists are walked, the macros are expanded, auxiliary information is collected, and the S-code is generated. The S-code can be directly interpreted or compiled to machine code. “Unsyntaxing” is the process of converting S-code back to nested lists and it basically involves a walk of the data structure, discarding the auxiliary information, and collecting a tree of lists, thus recovering the original Lisp program (modulo macro expansion).

The MIT-Scheme hardware (Scheme 79, Scheme 81, and Scheme 86) directly executed S-code. MIT-CScheme is a C and assembly program that interprets S-code and runs compiled S-code. MzScheme and Racket also use a similar nested data structure represention of Scheme programs. I don’t think they call it “S-code”, but it is essentially the same thing. No doubt other Lisp and Scheme interpreters use this technique.

(The Lisp machine, by contrast, compiled Lisp programs to a byte-code that was interpreted by the microcoded Lisp machine hardware. I believe that MzScheme and Racket now include a JIT compiler.)

A few years back, for kicks, I wrote an S-code interpreter for Scheme. I decided to write it in C# so that I could leverage the .NET Common Language Runtime. I wouldn’t have to write a garbage collector, for example. I wrote it to interpret the S-code generated by MIT-Scheme. This way I could leverage the MIT-Scheme runtime as well. But this interpreter is not simply a port of the C implementation of MIT-Scheme. Many parts of the interpreter work completely differently from the MIT C implementation. Among them:

  • The C# call stack is used to hold the continuations. Subproblem continuations are C# continuations. Nested Scheme function calls appear as nested C# function calls, so the C# debugger can be used to debug Scheme code.
  • Tail recursion is handled by a trampoline at each function call boundary. To tail call a function, you return the function, the arguments, and a special marker to the caller’s trampoline, which will make the call on your behalf.
  • First class continuations are handled by lightweight stack inspection. In addition to a special marker for tail recursion, there is also a special marker for continuation capture. If this is returned, the caller’s trampoline will evacuate its current stack frame from the stack onto the heap and propagate the special marker up the stack.
  • Eval dispatch is handled through virtual method dispatch. Each type of S-code object has an eval method that is specialized for the type. I was curious if the virtual method dispatch was fast enough to be in the main interpreter loop.
  • The S-code interpreter is instrumented to discover the “hot” paths through the code. Certain patterns of S-code that are determined to be dynamically frequent can be replaced by S-code “superoperators” that provide specialized higher performance evaluation paths that avoid recursive evaluation steps.
  • Variable assignments are eliminated through cell conversion. All variables are ultimately immutable. Variables that were assigned to are converted to mutable cells that are allocated on the heap. This means we can use a flattened environment structure, but calls to procedures with mutable variables will cons.
  • The nested environment structure was replaced with flattened environment vectors. Shared immutable lexical variables are copied, and since mutable variables are stored in cells, their references are copied. Lexical variable lookup is constant independent of the lexical depth, but closure construction becomes linear in the number of variables being closed over.

Limitations

C# does not provide a way to dump a heap snapshot, so it isn’t possible to save a Lisp image. Instead, the interpreter is “cold loaded” each time it is started. Indeed, the main problem with the interpreter is the startup time. It spends an inordinate amount of time initializing the unicode tables.

The major version of MIT-Scheme has been incremented. The C# interpreter fails to reach the prompt when cold loading the latest version. I haven’t tracked down the problem.

The C# interpreter assumes that the S-code abstraction is completely opaque, i.e. that the Scheme code knows nothing about the binary layout of S-code objects and only manipulates them through the provided primitives. This isn’t always the case, however, so the C# intepreter sometimes has to recognize that certain idiomatic patterns of S-code are actually attempting to emulate a primitive operation. For example, system-pair-car is a primitive that extracts the first element of any system object that has two elements, i.e. objects that MIT-Scheme implements as a cons cell even though not tagged as one. But the point of using C# was to hide the representation of Scheme objects from the Scheme code, so many objects are no longer implemeted as cons cells. The interpreter notices calls to system-pair-car and instead extracts whatever field of the object that MIT-Scheme would have put in the car.

The interpreter is not complete enough to host the SF program, which creates Scheme .fasl files from Scheme source. You need MIT-Scheme to create the .fasl files, which can then be loaded into the C# interpreter.

Now what?

I got increasingly frustrated with the limitations. It was taking too long to do a full debugging cycle. Playing whack-a-mole with the bugs was getting tedious.

I wanted to understand fundamentally why interpreters are so slow. In theory, shouldn’t the interpreter be able to perform the same operations as the compiled code? What exactly is “interpreter overhead”? Shouldn’t it be able to run in the same ballpark as the compiled code? What would it take to make a high-performance interpreter?

I discovered some of the answers. A large part of the overhead comes from instruction dispatch. Compiled code has its instructions dispatched in hardware and often have dedicated hardware to read and process the instruction stream. Interpreters use software to do the dispatch. You can make the interpreter faster by combining instructions and dispatching instruction combinations. There is a combinatorical explosion of instruction combinations, however, so you only want to combine the most common instruction sequences.

I wrote a number of hand-coded instruction combinations, but it got tedious and I realized that I wanted an automatic way to generate instruction combinations for common instruction sequences. That is, I wanted a JIT compiler. I put the project on the shelf at this point because a JIT compiled S-code interpreter is another project in itself.

I decided to blog a bit about some of the more unusual features of this interpreter, so watch this space if you are interested.

Friday, December 27, 2024

Composing Binary Functions

Functions that take one argument and return one value are cool because you can make pipelines out of them. You just send the output of one function into the input of the next. Of course the output of the first function must be of a type that the second function can accept. Or you can just stick with functions that take and return the same type, and then you can compose these as desired.

But what if you want to compose binary functions that take two arguments and return two values? There are a couple of ways to take multiple arguments: you can pass them in a list (or vector), or you can pass the arguments one at a time by curry the function.

(defun curry (B)
  (lambda (l)
    (lambda (r)
      (funcall B l r))))

There are two lambdas here. The outer lambda accepts the left argument and returns the inner lambda. The inner lambda accepts the right argument and calls the original binary function with both arguments.

To call a curried function, you first call it with the left argument. This will return a lambda that you call with the right argument.

(funcall (funcall curried-binary-function left) right)

The two-argument identity function would be curried like this:

(defun curried-values (left)
  (lambda (right)
    (values left right)))

Composing two curried functions is a little complicated. Let us suppose that F and G are curried binary functions. Further assume that G has already been applied to its left argument, i.e., it is an inner lambda of a curried binary function. We want to compute the inner lambda of F composed with G.

(defun curried-compose (f g)
  (lambda (right)  ; return an inner lambda
    (multiple-value-bind (vall valr) (funcall g right)
      (funcall (funcall f vall) valr))))

Here is how it works. We return an inner lambda that accepts the right argument of the composition. We call the inner lambda G on this argument and collect the two values it returns. We then pass the two values one at a time to F.

If we did this right, we should end up with these identities:

(curried-compose #’F (curried-values left))
    ==> (F left)

and

(curried-compose #’curried-values G-inner)
    ==> G-inner

furthermore, curried-compose should be associative:

(curried-compose (curried-compose F G) H)
    ==> (curried-compose F (curried-compose G H))

Let’s try it out.

(defun curried-cons (left)
  (lambda (right)
    (cons left right)))

;; Check first identity.
(let ((x (curried-compose #’curried-cons (curried-values ’car))))
  (assert (equal (funcall x ’cdr) ’(car . cdr))))

;; Check second identity.
(let ((x (curried-compose #’curried-values (curried-cons ’car))))
  (assert (equal (funcall x ’cdr) ’(car . cdr))))

;; Check associativity.
(defun curried-sum-difference (left)
  (lambda (right)
    (values (+ left right) (- left right))))

(let ((x (curried-compose
           #’curried-cons
           (curried-compose
             #’curried-sum-difference
             (curried-values 7))))
      (y (curried-compose
           (lambda (v)
             (curried-compose #’curried-cons
               (curried-sum-difference v)))
            (curried-values 7))))
  (assert (equal (funcall x 3) (funcall y 3))))

If the binary function is closed over its right argument, then when you curry it, the inner lambda will be closed over its argument. We should be able to compose various inner lambdas to make a pipeline. The pipeline will daisy chain the right argument from one curried function in the pipeline to the next.

Pipelines like this have two properties we want to exploit: first, they force an order of execution on the functions in the pipeline. The earlier stages in the pipeline have to produce an answer before the later stages consume it, even if all the functions in the pipeline are lazy pure functional. Second, we can abstract out the argument we are daisy chaining through the stages.

Let’s use this in a real problem. Our task is to build a binary tree where each node has a serial number. The serial number will be our right hand argument which is daisy chained through the computation.

(defun next-sn (sn)
   (values sn (1+ sn)))

(defun curried-make-node (depth children)
  (curried-compose
    (lambda (sn)
      (curried-values (cons (cons depth sn) children)))
    #’next-sn))

(defun curried-make-tree (depth)
  (if (zerop depth)
      (curried-make-node depth nil)
      (curried-compose
        (lambda (left-branch)
          (curried-compose
            (lambda (right-branch)
              (curried-make-node depth (list left-branch right-branch)))
            (curried-make-tree (1- depth))))
        (curried-make-tree (1- depth)))))

curried-make-tree returns a curried function. It hasn’t built a tree, but built a function that builds the tree once it gets the serial number passed in.

Notice these two things: we have no global state or assignment, but we get a serial number that is incremented for each node. The serial number is passed around as the curried right hand argument and returned as the right hand value. Notice, too, that curried-make-tree has no mention of the serial number. It is a hidden variable.

curried-compose is reminiscent of a let expression, so we can create some analagous syntactic sugar:

(defmacro LetM (((var val) &rest more) &body body)
  ‘(curried-compose
     (lambda (,var)
       ,@(if more
            ‘((LetM ,more ,@body))
            body))
     ,val))

(defun curried-make-tree (depth)
  (if (zerop depth)
      (curried-make-node depth nil)
      (LetM ((left-branch (curried-make-tree (1- depth)))
             (right-branch (curried-make-tree (1- depth))))
        (curried-make-node depth (list left-branch right-branch)))))

There is a special name for the inner lambda that is expecting the right hand argument. It is a “monad”. curried-values is the monad “unit”, and curried-compose is the monad “bind”.

As I mentioned in a previous post, it makes more sense to use this approach in a lazy functional language. The chain of curried functions force sequential evaluation because the output of each curried function must be computed before it becomes the input of the next curried function. The right hand argument that is threaded through the curried calls can be used as a “store” that is (functionally) updated by each curried function before passing it along, giving you the illusion of mutable storage.

A monad is how you embed an imperative language within a pure, lazy, functional language. But in a mixed language like Lisp, you can force sequential evaluation by progn and the store is actually mutable, so there is less of a need for monads. Nonetheless, they provide a way to “hide” a variable that is threaded through the curried functions and that can come in handy. For example, if you are doing I/O, you can hide the stream variable so you don’t have to pass it along to every intermediate function.

Thursday, December 26, 2024

Dynamic Variables in Go

Dynamic binding is an evaluation rule for free variables. The rule is that the value of a free variable in a lambda expression is the value bound in the nearest dynamically enlosing scope. This is in contrast to static binding, where the value is the value bound in the nearest lexically enclosing scope. A naive interpreter that saves variable bindings in a stack frame will usually end up implementing dynamic binding.

In general, static binding is preferred because it is easier to reason about. However, dynamic binding is sometimes very useful in handling context. Lisp provides both static and dynamic binding, with the default being static binding.

Golang provides lexical binding, but not dynamic binding. But you can mimic the effect of dynamic binding by saving the variable's value, assigning it a new value, then restoring the old value in a defer statement. But this trick won't work in a multi-threaded environment. Each thread has to manage its own value of the variable that is separate from the values in other threads.

But what if the implementation doesn't tightly bind continuations to threads? What if the implementation uses a thread pool, and the same thread is used for different goroutines at different times? In that case, the trick of saving and restoring the value of a variable won't work. The value of the variable will be shared among different goroutines, and the value will be unpredictable. To make things work, your dynamic variables should be continuation-local.

But golang doesn't provide continuation-local variables, and it seems to go out of its way to keep code from inspecting the current continuation. There is a trick that you can use.

We need a way to allocate synthetic continuation IDs and bind them to the chain of pending calls. We also need a way that a subroutine can inspect the current chain of pending calls and determine the synthetic continuation ID.

Allocating a synthetic continuation ID is easy enough by maintaining a bitmap of free continuation IDs. When we need a continuation ID, we grab a mutex and search the bitmap for an unused continuation ID. We set the bit to allocate the ID and free the mutex. To free the continuation ID, we reset the bit. Any time you want to use dynamic variables, you need to allocate a synthetic continuation ID early on. This will be used as an index into a value table for each dynamic variable.

var (
	continuationIdInUse [MaxContinuations]bool = [MaxContinuations]bool{}
	continuationIdMutex sync.Mutex       = sync.Mutex{}
)

func AllocateContinuationId[T any](receiver func(tid uint8) T) T {
	continuationIdMutex.Lock()

	for tid, inUse := range continuationIdInUse {
		if (tid > 0) && !inUse {
			continuationIdInUse[tid] = true
			continuationIdMutex.Unlock()
			return DeallocatingContinuationId(uint8(tid), receiver)
		}
	}

	panic("No available continuation ids")
}

func DeallocateContinuationId(tid uint8) {
	continuationIdInUse[tid] = false
}

func DeallocatingContinuationId[T any](tid uint8, receiver func(tid uint8) T) T {
	defer DeallocateContinuationId(tid)
	return receiver(tid)
}

Golang doesn’t offer much in the way of stack introspection or manipulation, but you can easily get a backtrace of the names of the pending function calls. So we’ll encode the synthetic continuation ID as a nested set of function calls. When we need to read the current continuation ID, we get a backtrace of the stack and look for the special nested set of function calls and decode them.

func Mark0[T any](count int, value uint8, thunk func() T) T {
	if count == 0 {
		return thunk()
	} else if value%2 == 0 {
		return Mark0(count-1, value/2, thunk)
	} else {
		return Mark1(count-1, (value-1)/2, thunk)
	}
}

func Mark1[T any](count int, value uint8, thunk func() T) T {
	if count == 0 {
		return thunk()
	} else if value%2 == 0 {
		return Mark0(count-1, value/2, thunk)
	} else {
		return Mark1(count-1, (value-1)/2, thunk)
	}
}

func MarkByte[T any](value uint8, thunk func() T) T {
	if value%2 == 0 {
		return Mark0(7, value/2, thunk)
	} else {
		return Mark1(7, (value-1)/2, thunk)
	}
}

When we call MarkByte on a byte, we’ll make a series of eight nested function calls to Mark0 or Mark1 depending on the bits in the byte. We can use the golang backtrace mechanism to find the nested function calls and determine what the byte was:

func ReadMark() int {
	pc := make([]uintptr, 256)
	n := runtime.Callers(0, pc)

	frames := runtime.CallersFrames(pc[:n])
	value := 0
	for {
		frame, more := frames.Next()
		if !more {
			break
		}

		if strings.Contains(frame.Function, "Mark0") {
			value *= 2
		} else if strings.Contains(frame.Function, "Mark1") {
			value = value*2 + 1
		}
	}
	return value
}

Now when we spin off a goroutine that will use dynamic variables, we must be careful to call WithThread some time early on in the goroutine:

func MakeMyGoroutineEntry () func () {
        return func () {
                _ = WithThread(func () int {
                        // There are 8 pending calls to Mark[0,1]
                        // And we can use ReadMark() to get the
                        // represented value
                        
                        // ... body of callback ...
                   })
         }
}

  // ... some code somewhere ...
  myGoroutine := MakeMyGoroutineEntry()
  go myGoroutine()

Now we can use the continuation ID to index into a table of values:

type GlVal[T any] struct {
	values [MaxThreads]T
}

func NewGlVal[T any]() *GlVal[T] {
	return &GlVal[T]{}
}

We read the goroutine local variable by indexing into the table:

func GlValGet[T any](g *GlVal[T]) T {
	return g.values[ReadMark()]
}

We write the golang local variable by writing to the table. We return the old value so that we can restore it later. Note, no mutex is needed because the value is thread-local:

func GlValSet[T any](g *GlVal[T], value T) T {
	mark := ReadMark()
	oldValue := g.values[mark]
	g.values[mark] = value
	return oldValue
}

When we want to dynamically bind the variable, we save the old value to a frame local variable, set the new value, and defer a function to restore the old value:

func GlValBind[T, U any](g *GlVal[T], value T, thunk func() U) U {
	oldValue := GlValSet(g, value)
	defer GlValSet(g, oldValue)
	return thunk()
}

This technique seems reasonably robust despite using the debugging support for its implementation. It will fail if they ever make golang tail-recursive, but I doubt that will happen any time soon.

I Don’t Use Monads

I consider myself a “mostly functional” programmer. I think about programs in terms of data flow and transformations and mappings from input to output. I avoid side effects and mutable state where it isn’t necessary. But I’m not a purist. I’m happy to setf an instance slot when it makes sense.

Monads are the darling of the functional programming community. To hear it, they are the key to understanding the universe. They provide a way to encapsulate side effects and mutable state in a pure functional way.

But I don’t often use them.

There are a few reasons for this. I don’t work in Haskell, where monads are a ubiquitous part of the language. If you use Haskell, you’re going to use monads. But what about the languags that I do use? Monads of course “work” in any language that supports higher-order functions, but they are not always idiomatic or the obvious way to accomplish a task.

The primary use of monads is to mimic the look and feel of an imperative language, with its sequential execution, side effects and mutable state, in a language that is purely functional. Monads are a way to implement progn and setf in Haskell. But Lisp already has progn and setf.

And an imperative programming style is largely inferior to a functional programming style. I prefer to not think imperatively and write imperative code. I've seen many Haskell programs that used monads just so they could use do notation and write code that looked like C. That seems like a step backwards.

Monads are complicated. They essentially come down to functional composition of curried functions hidden behind some syntactic sugar. So long as you don’t have to look under the hood, they are reasonably easy to use. But when you do have to look under the hood, you’ll find a maze of twisty, turny, higher-order procedures. In Lisp, a well placed setf can cut through this functional Gordian knot.

Monday, December 16, 2024

Does CONS take its arguments in the wrong order?

Object-oriented languages often come with a “collections” library that provides a variety of data structures for storing collections of objects. Methods on collections usually pass the collection as the first argument for two reasons: (1) the collection is the primary object being operated on, and (2) in many languages, method dispatch only works on the first argument.

In Lisp, however, lists are a collection type that is derived from a more primitive data type, the cons cell. Lists are not fully abstracted from cons cells and many list operations are actually defined as operations on cons cells. Rather than the list argument always coming first, there is little rhyme or reason to the order of arguments in list operations. This becomes evident when you try to use higher-order operations on lists.

cons constructs a cons cell, with two fields, a car and a cdr. The car is traditionally written to the left of the cdr, and cons takes the car as the first argument.

In Lisp, there is no special constructor for lists. The user directly constructs the underlying cons cell and simply “declares in his head” that the result is a list. Hence when we are extending a collection with an item, the collection is the second argument to cons rather than the first. In a “modern” curly-brace, object-oriented language, we would extend mylist by writing mylist.cons(item)

Here’s where it makes a difference:

The fold-left function iterates down a list and calls a combining function on each element of the list. The first argument to the combining function is the accumulated answer, the second is the element being accumulated. If we want to accumulate into a list, the accumulated list is passed first and the new item is passed second. This means that cons is not suitable as the combining function in fold-left because the arguments come in the wrong order:

(defun bad-reverse (list)
  (fold-left #’cons ’() list)) ;; does not reverse a list

(defun xcons (cdr car)  ;; takes original list first
  (cons car cdr))

(defun reverse (list)
  (fold-left #’xcons ’() list)) ;; correcty reverses a list

Many list operations, remove and adjoin for example, take their arguments in an inconvenient order for fold-left.

Thursday, December 12, 2024

How to Blow an Interview

I recently had to interview a candidate. He had passed earlier interviews, had done well, and was in the final running. My only task was to determine his coding ability.

I checked resume, and saw that he listed Python among his computer language skills, so I decided to pitch him a Python program. I wrote a this simple skeleton of a Tic-Tac-Toe program:

class TicTacToeBoard:
    def __init__(self):
        pass

    def __str__(self):
        pass

    def make_move(self, row, col, player):
        pass

    def is_winner(self, player) -> bool:
        return False

    def is_full(self) -> bool:
        return False

if __name__ == '__main__':
    board = TicTacToeBoard()
    print(board)
    board.make_move(0, 0, 'X')
    board.make_move(0, 1, 'O')
    board.make_move(1, 1, 'X')
    board.make_move(1, 0, 'O')
    board.make_move(2, 2, 'X')
    print(board)
    print(board.is_winner('X'))
    print(board.is_winner('O'))
    print(board.is_full())

I invited him to “go to town” on the code. I didn’t care what he implemented or how he implemented it, I just wanted to watch him code.

Alas, he didn’t. He studied the code for a few minutes, moved the cursor around, and then admitted that it had been a while since he had written any significant Python code and he was a bit rusty. He was more used to Java.

That’s fine. I asked him to write the code in Java. I apologized for not having a Java skeleton, but I was sure he could write something.

He didn’t get very far. He wrote a class declaration, but was stumped on any method definitions. He admitted that he was pretty rusty in Java, too, and wanted to switch back to Python.

I didn’t care, I just wanted to see some code. He asked if the job was going to require a lot of coding. I said that it would definitely require some coding as part of the job was analyzing legacy code and identifying and fixing security bugs.

After half an hour, he still hadn’t written even a single line of working code in any language. I felt bad for him and asked him about the other experiences on his resume. They looked good, but a little digging showed that his experience was a bit shallow. I thanked him for his time and told him that the recruiter would be in touch.

I gave him a solid “no”. He just would not have been a good fit. The job required coding skills that he simply did not have.

I don’t understand it. I would have thought that the ability to write some code was a basic requirement for any job involving computers. But I guess not.


Nutanix is hiring. It’s a pretty good gig, so if you see something that interests you, apply. But be prepared to write some code.

Saturday, November 2, 2024

Don't Try to Program in Lisp

A comment on my previous post said,

The most difficult thing when coming to a different language is to leave the other language behind. The kind of friction experienced here is common when transliterating ideas from one language to another. Go (in this case) is telling you it just doesn’t like to work like this.
Try writing simple Go, instead of reaching for Lisp idioms. Then find the ways that work for Go to express the concepts you find.

That's not at all how I approach programming.

A friend of mine once paid me a high compliment. He said, “Even your C code looks like Lisp.”

When I write code, I don't think in terms of the language I'm using, I think in terms of the problem I'm solving. I'm a mostly functional programmer, so I like to think in terms of functions and abstractions. I mostly reason about my code informally, but I draw upon the formal framework of Lambda Calculus. Lambda Calculus is a simple, but powerful (and universal) model of computation.

Programming therefore becomes a matter of expressing the solution to a problem with the syntax and idioms of the language I'm using. Lisp was inspired by Lambda Calculus, so there is little friction in expressing computations in Lisp. Lisp is extensible and customizable, so I can add new syntax and idioms as desired.

Other languages are less accommodating. Some computations are not easily expressable in the syntax of the language, or the semantics of the language are quirky and inconsistent. Essentially, every general purpose fourth generation programming language can be viewed as a poorly-specified, half-assed, incomplete, bug-ridden implementation of half of Common Lisp. The friction comes from working around the limitations of the language.

Wednesday, October 16, 2024

Lisp vs. golang

It's no secret that I'm an aficionado of Lisp. It's my go to language, especially when I don't know what I'm doing. I call it research and prototyping, but it's really just playing around until something works.

We had a need for some auditing of some of our databases at work. They ought to agree with each other and with what GitHub and CircleCI think. It took a couple of weeks part time to prototype a solution in Common Lisp. It showed that the databases were in 99% agreement and found the few points of disagreement and anomalies that we ought to fix or look out for.

I want to integrate this information into a dashboard on one of our tools. I prototyped this by spinning up a Common Lisp microservice that returns the information in JSON format.

But management prefers that new services are written in golang. It would be easier for me to rewrite the service in golang than to try to persuade others to use Common Lisp. It also gives me the opportunity to compare the two languages head to head on a real world problem.

No, this is not a fair comparison. When I wrote the Lisp code I was exploring the problem space and prototyping. I'm much more experienced with Lisp than with golang. The golang version has the advantage that I know what I want to do and how to do it. In theory, I can just translate the Common Lisp code into golang. But then again, this is a “second system” which is not a prototype and has slightly larger scope and fuller requirements. So this cannot be a true head to head comparison.

The first point of comparison is macros (or lack thereof). I generally don't use a lot of macros in Common Lisp, but they come in handy when I do use them. One macro I wrote is called audit-step, which you can wrap around any expresion and it prints out a message before and after the expression is evaluated. The steps are numbered in sequence, and nested steps get nested numbers (like step 2.3.1). If you wrap the major function bodies with this macro, you get a nice trace of the call sequence in the log.

Golang doesn't have macros, but it has first class functions. It's easy enough to write a function that takes a function as an argument and wraps it to output the trace messages. In fact, the macro version in Common Lisp just rewrites the form into such a function call. But the macro version hides a level of indentation and a lambda. In golang, my major functions all start with

func MajorFunction (args) int {
        return AuditStep("MajorFunction", "aux message", func() int {
                // body of MajorFunction
                // Actual code goes here.
        })    
}

The bodies of all my major functions are indented by 16 spaces, which is a little much.

I like higher order functions. I can write one higher order function and parameterize it with functions that handle the specific cases. In my auditing code, one such workhorse function is called collate. It takes a list of objects and creates a table that maps values to all objects in the list that contain that value. To give an example, imaging you have a list of objects that all have a field called foo. The foo field is a string. The collate function can return a table that maps strings to all objects that have that string in the foo field.

collate is very general. It takes a list of objects and four keyword arguments. The :key argument is a function that extracts the value to collate on. The :test argument is a function that compares two keys (it defaults to eql if not specified). The :merger argument is a function to add the mapped object to its appropriate collection in the table (it defaults to adjoin). The :default argument specifies the initial value of a collection in the table (it defaults to nil).

The :merger function is the most interesting. It takes the key and the object and the current value of the table at that key. It returns the new value of the table at that key. The default merger function is adjoin, which adds the object to the collection at the key if it is not already there. But you can specify a different merger function. For example, if you want to count the number of objects at each key, you can specify a merger function that increments a counter.

The functional arguments to the collate function are often the results of other higher order functions. For example, the :key argument is often the result of composing selector functions. The :merger argument is often the result of composing a binary merge function with a unary transformer function. The transformer function is often the result of composing a number of primitive selectors and transformers.

In Common Lisp, it is quite easy to write these higher order functions. We can compose two unary functions with the compose2 function:

(defun compose2 (f g)
  (lambda (x) (funcall f (funcall g x)))

and then compose as many functions as we like by fold-left of compose2 starting with the identity function:

(defun compose (&rest fs)
  (fold-left #'compose2 #'identity fs))

We can compose a binary function with a unary function in three ways: we can pipe the output of the binary function into the unary function, or we can pipe the output of the unary function into one or the other of the inputs of the binary function.

(defun binary-compose-output (f g)
  (lambda (x y) (funcall f (funcall g x y))))

(defun binary-compose-left (f g)
  (lambda (x y) (funcall f (funcall g x) y)))

(defun binary-compose-right (f g)
  (lambda (x y) (funcall f x (funcall g y))))

The collate function can now assume that a lot of the work is done by the :key and :merger functions that are passed in. It simply builds a hash table and fills it:

(defun collate (item &key (key #'identity) (test #'eql) (merger (merge-adjoin #'eql)) (default nil))
  (let ((table (make-hash-table :test test)))
    (dolist (item items table)
      (let ((k (funcall key item)))
        (setf (gethash k table) (funcall merger (gethash k table default) item))))))

(defun merge-adjoin (test)
  (lambda (collection item)
    (adjoin item collection :test test)))

So suppose, for example, that we have a list of records. Each record is a three element list. The third element is a struct that contains a string. We want a table mapping strings to the two element lists you get when you strip out the struct. This is easily done with collate:

(collate records
  :key (compose #'get-string #'third)
  :test #'equal      ; or #'string= if you prefer
  :merger (binary-compose-right (merge-adjoin #'equal) #'butlast))

The audit code reads lists of records from the database and from GitHub and from CircleCI and uses collate to build hash tables we can use to quickly walk and validate the data.

Translating this into golang isn't quite so easy. Golang has first class function, true, but golang is a statically typed language. This causes two problems. First, the signature of the higher order functions includes the types of the arguments and the return value. This means you cannot just slap on the lambda symbol, you have to annotate each argument and the return value. This is far more verbose. Second, higher order functions map onto parameterized (generic) types. Generic type systems come with their own little constraint language so that the computer can figure out what concrete types can correctly match the generic types. This makes higher order functions fairly unweildy.

Consider compose2. The functions f and g each have an input and output type, but the output type of g is the input type of f so only three types are involved

func Compose2[T any, U any, V any](f func(U) V, g func(T) U) func(T) V {
	return func(x T) V {
		return f(g(x))
	}
}

If want to compose three functions, we can write this:

func Compose3[T any, U any, V any, W any](f func(V) W, g func(U) V, h func(T) U) func(T) W {
	return func(x T) W {
		return f(g(h(x)))
	}
}
The generic type specifiers take up as much space as the code itself.

I don't see a way to write an n-ary compose function. It would have to be dynamically parameterized by the intermediate types of all the functions it was composing.

For the collate function, we can write this:

func Collate[R any, K comparable, V any](
	list *Cons[R],
	keyfunc func(R) K,
	merger func(V, R) V,
	defaultValue V) map[K]V {
	answer := make(map[K]V)
	for list != nil {
		key := keyfunc(list.Car)
		probe, ok := answer[key]
		if !ok {
			probe = defaultValue
		}
		answer[key] = merger(probe, list.Car)
		list = list.Cdr
	}
	return answer
}

We have three types to parameterize over: the type of the list elements (i.e. the record type) R, the type of the key K, and the type of the value V. The key type is needs to be constrained to be a valid key in a map, so we use the comparable constraint. Now that we have the types, we can annotate the arguments and return value. The list we are collating is a list of R elements. The key function takes an R and returns a K. The merger takes an existing value of type V and the record of type R and returns a new value of type V.

The magic of type inference means that I do not have to annotate all the variables in the body of the function, but the compiler cannot read my mind and infer the types of the arguments and return value. Golang forces you to think about the types of arguments and return values at every step of the way. Yes, one should be aware of what types are being passed around, but it is a burden to have to formally specify them at every step. I could write the Common Lisp code without worrying too much about types. Of couse the types would have to be consistent at runtime, but I could write the code just by considering what was connected to what. In golang, the types are in your face at every function definition. You not only have to think about what is connected to what, you have to think about what sort of thing is passed through the connection.

I'm sure that many would argue that type safety is worth the trouble of annotation. I don't want to argue that it isn't. But the type system is cumbersome, awkward, and unweildy, especially when you are trying to write higher order functions.

It is taking me longer to write the golang version of the audit service than it did to write the Common Lisp version. There are several reasons. First, I am more experienced with Common Lisp than golang, so the right Common Lisp idioms just come to mind. I have to look up many of the golang idioms. Second, the golang code is trying to do more than the Common Lisp code. But third, golang itself introduces more friction than Common Lisp. Programs have to do more than express the algorithm, they have to satisfy the type system.

There are more points of comparison between the two languages. When I get frustrated enough, I'll probably write another post.

Wednesday, July 31, 2024

Continuation passing style resource management

One good use of continuation passing style is to manage dynamic resources. The resource allocation function is written in continuation passing style and it takes a callback that it invokes once it has allocated and initialized the resource. When the callback exits, the resource is uninitialized and deallocated.

(defun call-with-resource (receiver)
  (let ((resource nil))
    (unwind-protect
        (progn
          (setq resource (allocate-resource))
          (funcall receiver resource))
      (when resource
        (deallocate-resource resource)))))

  ;; example usage:
  (call-with-resource
    (lambda (res)
      (do-something-with res)))

;;; In Lisp, we would provide a convenient WITH- macro
(defmacro with-resource ((resource) &body body)
  ‘(CALL-WITH-RESOURCE (LAMBDA (,resource) ,@body)))

  ;; example usage:
  (with-resource (res)
    (do-something-with res))

This pattern of usage separates and abstracts the resource usage from the resource management. Notice how the unwind-protect is hidden inside call-with-resource so that the user of the resource doesn’t have to remember to deallocate the resource.

The with-resource macro is idiomatic to Lisp. You obviously can’t provide such a macro in a language without macros, but you can still provide the call-with-resource function.

Continuation passing style for resource management can be used in other languages, but it often requires some hairier syntax. Because call-with-resource takes a callback argument, it is actually a higher-order function. The syntax for passing higher-order functions in many languages is quite cumbersome. The return value of the callback becomes the return value of call-with-resource, so the return type of the callback must be compatible with the return type of the function. (Hence the type of call-with-resource is actually parameterized on the return value of the callback.) Languages without sophisticated type inference may balk at this.

Another advantage of the functional call-with-resource pattern is that you can dynamically select the resource allocator. Here is an example. I want to resolve git hashes against a git repository. The git repository is large, so I don’t want to clone it unless I have to. So I write two resource allocators: CallCloningGitRepository, which takes the URL of the repository to clone, and CallOpeningGitRepository which takes the pathname of an already cloned repository. The "cloning" allocator will clone the repository to a temporary directory and delete the repository when it is done. The "opening" allocator will open the repository and close it when it is done. The callback that will be invoked won’t care which allocator was used.

Here is what this looks like in golang:

// Invoke receiver with a temporary directory, removing the directory when receiver returns.
func CallWithTemporaryDirectory(dir string, pattern string, receiver func(dir string) any) any {
	dir, err := os.MkdirTemp(dir, pattern)
	CheckErr(err)
	defer os.RemoveAll(dir)

	return receiver(dir)
}

// Invoke receiver with open git repository.
func CallOpeningGitRepository(repodir string, receiver func(string, *git.Repository) any) any {
	repo, err := git.PlainOpen(repodir)
	if err != nil {
		log.Fatal(err)
	}
	return receiver(repodir, repo)
}

// Invoke receiver with a cloned git repository, removing the repository when receiver returns.
func CallCloningGitRepository(dir string, pattern string, url string, receiver func(tmpdir string, repo *git.Repository) any) any {
	if url == "" {
		return nil
	}
	return CallWithTemporaryDirectory(
		dir,
		pattern,
		func(tempdir string) any {
			log.Print("Cloning " + url + " into " + tempdir)
			repo, err := git.PlainClone(tempdir, true, &git.CloneOptions{
				Auth: &gitHttp.BasicAuth{
					Username: username,
					Password: password,
				},
				URL:      url,
				Progress: os.Stdout,
			})
			CheckErr(err)
			log.Print("Cloned.")
			return receiver(tempdir, repo)
		})
}

You specify a repository either with a URL or a pathname. We select the appropriate resource allocator based on whether the specifier begins with "https".

func RepositoryGetter (specifier string) func (receiver func(_ string, repo *git.Repository) any) any {
	if strings.EqualFold(specifier[0:5], "https") {
                return GetRemoteGitRepository (specifier)
	} else {
                return GetLocalGitRepository (specifier)
	}
}

func GetRemoteGitRepository(url string) func(receiver func(_ string, repo *git.Repository) any) any {
	return func(receiver func(_ string, repo *git.Repository) any) any {
		return CallCloningGitRepository("", "git", url, receiver)
	}
}

func GetLocalGitRepository(repodir string) func(receiver func(_ string, repo *git.Repository) any) any {
	return func(receiver func(_ string, repo *git.Repository) any) any {
		return CallOpeningGitRepository(repodir, receiver)
	}
}

To open a repository, we call RepositoryGetter(specifier) to get a getRepository function. Then we invoke the computed getRepository function on a receiver callback that accepts the local pathname and the repository:

  getRepository := RepositoryGetter(specifier)

  return getRepository(
         func (_ string, repo *git.Repository) any {
                 // resolve git hashes against the repository
                 ....
                 return nil
         })

If given a URL, this code will clone the repo into a temporary directory and open the cloned repo. If given a pathname, it will just open the repo at the pathname. It runs the callback and does the necessary cleanup when the callback returns.

The biggest point of confusion in this code (at least to me) are the type specifiers of the functions that manipulate the resource allocators. Static types don’t seem to mix well with continuation passing style.

Wednesday, July 10, 2024

Monkeys vs. Shakespeare

Google's golang was designed for mediocre programmers that aren't “capable of understanding a brilliant language”. It omits features that are hard to understand or hard to use, like conditional expressions, macros, exceptions, and object systems.

Lisp, on the other hand, was originally designed for smart programmers in order to research artificial intelligence. It contains language constructs like conditional expressions, a powerful macro system, a highly customizable error handling system, and a sophisticated object system.

A million monkeys typing on a million keyboards will eventually produce the works of Shakespeare. Lisp is a language for Shakespeares, while golang is a language for monkeys.

What is the fundamental problem we are solving? If the problem is simply an insufficient amount of software, then we need to write as much software as possible as rapidly as possible. Hire more monkeys. If the problem is gainfully employing all the monkeys quickly, give them crude tools that are easy to use. But if the problem is lack of quality software, then we need the tools to write the best software possible. Hire more Shakespeares and give them powerful tools.

Hiring monkeys is easier and cheaper than hiring Shakespeares. So why not just keep hiring monkeys until you have a Shakespeare equivalent? Experience shows how well this works. Just think of all the projects that were brought in on time and under budget when they doubled the number of monkeys. Just think of any project that was brought in on time and under budget when they doubled the number of monkeys. Surely there must be one?

Thursday, July 4, 2024

A YouTuber's First Look at Lisp

There is a guy who is making a series of videos where he takes a “first look” at various programming languages. I watched his “first look” at Lisp. Now he wasn’t going into it completely cold — he had looked at Scheme previously and had seen some Lisp — but it was still an early experience for him.

He gave it a good go. He didn’t have a pathological hatred for parenthesis and seemed willing to give it a fair shake. He downloaded SBCL and started playing with it.

Since he only had allocated an hour to it, he didn't cover much. But he encountered a few pitfalls that Lisp newbies often fall into. I found myself wanting to talk back at the screen and point out where he was going wrong when simple typos were causing errors.

For example, he was trying to format some output, but he forgot the closing double quote on the format string. The reader kept waiting for him to finish the string and simply gobbled up the format args and closing parens as part of the string. He was confused as to why nothing was happening. When he finally typed Control-C, he was faced with a spew of stack trace and a debugger prompt that was of no help to him.

A second problem he encountered was when he typed (print (first (*mylist*))). Any experienced Lisp hacker will obviously see the problem right away: *mylist* is not a function. But the error message was buried in a 30 level deep stack trace that wound back through the reader and REPL and completely obscured the problem.

The Lisp debugger is amazing, but it can be a bit overwhelming to a newbie. When I was TAing Lisp classes, I would often find that students were intimidated by the debugger. They felt that while they were in the debugger there was some sort of impending doom hanging over them and that they had to exit and get back to the normal REPL as quickly as possible. I think if I were teaching a Lisp course now, I would start off by deliberately causing errors and then showing students how to use the debugger to find the problem and recover from it.

PLT Scheme has an interesting “beginner” mode that disables some of the more advanced features of Lisp. Many typos in Lisp cause errors not because they are invalid, but because they mean something advanced, but unintended. For example, the beginner mode disables first-class functions, so accidentally putting in extra parenthesis becomes a syntax error instead of an attempt to funcall something inappropriate. Perhaps a “training wheels” approach would work with Lisp beginners.

But SBCL generates pretty good error messages. It was a case of TLDR. The reviewer just didn't take the time to read and understand what the error message said. He was too busy trying to figure out how to get back to the REPL.

All in all, the reviewer gave it a good go and came away with a positive impression of Lisp. He also took he time to research the language and provided some examples of where Lisp is used today. The video is at https://www.youtube.com/watch?v=nVhzHu2zSEQ if you are interested.

Wednesday, June 26, 2024

Goto not that harmful

I use goto all the time. When I have a loop, I goto the top of the loop after each iteration. When a function delegates to another function, I just goto the other function.

Gotos that just transfer control have the problem that the context is implicit. It isn’t obvious from the code what parts of the context are expected to persist across the control transfer. But if you explicitly pass arguments along with your control transfer, then you can see exactly what is carried across the control transfer.

A tail call isn’t “optimized” to a goto, it is a goto. It is a goto that passes arguments. Gotos that pass arguments aren’t harmful.

Monday, June 24, 2024

Embrace the Suck

The key point here is our programmers are Googlers, they’re not researchers. They’re typically fairly young, fresh out of school, probably learned Java, maybe learned C or C++, probably learned Python. They’re not capable of understanding a brilliant language but we want to use them to build good software. So, the language that we give them has to be easy for them to understand and easy to adopt.
— Rob Pike

How sad. Google used to hire top-notch programmers. Now it appears that they have hired a bunch of mediocre programmers who can't even learn how to properly use exceptions or ternary conditionals. Programmers that need “training wheels” on their C.

And how insulting to Googlers to be told that they are not capable of understanding a brilliant language. To be seen as merely a fungible resource to be “used” to build software. I'm sure some of them are capable of understanding a brilliant language. I'm sure that some are capable of learning how to use Haskell or OCaml or Clojure (or F# or Rust or Erlang or Scala) whatever language is best for the task.

Does success come to those that strive for excellence or those that embrace mediocrity?

Sunday, June 16, 2024

Decode a Float (Solution)

We can multiply or divide a floating point number by 2 without changing the bits in the mantissa. So we can rescale the number to be in the range [1, 2) by repeatedly multiplying or dividing by 2. The leftmost bit of the mantissa is always 1, but next bit determines whether the number is in the top half of the range or the bottom half. So if the number is equal to or greater than 1.5, the next bit is 1, otherwise it is 0.

If the number is in the range [1, 2), we can subtract 1 from it. The remaining bits will be shifted left until the leftmost bit is 1. If the number is greater than or equal to 1.5, then subtracting 1 will shift the bits left by one. But if the number is in [1, 1.5), we won’t know how many zero bits will be shifted out when we subtract 1. What we’ll do is add .5 to the number to turn on the next bit, then subtract 1 to shift the bits left by one.

Here is the code in Common Lisp:

(defun float->bits (float)
  (cons 1 (read-bits float 0)))

(defun read-bits (float count)
  (cond ((>= count 52) nil)
        ((> float 2.0d0) (read-bits (/ float 2.0d0) count))
        ((< float 1.0d0) (read-bits (* float 2.0d0) count))
        ((>= float 1.5d0) (cons 1 (read-bits (- float 1.0d0) (1+ count))))
        (t (cons 0 (read-bits (- (+ float .5d0) 1.0d0) (1+ count))))))

Note that all these operations are exact and do not cause round off.

Saturday, June 15, 2024

Decode a Float

The leftmost bit of any positive binary number is always 1. So if you were to left-justify a positive binary number, the top bit would always be 1. If the top bit is always 1, there is no need to implement it. Floating point numbers use this trick.

You can determine the bits of an integer using only arithmetic operations by repeatedly dividing by two and collecting the remainders. Today’s puzzle is to determine the bits of a floating point number using only arithmetic operations (no decode-float or integer-decode-float).

Thursday, June 6, 2024

D-day, 80 years ago today

More than 150,000 troops, 5,000 ships, 13,000 aircraft in the largest amphibious assault in history.

Wednesday, June 5, 2024

Multithreading and Immutable Data

I was amusing myself by looking at Lisp tutorials. They used the idea of a Tic-Tac-Toe service as a motivating example. You’d be able to play Tic-Tac-Toe against the computer or another opponent.

My immediate thought went to the issue of multithreading. If you were going to serve hundreds of people at once, you’d need to have a multi-threaded service. Multi-threaded code is hard to write and debug, and it is much better if you have a plan before you start than if you try to retrofit it later (that trick never works).

The magic bullet for multi-threading is immutable data. Immutable data is inherently thread-safe. It doesn’t need synchronization or locks. If all your data are immutable, you can pretty much ignore multi-threading issues and your code will just work.

Using a 2D array to represent a Tic-Tac-Toe board is the obvious thing that first comes to mind, but not only are arrays mutable, they virtually require mutation to be of any use. The Lisp tutorials I was looking at all used arrays to represent the board, none of them locked the board or used atomic operations to update it, and all had the potential for race conditions if two threads tried to update the board at the same time. Arrays are essentially inherently thread-unsafe.

I thought about alternative representations for the board. Different representations are more or less amenable for writing code that avoids mutation. I came up with a few ideas:

  • Use a 2d array, but copy it before each mutation. This is horribly inefficient, but it is simple.
  • Use a 1d array, again copying it before each mutation. This isn’t much different from the 2d array, but iterating over the cells in the board is simpler.
  • Keep a list of moves. Each move is a pair of player and position. To determine the state of the board, you iterate over the list of moves and apply them in order. This is a bit more complicated than the array representations, but it is inherently immutable. It also has the advantage that you can rewind the board to any prior position.
  • Encode the board as a pair of bitmaps, one for each player.
  • Encode the board as a single bitmap, with each cell represented by two bits.
  • There are only 39 ways to fill out a Tic-Tac-Toe grid, so you could represent the board as an integer.

Each one of these representations has pros and cons. I wrote up some sample code for each representation and I found that the representation had a large influence on the character of the code that used that representation. In other words, there wasn’t a single general Tic-Tac-Toe program that ended up being specialized to each representation, but rather there were six different Tic-Tac-Toe programs each derived from its own idiosyncratic representation.

In conclusion, it is a good idea to plan on using immutable data when you might be working with a multi-threaded system, and it is worth brainstorming several different representations of your immutable data rather than choosing the first one that comes to mind.

Saturday, June 1, 2024

Roll Your Own Syntax

Unlike most languages, Lisp represents its programs as data structures. A Lisp program is a set of nested lists. We can look at a Lisp program as a tree, with each nested list as a node in the tree. The first element of each list indicates the kind of node it is. For instance, a sublist beginning with LET binds local variables, a sublist beginning with IF is a conditional, and so on.

In most languages, it difficult or impossible to add new node types to the syntax tree. The syntax is wired into the language parser and if you even can add new syntax, you have to carefully modify the parser to recognize it. In Lisp, adding new node types is quite easy: you just mention them.

To give an example, suppose you wanted to add a new node to the syntax tree called COMMENT, which would have a string and a subexpression as components. You'd use it like this:

(comment "Avoid fencepost error" (+ x 1))

Here's how you could define the semantics of a COMMENT node in Lisp:

(defmacro comment (string expr)
  expr)

That's it. You can now insert arbitrary COMMENT nodes into your Lisp programs.

Compare that to what you would have to do in a language like Java to add a new kind of node to the syntax tree. You'd have to modify the parser, the lexer, the AST classes, and probably a bunch of other stuff. It's non trivial.

For a more complex example, consider adding transactions to a language. In a language like Java, you'd have to modify the parser to recognize the new syntax, and then you'd have to modify the compiler to generate the correct code. In Lisp, you can just define a new node type:

(defmacro with-transaction (body)
  <complex macro elided>)

And then use it like this:

(with-transaction
  (do-something)
  (do-something-else))

Now obviously you should put some thought into doing this. Adding dozens of random new node types to the language would be a bad idea: readers of the code wouldn't be expecting them. But in some cases, a new node type can be just what is called for to abstract out complexity or boilerplate. Lisp gives you that option.

Tuesday, May 28, 2024

If I Were in Charge

If I were in charge of Python development, here are a few things I would do:

  • Add (optional) tail recursion. This would make it easier to write pure functional code. It would also make it possible to effectively program in continuation passing style. Making tail recursion optional should placate those that feel that stack traces are important for debugging.
  • Add macros. I am thinking of Lisp-like macros that do code transformation, not C-like macros that simply do token substitution. A good macro system would allow advanced users to create new syntactic forms for the language and provide a way to abstract boilerplate.
  • Allow a way to use statements inside expressions, or beef up the expression syntax to have exception expressions, loop expressions, etc. This, too, would make it easier to write pure functional code.
  • Optional end-of-block markers. These would allow you to automatically fix indentation errors and recover indentation when it is lost.
  • Use true lexical scoping. Changing this might break legacy code that depends on the current scoping quirks, though.
  • Use modern interpretation techniques to get the performance up to a more reasonable level. Performance doesn't matter that much, but Python is notably slow.
  • Get rid of the global interpreter lock so that multithreading works better. Probably easier said than done.

I don't believe any of these necessarily involve fundamental changes to the language. They'd just make the language more flexible, though I'm sure many people would disagree with me.

But, perhaps for the best, they're not going to put me in charge.

Sunday, May 26, 2024

Exception Handling for Control Flow

Back when I was taking a Software Engineering course we used a language called CLU. CLU was an early object-oriented language. A feature of CLU was that if you wrote your code correctly, the compiler could enforce completely opaque abstract data types. A good chunk of your grade depended on whether you were able to follow insructions and write your data types so that they were opaque.

CLU did not have polymorphism, but it did have discriminated type unions. You could fake simple polymorphism by using a discriminated type union as the representation of an opaque type. Methods on the opaque type would have to dispatch on the union subtype. Now to implement this correctly, you should write your methods to check the union subtype even if you “know” what it is. The course instructors looked specifically for this and would deduct from your grade if you didn't check.

One subproject in the course was a simple symbolic math system. You could put expressions in at the REPL, substitute values, and differentiate them. The core of the implementation was term data type that represented a node in an expression tree. A term was a type union of numeric constant, a symbolic variable, a unary expression, or a binary expression. Unary and binary expressions recursively contained subterms.

The term data type would therefore need four predicates to determine what kind of term it was. It would also need methods to extract the constant value if it was a constant, extract the variable name if it was symbolic, and extract the operator and subterms if it was a unary or binary expression. The way to use the data type was obvious: you'd have a four-way branch on the kind of term where each branch would immediately call the appropriate selectors. But if you wrote your selectors as you were supposed to, the first thing they should do is check that they are called on the appropriate union subtype. This is a redundant check because you just checked this in the four-way branch. So your code was constantly double checking the data. Furthermore, it has to handle the case should the check fail, even though the check obviously can never fail.

I realized that what I needed was a discrimination function that had four continuations, one for each subtype. The discrimination function would check the subtype and then call the appropriate continuation with the subcomponents of the term. This would eliminate the double checking. The code was still type safe because the discrimination function would only invoke the continuation for the correct subtype.

One way to introduce alternative continuations into the control flow is through exceptions. The discrimination function could raise one of four exceptions, each with the appropriate subcomponents as arguments. The caller would not expect a return value, but would have four catch handlers, one for each subtype. The caller would use the try/except syntax, but it would act like a switch statement.

The TA balked at this use of exceptions, but I appealed to the professor who saw what I was trying to do and approved.

There is some debate about whether exceptions should be used for control flow. Many people think that exceptions should only be used for “exceptional” situations and that it is poor form to use them for normal control flow. I think they are taking too narrow a view. Exceptions are a way to introduce alternative paths of control flow. One can use them to, for instance, handle an exceptional situation by jumping to a handler, but that's not the only way to use them.

Of course you should think hard about whether exceptions are the right way to introduce alternative control flow in your use case. Exception syntax is usually kind of klunky and you may need to rewrite the code to insert the exception handling at the right point. Readers of your code are likely to be surprised or baffled by the use of exceptions for control flow. There is often a significant performance penalty if an exception is thrown. But sometimes it is just the trick.

Saturday, May 25, 2024

ECS in CLOS

Object systems make a natural fit for the game programming domain, but over time people have found that the object-oriented model provided by their favorite language doesn't always fit the use case. So developers have come up with “entity-component systems” (ECS) of varying complexity to fill the gap.

The basic idea is that a game object, called an “entity”, contains or refers to a set of “components” that define its behavior. Entities are too varied to be captured by a single class hierarchy, so we abandon inheritance in favor of composition. An entity that can be displayed on the screen has a “sprite” component, an entity that can move has “position” and “velocity” components, an entity that you can attack has a “hitbox” component and a “health” component. A entity that can attack you has an “attackbox” component. You can play mix and match the components to customize the behavior of the entity. We don't have a hierachy of components because each component can be added more or less independently of the others.

An ECS is an alternative or an augmentation to the built-in object system of a language, but it is an admission that the built-in object is insufficient.

CLOS provides an elegent way to implement an ECS without abandoning CLOS's built-in object system. We define a component as a “mixin” class that can be inherited from using multiple inheritance. We define mixin classes for each component, and then we define entity classes that inherit from the mixin classes. We would define a “sprite” mixin class, a “position” mixin class, etc. So the class of enemy entities would inherit from “sprite”, “position”, “velocity”, “hitbox”, “health”, and “attackbox” classes. A trap entity would inherit from “sprite”, “position”, and “attackbox” classes. A container entity that you could smash open would inherit from “sprite”, “position”, and “hitbox” classes. Etc.

Mixin classes aren't intended to be instantiated on their own, but instead provide slots to the classes that inherits from them. Furthermore, methods can be specialized on mixin classes so that instances derived from the mixin will respond to the method. This allows you to inherit from a mixin providing the particular desired behavior. For example, the “health” mixin class would provide a slot containing the entity's health and a “damage” method to decrease the health. Any entity inheriting from the “health” mixin will react to the damage method. Mixin classes can provide functionality similar to interfaces.

Mixin classes are an exception to Liskov's Substitution Principle, but they are a useful exception. Entities that inherit from a mixin do not have a “is-a” relationship with the mixin, but rather a “has-behavior-of” relationship. An entity inheriting from the “health” mixin is not a “health”, but it has the “health” behavior.

One feature of an ECS is that you can dynamically change the components of an object. For example, once an enemy is defeated, you can remove the “health” and “attackbox” components and add a “corpse” component. Is CLOS you could accomplish this by changing the class of the entity object to a class that doesn't have those mixins.

Of course care must be taken or you will end up with CLOS “soup”. But if you are careful, CLOS can provide a powerful and flexible system for implementing an ECS.

Friday, May 24, 2024

Why I Want Tail Recursion

The reason I want tail recursion is not to write loops (I can do that with a while loop), but to write in continuation passing style if I need to. Continuation passing style allows you to implement any control flow pattern you can imagine, not just the ones intrinsic to the language. You don’t want to use it all the time, but it’s a valuable fallback when you need some ad hoc advanced control flow. Without tail recursion, any non-trivial use of continuation passing style risks blowing the stack.

Iteration is just the special case of linear recursion that doesn’t accumulate state. 99 percent of the time, you know beforehand that you are looping and can use a looping construct, but sometimes you have the general case where whether you loop or not depends on the data at runtime. If you have tail recursion, you just write the code recursively and the tail recursion mechanism will turn it into a loop if it notices that you aren’t accumulating state.

If your language has lambda and tail recursion, it can implement any other control flow that might have been overlooked by the language designer. If it doesn’t, you're limited to the control flow the language designer bothered to implement.

Thursday, May 23, 2024

Rewrite rules

I like rewrite rules. They are simple to understand and easy to implement. If you have a rewrite rule (or a set of them) that makes incremental progress in making an expression "simpler" in some sense, you can keep applying it (or them) repeatedly until you have finished the calculation.

If a function directly returns the result of calling another function, we say that the call is in "tail position". We can view this as a simple rewrite rule. Consider the fact-mul function which computes the factorial of a number n and multiplies it by another number m:

fact-mul (n, m) = n! * m

We can define two rewrite rules for this function. If n is 0, then the factorial is 1, so we just rewrite this as m. If n is not 0, we can rewrite this as fact-mul (n-1, n*m). This translates directly into Lisp:

(defun fact-mul (n m)
  (if (zerop n)
      m
     (fact-mul (- n 1) (* n m))))

The rewrite rule doesn't have to rewrite into the same function. For instance, we can rewrite odd? (x) to even? (x-1) and even? (x) to odd? (x-1):

(defun odd? (x)
  (if (zerop x)
      nil
      (even? (1- x))))

(defun even? (x)
  (if (zerop x)
      t  
      (odd? (1- x))))

Yes, this is bog-simple tail recursion, but somehow it seems even simpler to me if I think of it as a rewrite rule.

Even if you don't have tail recursion, or if you have disabled it, it still works, but the number of rewrites is limited by the stack depth.

Tuesday, May 21, 2024

If only ...

I have a problem with Lisp. It doesn’t do what I want. Whenever I write a Lisp program, I always find myself wanting new special forms that aren’t in the language. The object system almost never does exactly what I want. The control structures never cover all my use cases.

If only I could add new syntactic forms to the language. If only I could tweak the internals of the object system. If only I could add new kinds of control structure.

Oh, wait. I can.

Monday, May 20, 2024

Maybe machine (part 2)

At the end of 6.004, we had a contest for improving the performance of the Maybe machine. There was an 8-bit ALU included in the chip set we were given, so the logical thing to do was to incorporate it into the machine in place of the shift register. This would naturally improve the performance of arithmetic operations, but there were other reasons that the Maybe machine was slow.

One of the rules of the contest was that we were not allowed to change the clock speed. But the Maybe machine had a four phase clock. The first phase would address the device driving the bus, the second phase would connect it to the bus, the third phase would clock the receiving device, and the fourth phase would keep the data asserted to obey the hold time on the logic.

The four phase clock was technically unnecessary. We had built it in this manner to illustrate set-up and hold times for the bus, but the TTL chips we were using were specifically designed to be used on an open collector bus with no hold time, so you could actually clock the bus on every other phase rather than divide by four and still be within valid hardware specs. The fundamental clock speed was not changed, but the machine would run twice as fast.

The microcode word on the Maybe machine had four parts. The first part loaded the shift register from memory. The second part operated on the shift register. The third part wrote the shift register back to memory, and the fourth part was the next microcode address. But much of a typical microinstruction was wasted time. More often than not, the shift register was loaded from the last location it was written to, so it already contained the data it needed. The load cycle was often unnecessary. If you just left the data in the shift register, it would already be there for the next instruction. A memory to memory move involved a no-op on the shift register, which took clock cycles. Advancing to the next instruction took more clock cycles.

Instead of a four part microinstruction, I made a one-part microinstruction. True, this meant that I would occasionally need more microinstructions, but each one was smaller, and I could omit the no-ops, so I could accomplish the same amount in fewer clock cycles. I eliminated the next-instruction field by replacing the microcode address register with a counter.

The original Maybe machine had a two-address microcode — each instruction would specify a separate load and store address. Most people retained the wide microcode and could specify each input to the ALU independently. But I wanted to squeeze each microinstruction into 16 bits, so I turned the machine into a one-address machine with an accumulator. This made my microcode one quarter of the size, but it made it difficult to find enough bits in the microcode to accomplish all the tasks it needed to do. I didn’t have enough bits to directly address the accumulator, so I attached the accumulator to the output of the ALU. One ALU input came from the bus, the other was fed back from the accumulator. To load the accumulator, you had to pass the data through the ALU.

There weren’t enough bits for jump instructions, but I did a hack: I incremented the instruction address half way through the instruction cycle. This way, you could read the bits of the next instruction while you were still executing the current instruction. So you could do an unconditional jump by placing the target in the next instruction. There still weren’t enough bits for a conditional jump, so the microcode address space ended up being partitioned into sixteen segments and you could only conditionally jump within the same segment. To jump between segments you had to use an unconditional jump.

I obviously had to do a complete rewrite of the microcode. My friend Bob Baldwin helped me out by writing an assembler for the new microinstruction format.

The new microcode ran substantially faster than the original. When the contest came, my machine was clearly faster than the competition. Unfortunately, it crashed on the third benchmark and I ended up disqualified. I think it was a bug involving switching between the segments of microcode, but I never had the opportunity to debug it fully. I learned a hard lesson about keeping things simple.