Abstract Heresies: January 2025

Friday, January 17, 2025

Valid Use Case for Copilot

Our company provides us with GitHub Copilot, which is yet another example of an “AI” engine. I’ve reviewed it before and I wasn’t particularly impressed. It writes mediocre code.

However, I found a use case where it really shines. I was setting up the blank project repositories for our upcoming hackathon. This involved creating a bunch of otherwise empty repositories with a customized README.md file. The simple solution was to automate this with an ad hoc shell script.

I don’t make a practice of writing shell scripts. I write a few every now and then, but I don’t live and breathe bash. It takes me a while because I forget the flags. I have to look up the exact syntax for things like the find command. Then I have to fight with the shell to get the quoting and escaping right. It’s a tedious chore. No doubt I would become adept at it if it were something I did daily, but it’s not.

So I decided to ask GitHub Copilot to write some scripts for me. I gave it my problem statement: start with a file with a list of repository and description pairs. For each pair, look for the subdirectory with that name and descend into it. Use the sed command to modify the README.md file in the subdirectory to add the repository description. Then use git to commit and push the change.

Copilot gave this a couple of seconds thought and came up with this script:

while IFS= read -r line; do
  dir_name=$(echo "$line" | cut -d' ' -f1)
  description=$(echo "$line" | cut -d' ' -f2-)
  if [ -d "$dir_name" ]; then
    sed -i "s/\[\[github_repo_description\]\]/$description/g" "$dir_name/README.md"
  else
    echo "Directory $dir_name does not exist."
  fi
done < descriptions.txt

find . -type d -name "hack*" -exec sh -c '
  cd "$0"
  if [ -f README.md ]; then
    git add README.md
    git commit -m "Update description."
    git push
  fi
' {} \;

It isn’t a script that I’d write home about, but it got the job done. I ran the script and had the repositories set up in a couple of minutes. I could have written the script myself, but it would have taken me longer. I had forgotten the options to the find command. I had forgotten the arguments to the cut command. I would have had to look them up. Copilot saved me that time.

A co-worker of mine questioned the engineering tradeoff of using a resource hog like generative AI to write crappy, throwaway shell scripts. From the standpoint of an individual developer, though, this is the first use case for Copilot that I’ve found where it actually saved me time and effort.

Iteration

Iteration is simply that special case of recursion that doesn’t accumulate storage in the long term. It’s a notable special case because computer storage is finite, and you want to be able to write algorithms that are bound by constant space.

There are two common strategies that computer languages use to approach iteration. Functional languages like Scheme and Haskell make sure that normal function calls do not accumulate storage per se. Function calls can be used to direct the control flow, and if they direct the control flow in a loop, you will iterate. Most other languages achieve iteration via special iteration constructs that you must use if you want to iterate. Each of these approaches has its own advantages and disadvantages.

The advantages of using special iteration constructs are these:

It is clear that you are iterating.
Special constructs are usually optimized for iteration and have particular compiler support to make them efficient.
Special constructs are constrained so that you cannot accidentally write non-iterative code.

The disadvantages of using special iteration constructs are these:

Special constructs are drawn from a fixed set of constructs that are built in to the language. If you want to iterate differently, you are out of luck.
Special constructs usually do not cross function boundaries. Iteration must reside in a single function.
You have to decide beforehand that you want to iterate and choose an iteration construct.
Special constructs are usually imperative in nature and operate via side effects.

The alternative approach used by functional languages is to make the language implementation tail recursive. This has these advantages:

Iteration is automatic. You don’t have to decide that you want to iterate, it just happens when it can.
Iteration can cross function boundaries.
You can write your own iteration constructs and build them out of ordinary functions.
Iteration can be done purely functionally, without side effects.

The disadvantages of using tail recursion for iteration are these:

It is not obvious that you are iterating or intended to.
You have to be careful to place all the iteration in tail position or you will blow the stack. Beginner programmers often have difficulty recognizing which calls are tail calls and can find it hard to avoid blowing the stack.
Small, innocent looking changes in the code can change its behavior to be non tail recursive, again blowing the stack.
The stack no longer contains a complete call history. If you rely on the stack as a call history buffer for debugging, you may find debugging more difficult.
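
To make the tail-position point concrete, here is a minimal sketch of two ways to count the elements of a list. The function names are made up for illustration. Note that the Common Lisp standard does not require tail call optimization, so whether the first version really runs in constant space depends on the implementation and its optimization settings.

```lisp
;; Tail recursive: the recursive call is the last thing the function does,
;; so an implementation that optimizes tail calls can reuse the stack frame.
(defun count-elements-iterative (list &optional (count 0))
  (if (null list)
      count
      (count-elements-iterative (cdr list) (+ count 1))))

;; Not tail recursive: the addition happens after the recursive call
;; returns, so every element of the list costs a pending stack frame.
(defun count-elements-recursive (list)
  (if (null list)
      0
      (+ 1 (count-elements-recursive (cdr list)))))
```

Both functions return the same answer. The only difference is where the call sits: moving the addition from the argument list to the outside of the call is exactly the kind of small, innocent looking change that turns an iteration into a stack-consuming recursion.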

The code in an iteration can be classified as being part of the machinery of iteration (the part that sets up the iteration, tests the ending conditional, and advances to the next iteration) or part of the logic of the iteration (the specific part that you are repeating). The machinery of the iteration is usually the same across many iterations, while the logic of the iteration is idiomatic to the specific instance of iteration. For example, all iterations over a list will have a null test, a call to CDR to walk down the list, and a call to CAR to fetch the current element, but each specific iteration over a list will do something different to the current element.
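
As a rough sketch of that split, the machinery of a list walk can be written once as an ordinary function that takes the logic as an argument. The name walk-list is hypothetical; it does much the same job as the standard mapc.

```lisp
(defun walk-list (list logic)
  "Machinery only: the null test, the CAR, and the CDR.
LOGIC is a function of one argument, called on each element in order."
  (if (null list)
      nil
      (progn
        (funcall logic (car list))
        (walk-list (cdr list) logic))))

;; The logic is supplied by the caller: here, print the square of each element.
(walk-list '(1 2 3) (lambda (n) (print (* n n))))
```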

There are several goals in writing iterative code. One is to have efficient code that performs well. Another is to have clear code that is easy to understand, debug, and maintain. You choose how to iterate based on these considerations. For the highest performing code, you will want detailed control over what the code is doing. You may wish to resort to using individual assignments and GOTO statements to squeeze the last clock cycles out of an inner loop. For the clearest code, you will want to use a high degree of abstraction. A clever compiler can generate efficient code from highly abstracted code, and experienced programmers know how to write abstract code that can be compiled to efficient code.

Here are some examples of iteration strategies in Lisp. To make these examples easy to compare I chose a simple problem to solve: given a list of numbers, return both a list of the squares of the numbers and the sum of the squares. This is a simple problem that can be solved in many ways.

Tagbody and Go

A tagbody is a block of code that is labeled with tags. You can jump to a tag with a go statement. This is a very low level form of iteration that is not used much in modern Lisp programming. Here is an example of a tagbody:

(defun iteration-example-with-tagbody (numbers)
  (let ((squares '())
        (total 0)
        (nums numbers))
    (tagbody
     start
       (if (null nums)
           (go end))
       (let ((square (* (car nums) (car nums))))
         (setq squares (cons square squares))
         (incf total square))
       (setq nums (cdr nums))
       (go start)
     end
       (values (nreverse squares) total))))

This is like programming in assembly code. The go instructions turn into jumps. This code is very efficient, but it is not particularly clear. The machinery of the iteration is mixed in with the logic of the iteration, making it hard to see what is going on. The code is not very abstract.

State Machine via Mutual Tail Recursion

Here we use tail recursion to iterate. The compiler will turn the tail recursive call into a jump and the variable rebinding into assignments, so this code will be about as efficient as the tagbody code above.

(defun iteration-example-tail-recursive (numbers &optional (squares '()) (total 0))
  (if (null numbers)
      (values (nreverse squares) total)
      (let ((square (* (car numbers) (car numbers))))
        (iteration-example-tail-recursive
         (cdr numbers)
         (cons square squares)
         (+ total square)))))

This state machine only has one state, so it is not a very interesting state machine. The ultimate in iteration control is to write an iterative state machine using mutually tail recursive functions. The compiler will generate very efficient code for this, and you can write the code in a very abstract way. Here is an example of a state machine that simulates the action of a turnstile:

(defun turnstile (actions)
  "State machine to simulate a turnstile with actions ‘push’, ‘coin’, and ‘slug’."
  (locked-state actions '() '()))

(defun locked-state (actions collected return-bucket)
  (cond ((null actions) (list collected return-bucket))
        ((eql (car actions) 'coin)
         (unlocked-state (cdr actions) collected return-bucket))
        ((eql (car actions) 'push)
         (locked-state (cdr actions) collected return-bucket))  ;; Ignore push in locked state
        ((eql (car actions) 'slug)
         (locked-state (cdr actions) collected (append return-bucket '(slug)))) ;; Return slug
        (t (locked-state (cdr actions) collected return-bucket))))

(defun unlocked-state (actions collected return-bucket)
  (cond ((null actions) (list collected return-bucket))
        ((eql (car actions) 'push)
         (locked-state (cdr actions) (append collected '(coin)) return-bucket))
        ((eql (car actions) 'coin)
         (unlocked-state (cdr actions) collected (append return-bucket '(coin)))) ;; Return coin
        ((eql (car actions) 'slug)
         (unlocked-state (cdr actions) collected (append return-bucket '(slug)))) ;; Return slug
        (t (unlocked-state (cdr actions) collected return-bucket))))

;; Example usage:
(turnstile '(coin push coin push))  ;; => ((coin coin) ())
(turnstile '(push coin push))       ;; => ((coin) ())
(turnstile '(coin coin push push))  ;; => ((coin) (coin))
(turnstile '(push))                 ;; => (NIL NIL)
(turnstile '(coin push push))       ;; => ((coin) ())
(turnstile '(coin coin coin push))  ;; => ((coin) (coin coin))
(turnstile '(slug coin push))       ;; => ((coin) (slug))
(turnstile '(coin slug push))       ;; => ((coin) (slug))
(turnstile '(slug slug push coin push)) ;; => ((coin) (slug slug))

The iteration machinery is still interwoven with the logic of the code. We’re still finding calls to null and cdr sprinkled around the code. Nonetheless, structuring iterative code this way is a big step up from using a tagbody and go. This is my go-to method for complex iterations that cannot easily be expressed as a map or reduce.

Loop Macro

Common Lisp’s loop macro is a very powerful iteration construct that can be used to express a wide variety of iteration patterns.

(defun loop-iteration-example (numbers)
  (loop for num in numbers
        for square = (* num num)
        collect square into squares
        sum square into total
        finally (return (values squares total))))

Call me a knee-jerk anti-loopist, but this doesn’t look like Lisp to me. It has some major problems:

It is highly imperative. To understand what is going on, you have to follow the code in the order it is written. You need to have a mental model of the state of the loop at each point in the iteration. Running into a loop when reading functional code takes you out of the zen of functional programming.
The bound variables are not lexical, they are scattered around the code. You have to carefully examine each clause to figure out what variables are being bound.
You need a parser to walk the code. There is nothing that delimits the clauses of the loop; it is a flat list of random symbols and forms. You couldn’t easily write a program that takes a loop form and transforms it in some way.

Do and Friends

The do macro, and its friends dolist, dotimes, and do*, etc., are the most common iteration constructs in Common Lisp.

(defun iteration-example-with-do (numbers)
  (let ((squares '())
        (total 0))
    (do ((nums numbers (cdr nums)))
        ((null nums) (values (nreverse squares) total))
      (let ((square (* (car nums) (car nums))))
        (setq squares (cons square squares))
        (incf total square)))))

The do macros have some drawbacks:

They are imperative. The body of a do loop ultimately must have some sort of side effect or non-local exit to “get a value out”. Notice how we bind accumulator variables in an outer scope and assign them in the inner one. This is a common pattern in a do loop.
They do not compose. You can nest a dotimes inside a dolist, e.g., but you cannot run a dotimes in parallel with a dolist.
They are incomplete. There is no do-array or do-string, for example.

But at least you can parse them and transform them. They are structured, and you can write a program that walks the clauses of a do loop and does something with them.
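
When you do need a counter to advance alongside a list walk, one workaround is to fall back on the general do form, which steps all of its variables in parallel. A minimal sketch (the function name index-elements is made up for illustration):

```lisp
(defun index-elements (items)
  "Pair each element of ITEMS with its zero-based position."
  ;; All three step forms are evaluated with the old values of the
  ;; variables before any variable is updated.
  (do ((tail items (cdr tail))
       (index 0 (1+ index))
       (pairs '() (cons (list index (car tail)) pairs)))
      ((null tail) (nreverse pairs))))

(index-elements '(a b c))  ;; => ((0 A) (1 B) (2 C))
```

This works, but it illustrates the complaint: you get the combined behavior by rewriting both loops as one do, not by composing a dotimes with a dolist.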

Map and Reduce

Map and reduce are higher-order functions that abstract the machinery of iteration away from the logic of the iteration. You supply the logic as a function argument (for reduce, typically the combining function of a monoid). The resulting code is clear and concise:

(defun iteration-example-with-map-reduce (numbers)
  (let* ((squares (map 'list (lambda (num) (* num num)) numbers))
         (total (reduce #'+ squares)))
    (values squares total)))

The looping is implicit in the map and reduce functions. You can usually make the assumption that the language implementors have optimized these functions to be reasonably efficient.

I often see programmers writing looping code when a perfectly good library function exists that does the same thing. For example, it is common to want to count the number of items in a sequence, and Common Lisp supplies the count function just for this purpose. There is no need to write a loop.

Common Lisp provides a filter function, but it is called remove-if-not.
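
For instance, these standard sequence functions replace the loops you might otherwise write by hand. The sample data is invented for illustration.

```lisp
;; Count the items that are EQL to a given item.
(count 'coin '(coin push coin slug))       ;; => 2

;; Count the items that satisfy a predicate.
(count-if #'evenp '(1 2 3 4 5 6))          ;; => 3

;; "Filter": keep only the items that satisfy the predicate.
(remove-if-not #'evenp '(1 2 3 4 5 6))     ;; => (2 4 6)
```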

The drawback of using these functions is that large intermediate sequences can be created. In our example code, the entire list of squares is constructed prior to reducing it with #'+. Of course the entire list is one of the return values, so you need it anyway, but if you only needed the sum of the squares, you would prefer to sum it incrementally as you go along rather than constructing a list of squares and then summing it. For small sequences, it doesn’t make a difference.
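
If only the sum is needed, one way to avoid the intermediate list with the standard library alone is the :key argument to reduce, which applies a function to each element as it is consumed. A minimal sketch (the function name is made up for illustration):

```lisp
(defun sum-of-squares-incremental (numbers)
  "Sum the squares of NUMBERS without building a list of squares."
  ;; :KEY squares each element on the fly as REDUCE walks the sequence.
  (reduce #'+ numbers :key (lambda (n) (* n n))))

(sum-of-squares-incremental '(1 2 3))  ;; => 14
```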

Series

The series macro suite attempts to bring you the best of both worlds. You write series expressions that look like sequence functions, but the macro recognizes that you are iterating and generates efficient incremental code.

(defun iteration-example-with-series (numbers)
  (let ((squares (map-fn 'integer (lambda (n) (* n n)) (scan 'list numbers))))
    (values (collect 'list squares)
            (collect-sum squares))))

This code is very similar to the sequence case, but the series macro will generate code that does not construct the entire list of squares before summing them. It will sum them incrementally as it goes along.

Series will expand into a tagbody. For example, the above code will expand into something like this:

(COMMON-LISP:LET* ((#:OUT-1015 NUMBERS))
  (COMMON-LISP:LET (#:ELEMENTS-1012
                    (#:LISTPTR-1013 #:OUT-1015)
                    (SQUARES 0)
                    #:SEQ-1018
                    (#:LIMIT-1019
                     (COMMON-LISP:MULTIPLE-VALUE-BIND (SERIES::X SERIES::Y)
                         (SERIES::DECODE-SEQ-TYPE (LIST 'QUOTE 'LISTS))
                       (DECLARE (IGNORE SERIES::X))
                       SERIES::Y))
                    (#:LST-1020 NIL)
                    (#:SUM-1023 0))
    (DECLARE (TYPE LIST #:LISTPTR-1013)
             (TYPE INTEGER SQUARES)
             (TYPE (SERIES::NULL-OR SERIES::NONNEGATIVE-INTEGER) #:LIMIT-1019)
             (TYPE LIST #:LST-1020)
             (TYPE NUMBER #:SUM-1023))
    (TAGBODY
     #:LL-1026
      (IF (ENDP #:LISTPTR-1013)
          (GO SERIES::END))
      (SETQ #:ELEMENTS-1012 (CAR #:LISTPTR-1013))
      (SETQ #:LISTPTR-1013 (CDR #:LISTPTR-1013))
      (SETQ SQUARES ((LAMBDA (N) (* N N)) #:ELEMENTS-1012))
      (SETQ #:LST-1020 (CONS SQUARES #:LST-1020))
      (SETQ #:SUM-1023 (+ #:SUM-1023 SQUARES))
      (GO #:LL-1026)
     SERIES::END)
    (COMMON-LISP:LET ((SERIES::NUM (LENGTH #:LST-1020)))
      (DECLARE (TYPE SERIES::NONNEGATIVE-INTEGER SERIES::NUM))
      (SETQ #:SEQ-1018 (MAKE-SEQUENCE 'LISTS (OR #:LIMIT-1019 SERIES::NUM)))
      (DO ((SERIES::I (1- SERIES::NUM) (1- SERIES::I)))
          ((MINUSP SERIES::I))
        (SETF (ELT #:SEQ-1018 SERIES::I) (POP #:LST-1020))))
    (VALUES #:SEQ-1018 #:SUM-1023)))

90% of the time, the series macro will produce very efficient code, but 10% of the time the macro loses its lunch. It takes a little practice to get used to when the series macro will work and to write code that the series macro can handle.

Conclusion

There are many ways to iterate in Lisp; some are more efficient than others, some are more abstract than others. You choose the way that suits your needs. I like the abstraction of the series macro, but I will also use a library function like count when it is appropriate. When I need tight control, I’ll write a state machine.

Tuesday, January 14, 2025

λ Calculus

A lambda calculus is a set of rules and strategies for manipulating logical expressions. As Church defined them, these logical expressions are linear lists of symbols. A symbol is effectively a variable. Two expressions in sequence indicate a function application. The special symbol λ is just a marker to indicate a function. Parentheses can be used to group expressions.

McCarthy’s S-expressions are an alternative representation of a logical expression that is more suitable for a computer. Rather than a linear list of symbols, S-expressions use a tree structure of linked lists in memory. Symbols are still variables, lists represent function application, the special symbol lambda at the beginning of a list indicates a function, and grouping is achieved by nesting a list within another.

When McCarthy invented S-expressions, he wanted to show that the nested list structure of S-expressions could faithfully represent the logical expressions from lambda calculus. (It can.) A lambda calculus can be restated as a set of rules and strategies for manipulating S-expressions. This makes it easier for a computer to do lambda calculus. As a Lisp hacker, I find it also makes it easier for me to think about lambda calculus.

Your basic lambda calculus just has symbols, lists, and λ expressions. That’s it. But let us introduce one more element. Recall that we can think of a LET expression as syntactic sugar for a list (function call) where the first element (the operator) is a lambda expression. We’ll keep our S-expressions fully sugared and write all such lists as LET expressions. So now our S-expressions have symbols, lists, λ expressions, and LET expressions.
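
To illustrate the sugar, here is a small sketch in Common Lisp. The helper let->application is a hypothetical name; it rewrites a LET form, represented as a quoted list, into the equivalent application of a lambda expression.

```lisp
;; These two forms mean the same thing and both evaluate to 25.
(let ((x 3) (y 4)) (+ (* x x) (* y y)))
((lambda (x y) (+ (* x x) (* y y))) 3 4)

(defun let->application (form)
  "Rewrite (let ((var val) ...) body...) as ((lambda (var ...) body...) val ...)."
  (destructuring-bind (bindings &rest body) (cdr form)
    `((lambda ,(mapcar #'first bindings) ,@body)
      ,@(mapcar #'second bindings))))

(let->application '(let ((x 3) (y 4)) (+ x y)))
;; => ((LAMBDA (X Y) (+ X Y)) 3 4)
```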

The two basic rules for manipulating S-expressions are α, which is a recursive rule for renaming a symbol in an S-expression, and β, which gets rid of a selected LET expression. A basic lambda calculus consists of these two rules and a strategy for selecting which LET expressions to get rid of. β reduces a LET expression by replacing each variable in the body of the LET with the expression it is bound to. α is used as needed to avoid unwanted variable capture during β-reduction. β eliminates one LET expression, but it can introduce more if you end up substituting a λ expression into operator position.
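
Here is a deliberately naive sketch of a single β step on a quoted LET form. The name beta-reduce-let is hypothetical, and the code assumes a single body form and that no α-renaming is needed (no shadowing and no variable capture); a real implementation would have to handle both.

```lisp
(defun beta-reduce-let (form)
  "Eliminate one LET by substituting the bound expressions into its body.
Assumes FORM is (let ((var val) ...) body) and that no renaming is needed."
  (destructuring-bind (operator bindings body) form
    (assert (eq operator 'let))
    ;; Build an alist of (var . val) pairs and substitute throughout the body.
    (sublis (mapcar (lambda (binding)
                      (cons (first binding) (second binding)))
                    bindings)
            body)))

(beta-reduce-let '(let ((x (f y))) (g x x)))
;; => (G (F Y) (F Y))

;; Substituting a lambda expression into operator position leaves a new
;; application of a lambda, which is a LET in the sugared notation:
(beta-reduce-let '(let ((f (lambda (z) z))) (f a)))
;; => ((LAMBDA (Z) Z) A)
```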

If an expression contains no LET expressions, we say it is in “normal form”. A common task in lambda calculus is to try to reduce an expression to normal form by attempting to eliminate all the LET expressions. Sometimes you cannot achieve this goal because every time you apply the β rule to eliminate a LET expression, it ends up introducing further LET expressions.

There are many strategies for selecting LET expressions to eliminate. Not all strategies will necessarily end up getting you to a normal form, but all strategies that do end up at a normal form end up at the same normal form (modulo the variable names). One strategy is of note: selecting the leftmost, outermost LET expression and reducing it first is called “normal order”. It is notable because if any strategy converges to normal form, then the normal order strategy will, too. However, the normal order strategy can lead to an exponential explosion of intermediate expressions. There are other strategies that avoid the exponential explosion, but they don’t always converge to normal form. Pick your poison.

α and β are the only rules we need to compute with S-expressions. The simple lambda calculus with α and β is universal — it can compute anything that can be computed. It is Turing complete.

I don’t know about you, but I find it quite remarkable that this can compute anything, let alone everything. Nothing is going on here. α just renames symbols. Using α-conversion to rename all the foos to bars doesn’t change anything but the symbol names. We define expression equivalence modulo α, so the actual names of the symbols aren’t important. Apparently β-reduction does computation, but it is hard to say how, exactly. It is just simplifying LET expressions by replacing variables with what they are bound to. But a variable is simply a name for a binding. When you replace a variable with what it is bound to, you don’t change any values. The resulting expression may be simpler, but it means the same thing as the original.

We use β reduction as a model of subroutine (function) calls. In a subroutine call, the values of the arguments are bound to the names of the arguments before evaluating the body of the subroutine. In β reduction, the value expressions are substituted for the names they are bound to in the body of the expression. The lambda calculus model of a computer program will have a β reduction wherever the program has a subroutine call. A lambda calculus expression with opportunities for β reduction can be translated into a computer program with subroutine calls at those locations. It’s a one-to-one mapping. Since we can compute anything using just the α and β rules, we can likewise compute anything with just function calls. I think that’s pretty remarkable, too.

Turing’s machine formalism was designed to be understandable as a physical machine. Turing was particular that his machine could be realized as a mechanical object or electronically. It is far less clear how to make a lambda calculus into a physical machine. Once we recognize that β can be realized as a subroutine in software, we can see that Church’s lambda calculus formalism can be understood as a virtual machine.

Church’s Calculi of Lambda Conversion is a cool book where he lays out the principles of lambda calculus. It is pretty accessible if you have experience in Lisp, and the examples in the book will run in a Scheme interpreter if you translate them.

Monday, January 6, 2025

Substitution vs. State Transition

With a traditional curly-brace language, you have a model of a machine. A program specifies a sequence of state transitions that the machine should go through. When all the state transitions have taken place, the machine is in a final state, and the program is done.

As a programmer, you have to keep a mental model of the state of the machine at various points in the program. Your mental model has to include a temporal element — you have to remember what instruction you are working on, and what comes next. For each instruction, you have to consider the state before and after executing the instruction.

Lisp is very different from this. A Lisp program isn't a series of instructions, it is a tree of symbols. If you don’t use side effects, you can think of the Lisp interpreter as a simple substitution engine that operates on this tree. The interpreter just substitutes symbols with their values. You don’t have to consider any state before and after substitution because substitution doesn’t change anything.
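
As a small worked example of the substitution view, consider the two side-effect-free functions below (the names are made up for illustration). The comments trace the evaluation as a series of substitutions; at no point is there any machine state to keep track of.

```lisp
(defun square (x) (* x x))

(defun sum-of-squares (a b)
  (+ (square a) (square b)))

;; (sum-of-squares 3 4)
;; => (+ (square 3) (square 4))   ; substitute the body of sum-of-squares
;; => (+ (* 3 3) (* 4 4))         ; substitute the body of square, twice
;; => (+ 9 16)
;; => 25
```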

Even if you do use side effects, you can still think of the interpreter as a substitution engine, but the substitution algorithm is more complicated. You will need a mental model that includes state and a temporal component, but it is still basically a substitution model.

Substitution models are easier to reason about than state transition models. This is why Lisp is so powerful. It takes a little practice to get used to programming with a simple substitution model. That’s why Lisp has a learning curve, especially for people who expect, and are used to, a state transition model.

You can also reason about a Lisp program using a state transition model. You can switch between the two models and use whatever mental model is most appropriate for the problem at hand.

You can impose a substitution model on a curly-brace language, but it is more difficult. Curly-brace languages are designed to make you think about state transitions (indeed, many such languages force you to use a state transition to accomplish even the most simple conditionals and iterations), and the language doesn’t make it easy to ignore them and focus on the final value.

If Lisp is your first computer language, you learn the simple substitution model first. You’ll eventually have to learn about state transitions because they are needed to explain side effects. But you’ll mostly want to write programs that you can reason about using a substitution model. If you learn a curly-brace language first, you’ll have to think beyond the state transition model you have been taught and are using to reason about programs.

Many people find it difficult to learn how to reason with a new model. After all, the old model should work — it is universal. “Just assign the variable, don’t make me jump through hoops.” People who have a hard time wrapping their heads around substitution will probably find Lisp confusing and frustrating. But some people are able to embrace the substitution model and learn how it relates to the state transition model. These people will find Lisp to be a mind-expanding, powerful, expressive language.

Sunday, January 5, 2025

GitHub glitch bites hard (and update)

Update: Possible rogue process

GitHub reports that the call that removed the users was not the Copilot API but rather a call to the org membership API made by one of our bots.

We have a cron job that runs daily and keeps GitHub in sync with our internal databases. When GitHub and our internal databases disagree, the cron job makes API calls to reconcile the difference. It has the ability to remove users if it thinks they are no longer supposed to be members of the org.

It seems to have erroneously removed a large number of members. It was purely coincidence that I was editing copilot licenses at or around the time.

The question now is why? My hypothesis is that a query to our internal database only produced a partial result. The number of people determined to be valid users was far fewer than it should have been, and the cron job acted (correctly) and removed the users that were not verified by the database query. But it is hard to say for sure. I’ll need to check the cron job logs to see if I can determine what went wrong. It is very unusual, though. I’ve been here for years and I’ve never seen the cron job glitch out before. This is my working hypothesis for the moment. Perhaps it was some other error that made it think that the membership was greatly reduced.

I got bit hard by a GitHub bug last week.

Now GitHub has “organizations” which are owners of groups of repositories. GitHub carefully handles organization membership. You cannot directly join an organization, you must be invited by the organization. This gives the organization control over who can join the organization. But an organization also cannot directly add you as a member. It can invite you to join, but you must choose to accept the invitation. This gives you control over which organizations you are associated with. Membership in an organization is jointly controlled by the organization and the member. There is no way to bypass this.

This is a source of friction in the onboarding process in our company. We have a few repositories on GitHub that are owned by the company. When a new hire joins the company, we want to make them members of the organization. GitHub does not provide any way to automate this. Instead, we direct new hires to an internal web site that will authenticate and authorize them and then let them issue an invitation to join the organization. GitHub won’t give them access until they accept the invitation. This is a manual process that is error prone and places the burden of doing it correctly on the new hire. We often have to intervene and walk them through the process.

Keep this in mind.

Our company provides GitHub Copilot to our developers. Some developers like it, but many of our developers choose not to use it. While Copilot licenses are cheap, there is no point in paying for a license that is not used. The UI for GitHub Copilot will display the last time a person used Copilot. It is easy to see a small set of our users who have never logged on to Copilot. We decided to save a few bucks by revoking unused Copilot licenses. We reasoned that we could always turn it back on for them if they wanted to use it.

To test this out, I selected a few of the users who had never logged in to Copilot. I turned off the checkbox next to their names in the Copilot UI and clicked the save button. It appeared to work.

Within an hour I started getting complaints. People who claimed to be active Copilot users were getting messages that their Copilot access was revoked. It seems that the UI had listed several active users as “never logged in” and I had just revoked their access.

It got worse. I had only revoked a few licenses, but dozens of people had had their access revoked. It seems that GitHub had eagerly revoked the licenses of far more people than I had selected.

It got even worse. I have a list of everyone who should have access, so I know who to re-enable. But I cannot re-enable them. It seems that in addition to revoking their Copilot access, GitHub had taken the extra step of removing their membership in the organization. I cannot restore their membership because of the way GitHub handles organization membership, so until they visit our internal web site and re-issue the invitation to the organization, I cannot restore their Copilot access. This has been a monumental headache.

I’ve spent the week trying to explain to people why their Copilot access and organization membership was revoked, what steps they need to take to restore it, and why I cannot restore it for them.

It looks like I’m going to be spending a lot of time on this next week as well.

GitHub has an enterprise offering that allows you to automate account creation and organization membership. We’ve been considering this for a while. Unfortunately, you cannot mix legacy accounts with enterprise accounts, so we would have to atomically migrate the entire company and all the accounts to the enterprise offering. This would be a risky endeavor for only a little gain in convenience.

Saturday, January 4, 2025

fold-… and monoids

Suppose you satisfy these axioms:

you have a binary function • and a set that • is closed over (i.e. for all x, y in the set, x•y is in the set)
• is associative, ((a • b) • c) = (a • (b • c))
There is an identity element I: a • I = I • a = a

Then • is called a “monoid” (that is, a semigroup that also has an identity element).

Monoids come from abstract algebra, but they are ubiquitous in computer science. Here are some monoids: string-append over strings, addition over integers, state transition over machine states, compose over unary functions.
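
As a quick sanity check, here is a sketch that spot-checks the associativity and identity axioms on a few sample elements. It does not prove anything; it only tests the given values. The names string-append, monoid-laws-hold-p, and unit are introduced for illustration (Common Lisp spells string-append as concatenate with a result type of string).

```lisp
(defun string-append (&rest strings)
  "Concatenate STRINGS; called with no arguments it returns the empty string."
  (apply #'concatenate 'string strings))

(defun monoid-laws-hold-p (op unit a b c)
  "Spot-check associativity and identity for OP on the samples A, B, and C."
  (and (equal (funcall op (funcall op a b) c)
              (funcall op a (funcall op b c)))
       (equal (funcall op a unit) a)
       (equal (funcall op unit a) a)))

(monoid-laws-hold-p #'string-append "" "foo" "bar" "baz")  ;; => T
(monoid-laws-hold-p #'+ 0 1 2 3)                           ;; => T
(monoid-laws-hold-p #'- 0 1 2 3)                           ;; => NIL, subtraction is not associative
```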

Alternatively, we can define a monoid as a binary function • that is closed under folds fold-left or fold-right. That is, (fold-left #’• I list-of-set-elements) is an element of the set. Folds abstract the processing of lists of set elements. The walk through the list, the end test, and the accumulation of the result are all taken care of by the implementation of fold. You get to focus on the monoid that acts on each element.

Folds come in two flavors: fold-left and fold-right. fold-left has an obvious iterative implementation, but the result is accumulated left to right, which can come out backwards. fold-right has an obvious recursive implementation which accumulates right to left. The result comes out in the right order, but the recursion can cause problems if the stack space is limited.

Here are some stupid tricks you can do with folds and monoids.

Create n-ary functions

If we curry the call to fold, we extend the binary function of two arguments to an n-ary function of a list of arguments. For example, n-ary addition is just a fold over binary addition. (fold-left #’+ 0 list-of-integers). Likewise, n-ary compose is just a fold over binary compose.
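As a rough illustration of the same idea outside Lisp, here is a minimal Python sketch in which functools.reduce stands in for fold-left. The helper names nary, compose2, identity_fn, add_all, and compose_all are invented for this example and are not part of any library.

```python
from functools import reduce

def nary(binary_op, identity):
    """Extend a binary monoid operation to a function of a list of arguments."""
    def apply_to_all(args):
        # reduce is a left fold: ((identity op a0) op a1) op ...
        return reduce(binary_op, args, identity)
    return apply_to_all

def compose2(f, g):
    """Binary compose: apply g first, then f."""
    return lambda x: f(g(x))

def identity_fn(x):
    """Identity element for function composition."""
    return x

add_all = nary(lambda a, b: a + b, 0)
compose_all = nary(compose2, identity_fn)

# The rightmost function in the list runs first, as with ordinary compose.
add_one_then_double = compose_all([lambda x: x * 2, lambda x: x + 1])

print(add_all([1, 2, 3, 4]))     # 10
print(add_one_then_double(5))    # 12
```

Because each operation comes with its identity element, the empty list is handled for free: add_all([]) is 0 and compose_all([]) is the identity function.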

`Fold-…` is self documenting

If I haven’t used fold-left or fold-right in a while, I sometimes forget which one computes what. But fold-left and fold-right can document themselves: use a combining function that returns the list (F a b) to indicate a call to F:

> (fold-left (lambda (a b) (list 'F a b)) '|...| '(c b a))
(F (F (F |...| C) B) A)

> (fold-right (lambda (a b) (list 'F a b)) '(a b c) '|...|)
(F A (F B (F C |...|)))

You can see the structure of the recursion by using list as the combining function:

> (fold-left #'list '|...| '(c b a))
(((|...| C) B) A)

> (fold-right #'list '(a b c) '|...|)
(A (B (C |...|)))
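The same trick carries over to other languages. The following Python sketch is only an approximation of the Lisp above: fold_left and fold_right are hypothetical helpers written here on top of functools.reduce, with the argument order chosen to match the calls shown in this post.

```python
from functools import reduce

def fold_left(f, init, items):
    """Left fold: f(f(f(init, x0), x1), x2)."""
    return reduce(f, items, init)

def fold_right(f, items, init):
    """Right fold: f(x0, f(x1, f(x2, init)))."""
    return reduce(lambda acc, x: f(x, acc), reversed(items), init)

def show(a, b):
    # Return a description of the call instead of performing it.
    return f"(F {a} {b})"

print(fold_left(show, "...", ["c", "b", "a"]))    # (F (F (F ... c) b) a)
print(fold_right(show, ["a", "b", "c"], "..."))   # (F a (F b (F c ...)))
```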

`fold-…` works on groups

A group is a special case of a monoid where every element also has an inverse. fold-… can be used on a group as well. For example, fold-left can be used on linear fractional transformations, which are a group under function composition.

`fold-…` as an accumulator

The combining function in fold-left must be at least semi-closed: the output type is the same as the type of the left input. (In fold-right, the output type is the same as the type of the right input.) This is so we can use the output of the prior call as the input to the next call. In effect, we set up a feedback loop from the output to one of the inputs of the binary function. This feedback loop has a curious property: it behaves as if it has state. This happens even though both fold-… and the combining functions are pure functions. The state appears to arise from the feedback loop.

We can use fold-… to accumulate a value. For fold-left, at each iteration, the accumulator is passed as the first (left) argument to the combining function while the next element of the list is the second (right) argument. The combining function returns a new value for the accumulator (it can return the old value if nothing is to be accumulated on this step). The result of the fold-left is the final value of the accumulator.
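To make the semi-closed requirement concrete, here is a small Python sketch (functools.reduce again standing in for fold-left). The combining function tally is a made-up example: its accumulator is a (count, total) pair while the list elements are plain numbers, so only the left input and the output share a type.

```python
from functools import reduce

def tally(acc, x):
    """Accumulator (left) is a (count, total) pair; element (right) is a number."""
    count, total = acc
    return (count + 1, total + x)

# The initial accumulator is the "empty" tally.
count, total = reduce(tally, [3, 5, 10], (0, 0))
print(count, total)   # 3 18
```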

Note that because the accumulated value is passed as the first argument, you cannot use cons as the combining function to accumulate a list. This is unfortunate because it seems obvious to write (fold-left #’cons ’() ...) to accumulate a list, but that isn’t how it works. However, if you swap the arguments to cons you’ll accumulate a list:

(defun xcons (cdr car) (cons car cdr))

(defun revappend (elements base)
  (fold-left #'xcons base elements))

`fold-…` as a state machine

Although fold-left is commonly used to accumulate results, it is more general than that. We can use fold-left as a driver for a state machine. The second argument to fold-left is the initial state, and the combining function is the state transition function. The list argument provides a single input to the state machine on each state transition.

For example, suppose you have a data structure that is made out of nested plists. You want to navigate down through the plists to reach a final leaf value. We set up a state machine where the state is the location in the nested plists and the state transition is navigation to a deeper plist.

(defun getf* (nested-plists path)
  (fold-left #'getf nested-plists path))
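For readers more at home with dictionaries than plists, a comparable sketch in Python might look like this. The function get_in and the sample config data are invented for the illustration; the state is the current sub-dictionary and each key in the path is one input to the state machine.

```python
from functools import reduce

def get_in(nested, path):
    """Follow path down through nested dicts; the fold state is the current dict."""
    return reduce(lambda state, key: state[key], path, nested)

config = {"server": {"tls": {"port": 8443}}}
print(get_in(config, ["server", "tls", "port"]))   # 8443
```

An empty path returns the starting structure unchanged, since the initial state is also the final state.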

Alternatively, we could drive a state machine by calling fold-left with an initial state and list of state transition functions:

(defun run-state-machine (initial-state transitions)
  (fold-left (lambda (state transition)
               (funcall transition state))
             initial-state
             transitions))

Visualizing `fold-left`

If we unroll the recursion in fold-left, and introduce a temp variable to hold the intermediate result, we see the following:

(fold-left F init '(c b a))

temp ← init
temp ← F(temp, c)
temp ← F(temp, b)  
temp ← F(temp, a)

I often find it easier to write the combining function in a fold-… by visualizing a chain of combining functions wired together like this.

Generating pipelines

Now let’s partially apply F to its right argument. We do this by currying F and immediately supplying an argument:

(defun curry-left (f)
  (lambda (l)
    (lambda (r)
      (funcall f l r))))

(defun curry-right (f)
  (lambda (r)
    (lambda (l)
      (funcall f l r))))

(defun partially-apply-left (f l)
  (funcall (curry-left f) l))

(defun partially-apply-right (f r)
  (funcall (curry-right f) r))

We can partially apply the combining function to the elements in the list. This gives us a list of one argument functions. In fact, for each set element in the set associated with our monoid, we can associate a one-argument function. We can draw from this set of one-argument functions to create pipelines through function composition. So our visualization

temp ← init
temp ← F(temp, c)
temp ← F(temp, b)  
temp ← F(temp, a)

becomes

temp ← init
temp ← F_c(temp)
temp ← F_b(temp)  
temp ← F_a(temp)

We can write this pipeline this way:

result ← F_a ← F_b ← F_c ← init

or this way:

result ← (compose F_a F_b F_c) ← init

We can pretend that the elements of the set associated with the monoid are pipeline stages. We can treat lists of set elements as though they are pipelines.
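Here is one way the pipeline view could be sketched in Python. Everything in it is illustrative: partially_apply_right mirrors the Lisp helper above, compose is a small hand-written helper, and shift_in is an arbitrary combining function chosen so the order of the stages is visible in the result.

```python
from functools import reduce

def partially_apply_right(f, r):
    """Fix the right argument of f, leaving a one-argument function of the left."""
    return lambda left: f(left, r)

def compose(*stages):
    """compose(f_a, f_b, f_c)(x) == f_a(f_b(f_c(x)))."""
    return lambda x: reduce(lambda temp, stage: stage(temp), reversed(stages), x)

def shift_in(temp, digit):
    # Example combining function F: append a decimal digit to a number.
    return temp * 10 + digit

# One pipeline stage per list element, in list order c, b, a.
f_c, f_b, f_a = (partially_apply_right(shift_in, d) for d in (1, 2, 3))

print(compose(f_a, f_b, f_c)(0))        # 123
print(reduce(shift_in, [1, 2, 3], 0))   # 123
```

Both lines compute the same value: the composed pipeline applied to the initial value is the fold of the combining function over the list.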

Notice how we never write a loop. We don’t have the typical list loop boilerplate

(if (null list)
    ... base case ...
    (let ((element (car list))
          (tail (cdr list)))
      ... ad hoc per element code ...
      (iter tail)))

Instead, we have a function that processes one element at a time and we “lift” that function up to process lists of elements.

Pipelines are easier to reason about than loops. fold-… converts loops into pipelines.

It takes a little practice to use fold-… in the less obvious ways. Once you get used to it, you’ll see folds everywhere. You can eliminate many loops by replacing them with fold-….

Monoids vs. Monads

A monad is a monoid over a set of curried functions. You use a variant of compose to combine the curried functions. Monads force sequential processing because you set up a pipeline and the earlier stages of the pipeline naturally must run first. That is why monads are used in lazy languages to embed imperative subroutines.

A Newbie Is Exposed to Common Lisp

At ChangeSafe our product was written in Common Lisp. But for "black box" testing, we didn't need the test code to be written in Common Lisp. In fact, we wanted the test code to be written in an unrelated language so that we could be sure that the product's API was language neutral.

Skill in Common Lisp was simply not a requirement — not even relevant — for QA jobs. We were a startup company, so the initial hires for QA would set the culture for the department. We were looking for people who had initiative, were self-motivated, could work with vague and underspecified guidance, figure out the job that needed to be done, and then do it. We found a young guy named Eric who fit the bill.

Eric set up our QA efforts and wrote the initial black box test code for the product. In his spare time, off the clock, he got curious about Common Lisp and why we chose to develop the product using it. He had never heard of the language. He decided to pick up the Common Lisp specification and teach himself the language. I told him that it was unnecessary for the job, but I didn't discourage him from broadening his horizons.

Eric came to me early on and asked me to explain some details about Common Lisp. He had just learned about lambda expressions and was trying them out with mapcar and other higher-order functions. It was obvious to him that a lambda expression was capturing the local stack variables. When the lambda was passed downwards to mapcar, it was able to access the variables from its point of origin further up the stack. He could see the potential, and thought it was an interesting feature.

Then, just to see what would happen, he returned a lambda expression up the stack. To his surprise, it still worked, even though the stack frame was no longer there. He had three questions for me the next day: Was this intentional? How did it work? And why would anyone do this?

I assured him that it was intentional. The feature worked by the compiler generating code to move the captured variables off of the stack and into the heap. The designers of Lisp wanted lambda expressions to “just work”, regardless of how you passed them around. The correct engineering decision — the “right thing” — was to place the burden of making this happen on the language implementor, not on the user of lambda expressions.
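The behavior Eric stumbled on is easy to reproduce in most modern languages. This is a minimal Python sketch, not the Common Lisp he was experimenting with, and make_adder is a name invented for the example. Python gets the same effect by keeping captured variables in heap-allocated cells rather than in the stack frame.

```python
def make_adder(n):
    # n is a local variable of make_adder ...
    def add(x):
        return x + n    # ... that the inner function captures.
    return add          # The function is returned "up the stack".

add5 = make_adder(5)    # make_adder has returned; its frame is gone.
print(add5(10))         # 15

# Passing a closure downward to a higher-order function works too.
print(list(map(make_adder(1), [1, 2, 3])))   # [2, 3, 4]
```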

I told him that the reason we used Common Lisp was because the designers of the language placed a high value on doing things “the right way”. Things were carefully designed to be easy to use and to work correctly, even in the corner cases. It was recognized that this would make it more difficult to implement the language, but a lot easier to program in it.

Eric was, quite frankly, impressed by this. It was clear to him that the designers of Lisp were in a league of their own. He couldn't wait to learn more about the language.

Incidentally, Eric wasn't the least bit put off by the parentheses. He considered them to be a quirk that was idiomatic of the language. Some languages use infix notation, some calculators use postfix. Lisp happened to use prefix notation. It was no big deal.

Eric was a great hire and had the potential to go far had the company not gone under when the internet bubble burst.

Friday, January 3, 2025

Dvorak and Lisp

I use a slightly modified Dvorak keyboard. It is like a standard Dvorak keyboard, but the parentheses have been relocated to be unshifted keys.

I don't believe Dvorak is any faster than QWERTY, but it feels more comfortable to me, and the unshifted parens make Lisp a lot easier to type.

Except for the word lambda. The M, B, and D are all typed with the right index finger.

Alas.

REBOL 1.0 Was Slow

Rebol 1.0 was slow. I paid little attention to speed in the implementation; I was concerned with correctness. The interpreter was intended to be a reference implementation, with well-defined behavior on every edge case. My intent was to add a compiler at a later date.

One source of slowness was the liberal use of first-class continuations in the interpreter. Rebol 1.0 used a “Cheney on the MTA” interpretation strategy, where no function ever returned a value and the stack simply got deeper and deeper. When the stack overflowed, a stack garbage collection was triggered. Since most of the stack was garbage, this was a fast operation (I used a garbage collector that used time proportional to live storage). With such an implementation, first-class continuations were trivial to implement: all continuations were first-class; it was just a question of whether you surfaced them to the user. I didn’t have an ideological belief either way, but there they were, so why not? Many control flow constructs that would otherwise require an ad hoc implementation can be easily implemented with first-class continuations.

Rebol had return statements that would return control to the caller from within the function. 99% of the time, the caller is sitting on the stack just above the current frame. But 1% of the time, the user would do something weird like create a lexical closure over the return statement and pass it downward. Like as not he didn’t deliberately do this, but rather used some library that was implemented in continuation-passing style. If this happened, the return statement might have to unwind an arbitrary amount of stack. To implement this, I captured the current continuation at the entry of each function and bound it to the implicit “return” variable. Invoking return invoked the continuation and returned control to the caller. The advantage of doing it this way was that return statements had the correct semantics under all circumstances. There were no special rules governing use of return and no code had to have special cases for unexpected returns.

A similar thing happened in the implementation of break and continue in loops. These were implemented by capturing the continuation at the entry of the loop and binding it to the implicit break variable, and capturing the continuation on each iteration and binding it to the implicit continue variable. Because these were first-class continuations, they could be used to restart the loop after it exited. That wasn’t a requirement. I was perfectly happy to stipulate that break and continue only work while a loop is in progress, but in Rebol 1.0, they’d continue to work after the loop finished.

Worrying about continuations captured in lexical closures may seem weird, but it’s a real issue. It is common to introduce implicit lexical contours in a program: even a let expression does it. You would like to be able to use break and continue in the body of a let expression in a loop. Some Rebol constructs were implemented by implicitly macroexpanding the code into a call to a helper function. break and continue would work across function call boundaries, so there were no limitations on introducing helper functions within a loop.

A more traditional language has a handful of ad hoc iteration constructs that are implemented with special purpose code. The special purpose code knows it is a loop and can be optimized for this. break and continue statements have a special dependency on the enclosing loop.

Rebol 1.0 was properly tail recursive, so there was no special implementation of loops. They were ordinary functions that happened to call themselves. Non-standard iteration constructs could be created by the user by simply writing code that called itself. break and continue just surfaced the interpreter’s continuation to the user. As a consequence, loops in Rebol 1.0 were implemented completely in Rebol code but had significant interpreter overhead.

Rebol 2.0 and later are not properly tail recursive. As a consequence, special looping constructs are required to be written in C to support iteration. Common iteration constructs such as for and while are provided and do not have interpreter overhead, but if you want a non-standard iteration construct, there is no way to achieve it. You have to re-write your code to use one of the built-in iteration constructs or go without and risk blowing the stack.

My intent was to eventually write a compiler for Rebol. I wrote a prototype called Sherman that compiled to MIT-Scheme and was supported by the MIT-Scheme runtime library. Loops compiled with Sherman ran quickly as expected.

Thursday, January 2, 2025

GitHub Copilot Revisited

It’s been a year since I wrote a review of GitHub Copilot. A reader asked me to write an update. He wanted to know what I thought of the apparent negative effects of Copilot on the quality of code in several codebases.

GitHub Copilot acts as an autocomplete tool. Suggested completions appear in the editor as you enter code. You can accept the suggestion or ignore it. But your frame of mind informs how you decide whether to accept or ignore a suggestion. Here are a few of the ways you can interact with GitHub Copilot.

The StackOverflow mode. On the StackOverflow web site, you’ll find questions about coding and answers that often contain sample code. As an engineer, you craft the solution to your specific problem by adapting some of the sample code to your specific needs. The problem with StackOverflow is that the quality of the answers varies widely. Some answers come with some well written and well tested sample code. Other times you’ll find that someone posts a code snippet that they didn’t even attempt to run. Sometimes the code in the answer is just plain wrong. You have to draw on your engineering skills to carefully evaluate and adapt the code you find on StackOverflow.

In StackOverflow mode, you pretend that GitHub Copilot is a StackOverflow search engine. You prompt Copilot to generate snippets of code. You evaluate the generated code as though it were taken from a StackOverflow answer. The code may be fairly well written and work as is, it might be completely wrong, or it might be somewhere in between. You have to be prepared to evaluate the code critically. You may need to tweak the code to make it work in your specific context. There may be subtle bugs you need to watch for.

The autocomplete mode. When using Copilot in this mode, you treat Copilot as an autocomplete tool. As you type your program, Copilot will attempt to complete the snippet you are typing. The best way to interact with Copilot in this mode is to ignore most of the suggested completions and only accept the ones that are obviously right. Often Copilot suggests exactly what you were going to type anyway. Accept those suggestions. You don’t want to spend the time and intellectual energy evaluating and adapting suggested code in this mode. You just want to get your code written quickly. Accept the code that saves you typing and reject everything else.

Code generation mode. Copilot is pretty good at discovering repeated patterns in your code. In code generation mode, you craft some prompt code attempting to induce Copilot to generate templated output. Typically writing out one or two examples of a repeating pattern of code is sufficient for Copilot to get the gist of what you are doing and have it automatically generate the next few repetitions.

Each of these modes of interacting with GitHub Copilot requires different amounts of attention and focus, and applying your attention and focus to different areas. To get the most out of Copilot, you need to be able to switch your attention and focus between the interaction modes. The better you can do this, the more you will get out of Copilot. It takes practice.

Copilot produces mediocre code. It’s not imaginative, it doesn’t have the big picture. It writes the same code that J. Random Neckbeard would write. Mr. Neckbeard will hack out serviceable solutions, but won’t craft elegant ones. If you let Copilot take over writing large sections of code, you’ll end up with a pile of bad code. It may run, but it will be hard to read, understand, and maintain. You have to assert yourself and not let Copilot take control.

When you use Copilot, you have to be the boss. It’s too easy to be lazy and accept suggestions that Copilot makes because although they aren’t great, and they aren’t what you would have written, they are adequate. Do this enough and the resulting code won’t be great, but instead barely adequate. Resist the temptation to be lazy and reject suggestions that aren’t what you want.

I’ve been using Copilot for over a year now. I’ve used it in anger on a medium sized Go project. It turns out that if you point Copilot at a text file or html file, it will generate prose as well as source code. As you write, Copilot will try to finish your sentences. If you let it do this too much, you’ll end up sounding like a section of a Wikipedia article. It is best to already have some text in mind and let Copilot try to guess what it is. Reject the suggestion when it guesses wrong. This way you can use Copilot to save you typing, but you sound like yourself. Copilot does, however, occasionally suggest continuations that raise points you hadn’t addressed. The suggestion may be a bit of a non-sequitur at the point where it is made, but I’ve found that Copilot can remind me of things I’ve forgotten to mention.

Copilot is not a pair programmer. It is a complex program generation model with a front-end that appears to have a deceptively shallow learning curve. There are several different ways to effectively use Copilot, but they all present themselves as autocomplete. It takes time and practice to learn the different effective ways to use Copilot and to switch between them as you program.

If you are J. Random Neckbeard, Copilot will help you become much more prolific without a lot of effort. But if your standards are higher, you’ll have to work harder to get the most out of Copilot, and you’ll find yourself rejecting it more. Be prepared to put a few months of effort into practicing the different ways to use Copilot. Like any complex tool, it takes time to get good at using it.

Can you trust Copilot? Can you trust an engineer who uses Copilot? Ask yourself, do you trust StackOverflow? Do you trust an engineer who uses StackOverflow? Do you trust your engineers? Copilot may be the ultimate source of buggy code, but the engineer is responsible.

Many codebases have reported a decrease in quality since Copilot has come on the scene. I think it is reasonable to discourage its use in these codebases. But I don’t think Copilot makes programmers worse. It makes lazy programmers more prolific, which is probably not what you want. If you are a good programmer, Copilot can be a useful tool in your toolbox. If you are careful to not let Copilot write too much of your code, you can save time without your code suffering.

Scheme Interpreter: Conclusions

This experiment with writing an MIT-Scheme S-code interpreter in C# was successful in these ways:

It showed that the S-code interpreter is an independent component of the Scheme system. The interpreter substrate can be replaced with a new implementation, written in a different language, using a different evaluation strategy, without replacing the Scheme runtime system written in Scheme.
It showed that the S-code interpreter can, on small segments of code, perform as fast as compiled code. However, growing the size of these small segments causes an exponential increase in the number of interpreter specializations. The obvious solution of automatically generating interpreter specializations on demand is the equivalent of JIT compilation.
It validated the idea that the lexical environment can be represented as a flattened vector of values. Mutable variables can be implemented by cell conversion. Variable values are copied from outer scopes to inner scopes when closures are created. The semantics of such an implementation is equivalent to the semantics of a nested series of frames as used in MIT-CScheme.
It showed that you can implement tail recursion via trampolines at each call site, and that you can implement first-class continuations by testing for a magic return value after the return of each trampoline. We don’t use the C# exception handling mechanism to unwind the stack when implementing first-class continuations, just a conditional branch and a normal return. This is far less complicated and expensive.
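The flattened-environment point in the list above can be sketched in a few lines. This is a toy Python illustration under my own assumptions, not the C# code from the experiment; Cell, Closure, and make_counter are hypothetical names. A variable that is assigned after capture is cell-converted (boxed), while a variable that is never assigned is simply copied into the closure's flat vector.

```python
class Cell:
    """A heap-allocated box holding one mutable variable."""
    def __init__(self, value):
        self.value = value

class Closure:
    """Code plus a flat vector of captured values (no chain of frames)."""
    def __init__(self, code, captured):
        self.code = code
        self.captured = tuple(captured)   # copied when the closure is created

    def __call__(self, *args):
        return self.code(self.captured, *args)

def increment_code(captured):
    cell, delta = captured
    cell.value += delta       # mutation goes through the shared cell
    return cell.value

def peek_code(captured):
    (cell,) = captured
    return cell.value

def make_counter(start, step):
    count = Cell(start)       # mutable variable: cell-converted
    # step is never assigned, so its value is copied directly.
    increment = Closure(increment_code, [count, step])
    peek = Closure(peek_code, [count])
    return increment, peek

increment, peek = make_counter(10, 5)
increment()
increment()
print(peek())   # 20
```

Both closures hold their own flat vector, yet they observe the same mutations because the vectors share the cell rather than the variable's value.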

It was a failure in these ways:

Although it showed one way in which we could take incremental steps to increase the speed of the interpreter until it approached the speed of compiled code, each step resulted in an exponential increase in the number of specializations in the interpreter and had diminishing returns.
The ultimate outcome of this process would be an interpreter with thousands of specializations. Small Scheme programs could be completely represented by a single specialization, and they would be interpreted as fast as compiled code. But this is because the specialization is essentially a compiled version of the Scheme program. In other words, we ultimately will have an interpreter that “optimizes” by looking up a program in a huge table that maps small programs to their precomputed compiled bodies. This is just an unusual and inefficient way to implement a compiler.
Because C# offers no way to dump the heap in binary format, we must cold load the system each time we start it.
One of the tasks in the cold load is to initialize the unicode tables. These are big tables that take a long time to initialize.
It took an annoyingly long time to get to Scheme’s initial top-level prompt.
Debugging crashes in the Scheme system was annoying and slow because we have to cold load the Scheme system to reproduce bugs.
I have understated a large component of the work: providing a new C# implementation for each of the hundreds of primitives in the Scheme runtime. I only bothered to implement those primitives called as part of the cold load boot sequence, but still there were a lot of them. For many of these primitives, the C# implementation just achieved the same effect “in spirit” as the MIT-CScheme implementation. These were easy to implement. But some were more persnickety where it was vital that the C# implementation produced exactly the same bits as the MIT-CScheme implementation. For instance, the code used to hash the types for generic method dispatch had to produce the exact same hash values in both implementations. This is because there is code that depends on the hashed multimethod ending up at a precomputed location in a method cache.
The C# interpreter was complete enough to boot a Scheme cold load and run it to the top-level prompt. It could run the top-level REPL. But much was missing. It could not host the SF program, which generates the S-code for the Scheme runtime. You’d have to run an original instance of MIT-CScheme to generate the S-code that you would then run in the C# interpreter.

I think the next Lisp system I will try should be based around a simple, portable JIT compiler.

Wednesday, January 1, 2025

Calling Conventions in the Interpreter

C# is not tail recursive. It could be. The IL that it compiles to supports tail recursion, but the C# compiler doesn’t generate the tail call instruction. It would be a simple thing to add: when the compiler emits a call instruction, it could check if the next instruction is a return, and if so, emit a tail call instruction. This could be controlled by a compiler flag so only us weirdos who want this feature would enable it.

But until the C# compiler adds this feature, we have to resort to other methods. I chose to use a trampoline at each call site. This is a small segment of code that awaits the result of the function call. If the callee wants to tail call, it returns the tail call target to the caller, which performs the call on the callee’s behalf. This requires a modification to the calling conventions.

EvalStep is the virtual method that all S-code objects implement to perform an evaluation. Its signature is this:

abstract class Control : SchemeObject
{
     public abstract TailRecursionFlag EvalStep (out object answer, 
                                                 ref Control expression, 
                                                 ref Environment environment);
}

The result of the evaluation is returned in the answer parameter. This is an out parameter, so the answer is allocated in the caller and a pointer to it is passed to the callee. The callee returns the answer by modifying it in the caller’s stack frame.

The expression and environment parameters are the expected parameters for a call to Eval. They, too, are allocated in the caller’s frame and references to them are passed to the callee. The callee is allowed to modify the caller’s values of these variables.

The returned value is a TailRecursionFlag. This is either 1, indicating that a value has been returned in the answer, or 0, indicating that the caller should perform another EvalStep. To return a value, the callee modifies the answer. To perform a tail call, the callee modifies the expression and environment references and returns 0.

Any caller must call EvalStep as follows: The caller allocates an answer variable to receive the answer of the call. It also allocates an expression and an environment variable to pass to the callee. It then calls EvalStep in a loop until the callee returns a TailRecursionFlag of 1, indicating that the answer has been set to the return value.

In the EvalStep for an S-code Conditional we see an example of the calling convention:

  object ev;
  Control unev = predicate;
  Environment env = environment;

  while (unev.EvalStep (out ev, ref unev, ref env) == TailRecursionFlag.TailCall) { };

We are making a recursive call to evaluate the predicate. We set up ev to receive the result of the evaluation. We set up unev and env to hold the expression and environment to pass to EvalStep. unev.EvalStep does the eval dispatch via virtual function dispatch.

If the predicate returns a TailRecursionFlag of ReturnValue, the loop will exit. The predicate is assumed to have put the return value in the ev variable.

If the predicate wants to tail call, it will modify the values of unev and env to the new expression and new environment, and return a TailRecursionFlag of TailCall. The loop will iterate, using the new value of unev and env to again dispatch a call to EvalStep.

When the while loop exits, the ev variable will contain the return value of the predicate. Control may be returned to the while loop several times before the loop exits. This is the trampoline in action.

Conditional expressions don’t return a value. They either tail call the consequent or the alternative. The EvalStep for a conditional ends like this:

  answer = null;
  expression = (ev is bool evb && evb == false) ? alternative : consequent;
  return TailRecursionFlag.TailCall;
}

The answer variable in the caller is set to null. out parameters must always be assigned to before the function exits, so this just keeps the compiler happy. If the return value of calling EvalStep on the predicate is the boolean false, we set the expression in the caller to the alternative, otherwise the consequent. This is the target of our tail call to EvalStep. For the S-code for a conditional, we leave the environment alone: the tail call uses the same environment unchanged. We finally return TailRecursionFlag.TailCall so that the caller’s trampoline makes another iteration around its while. It will call EvalStep on the alternative or consequent that we stuffed in the caller’s expression.

This little song and dance is performed at every recursive call to EvalStep, making EvalStep behave as a tail-recursive function. This calling convention is about half the speed of a normal C# method call. It is the cost of using a trampoline for tail recursion.
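For readers who want to experiment with the calling convention without a C# toolchain, here is a stripped-down Python sketch of the same idea. It is an approximation under stated assumptions: Python has no out or ref parameters, so eval_step returns a (flag, answer, expression, environment) tuple instead, and the classes Flag, Constant, and Conditional and the function trampoline are invented for the illustration. First-class continuations are left out.

```python
from enum import Enum

class Flag(Enum):
    TAIL_CALL = 0
    RETURN_VALUE = 1

class Constant:
    def __init__(self, value):
        self.value = value

    def eval_step(self, env):
        # Returns (flag, answer, expression, environment).
        return Flag.RETURN_VALUE, self.value, self, env

class Conditional:
    def __init__(self, predicate, consequent, alternative):
        self.predicate = predicate
        self.consequent = consequent
        self.alternative = alternative

    def eval_step(self, env):
        # Non-tail call: run the predicate to completion on our own trampoline.
        ev = trampoline(self.predicate, env)
        # Tail call: hand the chosen branch back to the caller's trampoline.
        target = self.alternative if ev is False else self.consequent
        return Flag.TAIL_CALL, None, target, env

def trampoline(expression, env):
    """Keep calling eval_step until the callee reports a returned value."""
    while True:
        flag, answer, expression, env = expression.eval_step(env)
        if flag is Flag.RETURN_VALUE:
            return answer

# 5000 conditionals nested in tail position: deeper than Python's default
# recursion limit, but the trampoline keeps the stack depth constant.
expr = Constant("done")
for _ in range(5000):
    expr = Conditional(Constant(True), expr, Constant("never"))
print(trampoline(expr, {}))   # done
```

As in the C# version, only the boolean false selects the alternative; every other predicate value, including 0, selects the consequent.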

First Class Continuations

There is one more obscure reason that the control might return to us when evaluating the predicate. If some function further down the call chain invokes call-with-current-continuation, we need to copy the stack. The callee indicates this by returning a magic return value of Special.UnwindStack. The callee sets the caller’s environment to an UnwinderState that will accumulate the stack frames as we unwind the stack. So our calling convention says we need to check the return value of EvalStep, and if it is Special.UnwindStack, we allocate a ConditionalFrame on the heap that will contain the state of the current stack frame. We AddFrame to the UnwinderState. We propagate this up the stack by putting it in the caller’s environment, setting the caller’s value of answer to Special.UnwindStack and returning TailRecursionFlag.ReturnValue to stop the caller’s trampoline loop.

The full code of EvalStep for an S-code if expression is this:

public override TailRecursionFlag EvalStep (out object answer,
                                            ref Control expression,
                                            ref Environment environment)
{
    object ev;
    Control unev = predicate;
    Environment env = environment;

    // Tail recursion trampoline.
    while (unev.EvalStep (out ev, ref unev, ref env) == TailRecursionFlag.TailCall) { };
    // Support for first class continuations.
    if (ev == Special.UnwindStack)
    {
        ((UnwinderState) env).AddFrame (new ConditionalFrame (this, environment));
        environment = env;
        answer = Special.UnwindStack;

        return TailRecursionFlag.ReturnValue;
    }

    // Tail call EvalStep on the consequent or alternative.
    answer = null;
    expression = (ev is bool evb && evb == false) ? alternative : consequent;
    return TailRecursionFlag.TailCall;
}

First class continuations allow you to unload and reload the pending call chain. We see that at each call site, we must check the return value and, if it is Special.UnwindStack, we create a new Frame on the heap and add it to the unwinder state before we propagate the Special.UnwindStack up the call chain.

At the very top of the call chain, we have the outermost call to EvalStep. If the Special.UnwindStack value is returned to this call, the stack has been unwound and the UnwinderState is sitting in the environment variable. We need to rewind the stack and put the stack frames back on the stack. We create a RewindState from the UnwinderState. Each time we PopFrame from the RewindState, we get a deeper frame. We reload the stack by getting the outermost frame from the RewindState and calling EvalStep on it. The EvalStep for a Frame sets up the trampoline loop, calls PopFrame to get the next frame, and calls EvalStep on it. When we run out of stack frames to reload, the stack is reloaded and we return control to the innermost frame so it can continue where it left off. This is the rewind loop.

The EvalStep for a Frame, after making the recursive call to EvalStep on the next frame, continues with code that is a duplicate of the code in the original frame before the continuation was captured. A specific example will make this clear. If an if expression is on the stack when it is unwound, a ConditionalFrame is created. A ConditionalFrame is a subclass of SubproblemFrame which has this EvalStep method:

public override TailRecursionFlag EvalStep (out object answer,
                                            ref Control expression,
                                            ref Environment environment)
{
    object temp;
    Control expr = ((RewindState) environment).PopFrame ();
    Environment env = environment;
    while (expr.EvalStep (out temp, ref expr, ref env) == TailRecursionFlag.TailCall) { };
    if (temp == Special.UnwindStack)
    {
        ((UnwinderState) env).AppendContinuationFrames (continuation);
        environment = env;
        answer = Special.UnwindStack;

        return TailRecursionFlag.ReturnValue;
    }
    expression = this.expression;
    environment = this.environment;
    return Continue (out answer, ref expression, ref environment, temp);
}

public abstract TailRecursionFlag Continue (out object answer,
                                            ref Control expression,
                                            ref Environment environment,
                                            object value);

That is, the EvalStep of the SubproblemFrame establishes a trampoline, pops the next frame from the RewindState, and invokes its EvalStep method. When an answer is returned, the SubproblemFrame calls its Continue method.

The Continue method is an abstract method that is implemented by each subclass of SubproblemFrame. It finishes the work of the frame. In the case of a ConditionalFrame, the Continue method is this:

public override TailRecursionFlag Continue (out object answer,
                                            ref Control expression,
                                            ref Environment environment,
                                            object value)
{
    answer = null;
    expression = value is bool bvalue && bvalue == false
      ? SCode.EnsureSCode (this.expression.Alternative)
      : SCode.EnsureSCode (this.expression.Consequent);
    return TailRecursionFlag.TailCall;
}

Compare this to the code in the original Conditional:

    // Tail call EvalStep on the consequent or alternative.
    answer = null;
    expression = (ev is bool evb && evb == false) ? alternative : consequent;
    return TailRecursionFlag.TailCall;

There are only superficial differences: the Continue method gets the value returned by the predicate in an argument rather than in a local variable. It type checks the alternative and consequent components of the if expression by calling SCode.EnsureSCode. Otherwise, the code does the same thing.

It is not possible to actually rewind the stack with the original set of pending methods. What we do instead is rewind the stack with methods that do the same thing as the original pending methods. It is close enough. The same values will be computed.

There is one place you can see the difference. If you look at the stack trace in the debugger before you capture a continuation, you will see the pending recursive calls to the S-code EvalStep methods. If you look at the stack trace in the debugger after you capture a continuation, you will instead see pending calls to the EvalStep methods of a set of frames. The pending frames are in the same order and have names similar to the original pending methods. They compute the same values, too. But the debugger can notice that these are not the originals.

More Inlining

Calls to (null? x) usually appear as the predicate to a conditional. We can specialize the conditional. Instead of

[if
  [primitive-null?-argument0]
  [quote 69]
  [quote 420]]

We create a new S-code construct, if-null?-argument0, and construct the conditional as

[if-null?-argument0 
  [quote 69]
  [quote 420]]

We avoid a recursive call, and we avoid generating a ’T or ’NIL value only to test it. We just test for null and jump to the appropriate branch, just as the compiled code would have done.
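An evaluator for such a node could be sketched as follows. Everything here is an assumption made for illustration: the class name IfNullArgument0, an Environment that is nothing but an argument vector, the empty list represented as C# null, and the omission of continuation support.

```csharp
using System;

public enum TailRecursionFlag { TailCall, ReturnValue }

// Hypothetical stand-in: an environment that only holds the arguments.
public class Environment
{
    public readonly object[] Arguments;
    public Environment (object[] arguments) { Arguments = arguments; }
}

public abstract class Control
{
    public abstract TailRecursionFlag EvalStep (out object answer,
                                                ref Control expression,
                                                ref Environment environment);
}

public class Quote : Control
{
    readonly object value;
    public Quote (object value) { this.value = value; }

    public override TailRecursionFlag EvalStep (out object answer,
                                                ref Control expression,
                                                ref Environment environment)
    {
        answer = value;
        return TailRecursionFlag.ReturnValue;
    }
}

// [if-null?-argument0 consequent alternative]
public class IfNullArgument0 : Control
{
    readonly Control consequent, alternative;

    public IfNullArgument0 (Control consequent, Control alternative)
    {
        this.consequent = consequent;
        this.alternative = alternative;
    }

    public override TailRecursionFlag EvalStep (out object answer,
                                                ref Control expression,
                                                ref Environment environment)
    {
        // No trampoline for the predicate and no boolean to build and test:
        // look at argument 0 directly and tail call the chosen branch.
        answer = null;
        expression = (environment.Arguments[0] == null) ? consequent : alternative;
        return TailRecursionFlag.TailCall;
    }
}

public static class Interpreter
{
    // Outermost trampoline.
    public static object Eval (Control expression, Environment environment)
    {
        object answer;
        while (expression.EvalStep (out answer, ref expression, ref environment)
               == TailRecursionFlag.TailCall) { }
        return answer;
    }
}
```

With argument 0 bound to null the node tail calls the consequent, otherwise the alternative. The subproblem loop that a general conditional runs for its predicate is gone.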

Multiple arguments

We can further specialize the conditional based on the types of the consequent and alternative. In this case, they are both quoted values, so we can specialize the conditional to [if-null?-argument0-q-q 69 420]. (Where the name of the S-code type is derived from the type of the consequent and alternative.)

if-null?-argument0-q-q is an esoteric S-code type that codes a test of the first argument for null, and if it is null, returns the first quoted value, otherwise the second quoted value. This S-code type runs just as fast as compiled code. Indeed the machine instructions for evaluating this S-code are the same as what the compiler would have generated for the original Lisp form.
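As a rough sketch, the evaluator for this node collapses to a null test and a choice between two constants. The class name IfNullArgument0QQ, the argument-vector Environment, and the use of C# null for the empty list are assumptions for illustration only.

```csharp
using System;

public enum TailRecursionFlag { TailCall, ReturnValue }

// Hypothetical stand-in: an environment that only holds the arguments.
public class Environment
{
    public readonly object[] Arguments;
    public Environment (object[] arguments) { Arguments = arguments; }
}

public abstract class Control
{
    public abstract TailRecursionFlag EvalStep (out object answer,
                                                ref Control expression,
                                                ref Environment environment);
}

// [if-null?-argument0-q-q consequentValue alternativeValue]
public class IfNullArgument0QQ : Control
{
    readonly object consequentValue, alternativeValue;

    public IfNullArgument0QQ (object consequentValue, object alternativeValue)
    {
        this.consequentValue = consequentValue;
        this.alternativeValue = alternativeValue;
    }

    public override TailRecursionFlag EvalStep (out object answer,
                                                ref Control expression,
                                                ref Environment environment)
    {
        // Both branches are constants, so there is nothing left to tail call.
        answer = (environment.Arguments[0] == null) ? consequentValue : alternativeValue;
        return TailRecursionFlag.ReturnValue;
    }
}

public static class Interpreter
{
    // Outermost trampoline.
    public static object Eval (Control expression, Environment environment)
    {
        object answer;
        while (expression.EvalStep (out answer, ref expression, ref environment)
               == TailRecursionFlag.TailCall) { }
        return answer;
    }
}
```

The node returns TailRecursionFlag.ReturnValue on its first step, so the caller's trampoline loop exits after a single call.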

But there is a problem

Why not continue in this vein specializing up the S-code tree? Wouldn’t our interpreter be as fast as compiled code? Well it would, but there is a problem. Every time we add a new S-code type, we add new opportunities for specialization to the containing nodes. The number of ways to specialize a node is the product of the number of ways to specialize its children, so the number of ways to specialize the S-code tree grows exponentially with the number of S-code types. The few specializations I’ve just mentioned end up producing hundreds of specialized S-code types. Many of these specialized S-code types are quite esoteric and apply at most to only one or two nodes in the S-code tree for the entire program and runtime system. Performing another round of inlining and specialization would produce thousands of specialized S-code types — too many to author by hand, and most of which would be too esoteric to ever be actually used.

The solution, of course, is to automate the specialization process. We only generate a specialized S-code type when it is actually used by a program. The number of specialized S-code types will be limited by the number of ways programs are written, which is linear in the size of the program.

But specializing the code when we first encounter it is just JIT compiling the code. We’ve just reinvented the compiler. We might as well skip the multi-level specialization of the S-code tree and write a simple JIT compiler.

Inlining Primitive Function Calls and Argument Evaluation

Inlining some primitives

Reconsider our model of a Lisp program as a “black box” that issues a series of primitive function calls. We can eliminate some of the primitive function calls by implementing them directly within our “black box”. We inline some primitives.

Take null? as an example. Instead of constructing (null? arg) as

[primitive-funcall1
  [quote [primitive null?]]
  [argument 0]]

we add a new S-code construct, primitive-null?, and construct (null? arg) as

[primitive-null?
  [argument 0]]

We don't have to evaluate the function. We don't even have to jump to the entry point of the null? primitive. After we evaluate argument 0, we just test for null in the interpreter and return T or NIL as appropriate.

There are maybe 20 or so primitives that are used frequently enough to be worth inlining in this way. Each primitive you inline like this adds bloat to the interpreter.
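The difference between the two constructs can be sketched side by side. This is a simplified model, not the interpreter's actual code: the class names PrimitiveFuncall1, PrimitiveNull, and Argument, the use of a Func delegate to stand for a primitive, the argument-vector Environment, and C# null for the empty list are all assumptions.

```csharp
using System;

public enum TailRecursionFlag { TailCall, ReturnValue }

// Hypothetical stand-in: an environment that only holds the arguments.
public class Environment
{
    public readonly object[] Arguments;
    public Environment (object[] arguments) { Arguments = arguments; }
}

public abstract class Control
{
    public abstract TailRecursionFlag EvalStep (out object answer,
                                                ref Control expression,
                                                ref Environment environment);
}

public static class Interpreter
{
    // Trampoline, used both at top level and for subproblems.
    public static object Eval (Control expression, Environment environment)
    {
        object answer;
        while (expression.EvalStep (out answer, ref expression, ref environment)
               == TailRecursionFlag.TailCall) { }
        return answer;
    }
}

public class Quote : Control
{
    readonly object value;
    public Quote (object value) { this.value = value; }
    public override TailRecursionFlag EvalStep (out object answer,
                                                ref Control expression,
                                                ref Environment environment)
    {
        answer = value;
        return TailRecursionFlag.ReturnValue;
    }
}

public class Argument : Control
{
    readonly int index;
    public Argument (int index) { this.index = index; }
    public override TailRecursionFlag EvalStep (out object answer,
                                                ref Control expression,
                                                ref Environment environment)
    {
        answer = environment.Arguments[index];
        return TailRecursionFlag.ReturnValue;
    }
}

// [primitive-funcall1 operator operand]: evaluate both, then call.
public class PrimitiveFuncall1 : Control
{
    readonly Control rator, rand;
    public PrimitiveFuncall1 (Control rator, Control rand) { this.rator = rator; this.rand = rand; }
    public override TailRecursionFlag EvalStep (out object answer,
                                                ref Control expression,
                                                ref Environment environment)
    {
        var primitive = (Func<object, object>) Interpreter.Eval (rator, environment);
        object arg = Interpreter.Eval (rand, environment);
        answer = primitive (arg);
        return TailRecursionFlag.ReturnValue;
    }
}

// [primitive-null? operand]: evaluate the operand and test it in place.
public class PrimitiveNull : Control
{
    readonly Control rand;
    public PrimitiveNull (Control rand) { this.rand = rand; }
    public override TailRecursionFlag EvalStep (out object answer,
                                                ref Control expression,
                                                ref Environment environment)
    {
        object arg = Interpreter.Eval (rand, environment);
        answer = (arg == null);
        return TailRecursionFlag.ReturnValue;
    }
}
```

Both nodes produce the same answer for the same argument. The inlined one skips evaluating the operator and the indirect call through the delegate.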

Inlining simple arguments

The leaves of a tree of S-code are the atomic expressions, whereas the internal nodes of the tree are compound expressions. We can eliminate the leaves by inlining them into their parent nodes. For example, if a leaf node is a lexical variable reference, we inline this into the parent node. We unroll the evaluation of the leaf node, thus saving a recursive call to the interpreter and an evaluation dispatch.

Consider our previous example which we constructed as

[primitive-null?
  [argument 0]]

We further specialize primitive-null? based on its argument type into primitive-null?-argument or primitive-null?-lexical. Now our S-code becomes:

[primitive-null?-argument 0]

The leaf node [argument 0] is absorbed into the parent node [primitive-null? ...] making a new leaf node [primitive-null?-argument]. The evaluator for this S-code node simply tests if argument 0 is null and returns T or NIL as appropriate.
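A minimal sketch of such an evaluator, assuming a class named PrimitiveNullArgument, an Environment that is just an argument vector, and the empty list represented as C# null (none of which are taken from the interpreter's source):

```csharp
using System;

public enum TailRecursionFlag { TailCall, ReturnValue }

// Hypothetical stand-in: an environment that only holds the arguments.
public class Environment
{
    public readonly object[] Arguments;
    public Environment (object[] arguments) { Arguments = arguments; }
}

public abstract class Control
{
    public abstract TailRecursionFlag EvalStep (out object answer,
                                                ref Control expression,
                                                ref Environment environment);
}

// [primitive-null?-argument index]
public class PrimitiveNullArgument : Control
{
    readonly int index;
    public PrimitiveNullArgument (int index) { this.index = index; }

    public override TailRecursionFlag EvalStep (out object answer,
                                                ref Control expression,
                                                ref Environment environment)
    {
        // The argument fetch is unrolled into this node: no recursive
        // EvalStep, no subproblem trampoline, just a load and a null test.
        answer = (environment.Arguments[index] == null);
        return TailRecursionFlag.ReturnValue;
    }
}
```

The operand no longer exists as a separate S-code object. Its index is a field of the parent, which is what makes the parent a leaf.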

Compare this to the original S-code:

[funcall
  [global 'null?]
  [argument 0]]

This required two recursive calls to the interpreter, a global lookup, and a primitive function call on top of the null test. We’ve eliminated all of those. There’s not much left to do. Testing null? in the interpreter is almost as fast as testing null? in compiled code.

The number of S-code types needed to perform this inlining is the number of kinds of leaf nodes raised to the power of the number of leaves in the parent node. A call to a one-argument primitive would need specializations for the cases where the argument is a quoted value, an argument, a lexical variable or a global variable — four specializations. Calls to a two-argument primitive turn into one of sixteen specializations — the product of four for each argument. A call to a three-argument primitive would turn into one of sixty-four specializations.
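The arithmetic can be checked by enumerating the combinations. In this sketch the four leaf kinds come from the paragraph above, while the hyphenated naming scheme and the Specializations helper are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class Specializations
{
    // The four kinds of non-compound argument named in the text.
    static readonly string[] LeafKinds = { "quote", "argument", "lexical", "global" };

    // One hypothetical type name per combination of leaf kinds,
    // for a primitive that takes `arity` arguments.
    public static List<string> Names (string primitive, int arity)
    {
        var names = new List<string> { primitive };
        for (int i = 0; i < arity; i++)
        {
            var next = new List<string> ();
            foreach (string prefix in names)
                foreach (string kind in LeafKinds)
                    next.Add (prefix + "-" + kind);
            names = next;
        }
        return names;
    }
}
```

Names ("primitive", 1) yields 4 names, Names ("primitive", 2) yields 16, and Names ("primitive", 3) yields 64. Each extra argument multiplies the count by four.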

We can inline all the non-compound argument evaluations, both to primitive functions and to user-defined functions. In our S-code tree, we have removed all the leaf nodes and absorbed them into their parent nodes (which have now become new leaves). The interpreter is now quite a bit faster, although still not as fast as compiled code.