Friday, January 3, 2025

REBOL 1.0 Was Slow

Rebol 1.0 was slow. I paid little attention to speed in the implementation — I was concerned with correctness. The intepreter was intended to be a reference implementation, with well-defined behavior on every edge case. My intent was to add a compiler at a later date.

Once source of slowness was the liberal use of first-class continuations in the interpreter. Rebol 1.0 used a “Cheney on the MTA” interpretation strategy, where no function ever returned a value and the stack simply got deeper and deeper. When the stack overflowed, a stack garbage collection was triggered. Since most of the stack was garbage, this was a fast operation (I used a garbage collector that used time proportional to live storage). With such an implementation, first-class continuations were trivial to implement — all continuations were first-class, it was just a question of whether you surfaced them to the user. I didn’t have an ideological belief either way, but there they were, so why not? Many control flow constructs that would otherwise require an ad hoc implementation can be easily implemented with first-class continuations.

Rebol had return statements that would return control to the caller from within the function. 99% of the time, the caller is sitting on the stack just above the current frame. But 1% of the time, the user would do something weird like create a lexical closure over the return statement and pass it downward. Like as not he didn’t deliberately do this, but rather used some library that was implemented in continuation-passing style. If this happened, the return statement might have to unwind an arbitrary amount of stack. To implement this, I captured the current continuation at the entry of each function and bound it to the implicit “return” variable. Invoking return invoked the continuation and returned control to the caller. The advantage of doing it this way was that return statements had the correct semantics under all circumstances. There were no special rules governing use of return and no code had to have special cases for unexpected returns.

A similar thing happened in the implementation of break and continue in loops. These were implemented by capturing the continuation at the entry of the loop and binding it to the implicit break variable, and capturing the continuation on each iteration and binding it to the implicit continue variable. Because these were first-class continuations, they could be used to restart the loop after it exited. That wasn’t a requirement. I was perfectly happy to stipulate that break and continue only work while a loop is in progress, but in Rebol 1.0, they’d continue to work after the loop finished.

Worrying about continuations captured in lexical closures may seem weird, but it’s a real issue. It is common to introduce implicit lexical contours in a program: even a let expression does it. You would like to be able to use break and continue in the body of a let expression in a loop. Some Rebol constructs were implemented by implicitly macroexpanding the code into a call to a helper function. break and continue would work across function call boundaries, so there were no limitations on introducing helper functions within a loop.

A more traditional language has a handful of ad hoc iteration constructs that are implemented with special purpose code. The special purpose code knows it is a loop and can be optimized for this. break and continue statements have a special dependency on the enclosing loop.

Rebol 1.0 was properly tail recursive, so there was no special implementation of loops. They were ordinary functions that happened to call themselves. Non-standard iteration constructs could be created by the user by simply writing code that called itself. break and continue just surfaced the interpreter’s continuation to the user. As a consequence, loops in Rebol 1.0 were implemented completely in Rebol code but had signifcant interpreter overhead.

Rebol 2.0 and later are not properly tail recusive. As a consequence, special looping constructs are required to be written in C to support iteration. Common iteration constucts such as for and while are provided and do not have interpreter overhead, but if you want a non-standard iteration construct, there is no way to achieve it. You have to re-write your code to use one of the built-in iteration constructs or go without and risk blowing the stack.

My intent was to eventually write a compiler for Rebol. I wrote a prototype called Sherman that compiled to MIT-Scheme and was supported by the MIT-Scheme runtime library. Loops compiled with Sherman ran quickly as expected.

No comments: