Sunday, January 26, 2020

The pros and cons of Agile

Agile methodology is currently the popular way of attempting to develop commodity software in a factory-like manner.  It's a little hard to define what exactly is Agile and what is not.  I've worked in several companies and with several groups in companies that all claim to be Agile, yet they did things in very different ways.  But they all seemed to agree on a few things, and this I suppose could be the spirit of Agile, if not the letter.

The basic characteristic of Agile is that you break down the software into smaller and smaller tasks until you reach those tasks that can be comfortably handled by a small (2-8 person) team of engineers to complete in a small (1-2 week) time frame.  Then at the end of the time frame, you supposedly have some software that “works” in some sense of the word.  It exhibits basic functionality and the general gist of what the customer wanted, if not satisfying all the requirements for the entire project. Then over the next several time periods of development, the software is iteratively improved by fleshing out remaining requirements, adding needed functionality, hooking components together, etc. During this time, the customer is involved to make sure that what is being delivered is actually what is needed.

Some projects I worked on did this formally by a book and followed strict guidelines for the methodology, others just winged it, but all had the basic characteristics above.

One of the primary advantages of Agile is its use to management.  By having the large problem broken down into team-size, biweekly pieces, management can easily allocate and track resources usage and progress.  They can treat a team as a black box of software development and assign tasks as they arise.  Management can attempt to measure the performance of a team and see whether it is increasing, decreasing, or remaining steady.  Teams are what managers like to manage.

Another advantage is frequent feedback from the customer.  Since after each time period, a somewhat working version of some fragment of the product is available for demonstration, the customer can give feedback as to how and whether this seems to meet his needs.  He can offer suggestions about what might be improved, what features he needs to make the product at least minimally useful to him, and prevent development from getting off track.

But Agile is not a panacea.  There is still a significant amount of software produced with the traditional “waterfall” methodology with specification completed before coding begins and integration done as a final step in coding and only then releasing to the customer.  There is also a fair amount of software written “artistically”. I would define artistic software as that written by a single developer working alone over a period of several months. Frequently, such a project never gets beyond the hobbyist stage, and as such it is a risky approach to writing production code. But on occasion, an artistic project can turn into something novel and useful. It more often exhibits a unity of vision and coherence that is harder to find in software written by groups of people. (Which isn't to say that small groups cannot write software with unity of vision and coherence, it's just harder. Or they'll have one particular person in the group that has more insight than the others.)

Managers aren't as useful to artistic developers. Artistic developers tend to manage themselves. And you cannot swap out one developer for another without swapping out the entire project with him. A manager can work with an artistic developer as a peer, and help manage the project, but cannot manage the developer.

Frequently involving customers has its pros and cons as well. Often customers have difficulty imagining anything beyond incremental improvements to the current ways of doing things. They'll want a UI widget that will make some task slightly easier, but not think of automating the task altogether. They'll want to automate a series of inefficient tasks when a different viewpoint of the end result might make those intermediate tasks irrelevant. Customers are not impressed with changes to the code that don't produce visible effects. You may have spent a week refactoring in order to make it trivial to add new commands and new outputs, but customers don't care. Customers don't care about potential use cases, they care about their specific use case to the exclusion of everything else. This can be discouraging to developers.

Because Agile is so useful to managers, big and intermediate sized companies will continue to use it to develop commodity software in a factory-like style. It isn't going to be replaced any time soon. But there is still ample room in the market for small companies and individuals with vision to carve out niches that Agile methodologies will overlook and find tricky to fill.

(But I'm a romantic at heart, and I like the image of the lone hacker crafting software on his home computer in his room late at night. If only it were easy to make a living that way.)

Tuesday, January 21, 2020

But what if you really want to push a stack frame?

If you really don't want tail recursion, the solution is simple: don't put the call in “tail position”. We define a rather trivial function dont-tail-call and use it like this:
  (foo x))
The semantics of Scheme are that the arguments are evaluated before the call to the function, so the call to foo is required to occur before the call to dont-tail-call which necessitates allocation of a continuation.

But what about a really clever compiler that somehow “optimizes” away the call to dont-tail-call? Such an “optimization” is highly questionable because it changes the observable semantics of the program so it is no longer call-by-value. A compiler that performed that transformation wouldn't be a Scheme compiler because Scheme is a call-by-value language.

But what if we had really clever compiler and we disregard the change in semantics? We can easily thwart the compiler by deferring the definition of dont-tail-call until run time. Even the cleverest compiler cannot see into the future.

The definition of dont-tail-call is left to the reader, as is how to defer it's definition until run time.

Afraid of Tail Recursion

It's well known fact among proponents of tail recursion that some people just don't get it. They view tail recursion at best as a quirky compiler optimization that turns some recursive calls into loops. At worst, they see it as some sort of voodoo, or a debugging pessimization. They see little value in it. Some have outright disdain for it.

Tail recursion isn't just about turning recursive calls into loops. It's about changing how you look at function calling. Tail recursion just happens to fall out of this new viewpoint.

Most programmers, I think, view function calls as if they were akin to a short vacation. You pack up the arguments in your luggage, travel to the destination, unpack your luggage, do some stuff, repack your luggage with some souvenirs, return home, unpack everything and resume life where you left off. Your souvenirs are the return value.

Should you need a vacation from your vacation, you do the same thing: pack up the arguments in your luggage, travel to your new destination, unpack your luggage, do some stuff, repack your luggage with some souvenirs, return to your original vacation spot and resume your original vacation.

Tail recursion aficionados realize that the journey itself is the important part of the function call, and that a vacation includes two journeys. On the first journey you pack up the arguments, including the return ticket, in your luggage, use the outbound ticket to journey to the destination, unpack your luggage, and start doing stuff. When you run out of stuff to do, you make the second journey. You fetch the return ticket, repack your luggage, take the ticket to wherever it leads (presumably back home), unpack everything, and resume doing whatever you were doing there.

But perhaps you want to visit grandma instead of going directly home. Then we change the script slightly. When you run out of things to do on your vacation, you pack up your luggage with your souvenirs and the return ticket, then you journey to grandma's house, where you unpack and start doing stuff. Eventually you are done visiting grandma, so then you fetch the return ticket, repack your luggage, take the ticket to wherever it leads, unpack everything, and resume doing stuff there. It's a three-legged journey. You don't go from grandma's back to the vacation resort — there's nothing left for you to do there. You take the return ticket directly home.

Viewing things this way, a function call involves packaging the arguments in a suitable way, deallocating any temporary storage, and then making an unconditional transfer to the function, where we unpack the arguments and resume execution of the program. It is simply “a goto that passes arguments”.*

A function return is simply “a goto that passes a return value”. It involves packaging the return value in a suitable way, deallocating any temporary storage, and then making an unconditional transfer to the return address, where we resume execution of the program.

A tail recursive function call is simply “a goto that passes arguments”. It involves packaging the arguments in a suitable way, deallocating any temporary storage and then making an unconditional transfer to the function, where we resume execution of the program.

Do we really deallocate temporary storage before every control transfer? Certainly a return pops the topmost stack frame, and as often implemented, a tail recursive function call deallocates its stack frame or replaces it before transferring control, but a non tail recursive call? It does so as well, it's just that it also has to pack those values into a new continuation for the return trip. We use an implementation trick to avoid the absurdity of actually moving these values around: we move the base of frame pointer instead. Voila, we simultaneously deallocate the stack frame and allocate the continuation with the right values already in place.

Deallocating storage before each control transfer is an important part of the protocol. We're making a unconditional transfer to a destination with the hope, but no guarantee, that we'll come back, so we'd better tidy up before we leave. This ensures that we won't leave a trail of garbage behind us on each transfer which would accumulate and adversely affect the space complexity of our program.

Once you view a function call and return as not being a single sequence, but each one a separate, and virtually identical sequence, then tail recursion becomes a natural consequence. Tail recursion isn't a special case of function call, it is the same thing as a function call, the only difference being whether a new continuation (the "return ticket") is allocated in order to come back. Even function returns are the same thing, the only difference being that destination is (usually) determined dynamically rather than statically. Tail recursion isn't just another optimization, it's the result of treating inter-procedural control transfer symmetrically.

Another natural consequence is greatly increased options for control flow. Instead of a strict call/return pattern, you can make "three-legged" trips, or however many legs you need. You can make loops that incorporate one, two, or even a dynamically changing number of functions. You can replace compiler-generated returns with user-provided function calls (continuation-passing style) and implement arbitrarily complex control and data flow like multiple return values, exception handling, backtracking, coroutines, and patterns that don't even have names yet. And of course you can mix and match these patterns with the standard call and return pattern as well.

The phrase "tail recursion" is shorthand for this symmetric view of interprocedural control flow and is meant to encompass all these consequences and control flow options that spring from it. It's not about simply turning recursive functions into loops.

People who are afraid of tail recursion seem unable to see any value in taking up the symmetric viewpoint despite the fact that it opens up a whole new set of control flow techniques (in particular continuation-passing style). They find the notion that a procedure call is “a goto that passes arguments” “nonsensical”. A lot of good research has come from taking this nonsense seriously.

*The view that a function call is simply a “a goto that passes arguments” was developed by Steele in his “Lambda papers”.

The important point of cleaning up before the control transfer was formalized by Clinger in “Proper Tail Recursion and Space Efficiency”.

Someone — it might have been Clinger, but I cannot find a reference — called tail recursion “garbage collection for the stack”. The stack, being so much more limited in size than the heap, needs it that much more. Indeed Clinger notes the tight connection between tail recursion and heap garbage collection and points out that heap garbage collection is hampered if the stack is retaining pointers to logically dead data structures. If the dead structures are large enough, garbage collection can be rendered useless. Yet many popular languages provide garbage collection but not tail recursion.

The only difference between a call and return is that typically the call is to a statically known location and the return address is dynamically passed as a "hidden" argument. But some compilers, like Siskind's Stalin compiler, statically determine the return address as well.

The only difference between a function call and a tail recursive function call is when you need to return to the caller to complete some work. In this case, the caller needs to allocate a new continuation so that control is eventually returned. If there is no further work to be done in the caller, it doesn't create a new continuation, but simply passes along the one that was passed to it.

Many compilers have been written that handle function calls, tail recursive function calls, and returns identically. They only change what code they emit for handling the continuation allocation. These compilers naturally produce tail recursive code.

Most machines provide special purpose support for a LIFO stack. It is tempting to use the stack for allocation of continuations because they are almost always allocated and deallocated in LIFO order, and a stack gives higher performance when this is the case. Many compilers do in fact use the stack for continuations and argument passing. Some, like Winklemann's Chicken compiler follow Baker's suggestion and treat the stack as an "nursery" for the heap. Others avoid using the stack altogether because of the extra complexity it entails. And others cannot use the stack because of constraints placed on stack usage by OS conventions or the underlying virtual machine model.

Sunday, January 19, 2020

Afraid of Recursion

Here's a trick I learned several years ago for transferring time-series data. In the case in question, I needed to transfer a bunch of timestamped records, but the server had a quirk to it. If you asked for too many records at once, it would simply return an error code and give up on your request. There was no way to know beforehand how many records might exist in any given time span, so you could get an error code on nearly any request, unless it was for a very short time span. On the other hand, many requests for long time spans would succeed because they had few records in them. Despite this quirk, the code was really simple:
List<record> xfer (Timestamp start, Timestamp end) {
    try {
        return tryFetchRecords(start, end);
    } catch (TooManyRecordsException e) {
        Timestamp mid = (start + end)/2;
        List<record> firstHalf = xfer (start, mid);
        List<record> secondHalf = xfer (mid, end);
        return firstHalf.addAll(secondHalf);
On any request, if the server returned the error code, we would simply bisect the time span, recursively ask for each half separately, and combine the two halves. Should the bisected time span still contain too many records, the time span would be bisected again. The recursion would continue until the time span was small enough that the server could honor the request and return some records. The recursion would then unwind, combining the returned records into larger and larger lists until we had all the records for our original time span. Since the time span would be cut in half on each recurrence, the depth of recursion would be proportional to the logarithm (base 2) of the total number of records, which would be a reasonably small number even with an enormous number of records.

It's certainly possible to avoid recursion and do this iteratively by some form of paging, but the code would be slightly more complex. The server is not very cooperative, so there is no easy way to determine an appropriate page size beforehand, and the server doesn't support a “paging token” to help keep track of progress. The recursive solution finds an appropriate transfer size by trial and error, and keeps track of progress more or less automatically. An iterative paging solution would have to do these things more explicitly and this would make the iterative code a bit more complex. And why add any complexity when it isn't really necessary?

I thought this solution was really cool when I first saw it. I've used this trick for transferring time series data many times. It makes the server very simple to write because the protocol requires so little of it. It simply has to refuse to answer requests that return too many results. The client code is just about the 10 lines above.

But when I first suggest this code to people I usually get “push back” (that is, until they see it work in action, then they usually get on board with it). People seem unsure about the use of recursion and want a more complex protocol where the client and server negotiate a page size or cooperatively pass a paging token back and forth on each request and response. Their eyes glaze over as soon as they see the recursive call. They want to avoid recursion just because it's recursive.

I've seen “aversion to recursion” happen in a lot of circumstances, not just this one. Recursion isn't the solution to everything. No tool solves all problems. But it is an important tool that often offers elegant solutions to many problems. Programmers shouldn't be afraid of using it when it is appropriate.

Saturday, January 18, 2020

Unsyndicated blog

I've noticed that my blog posts are replicated in Planet Lisp and Planet Scheme, and here I am spamming them with random math stuff. So I'm creating a new blog, Jrm's Random Blog, where I can feel free to post about math, science, computers in general, and whatever else bugs me, without spamming the Lisp and Scheme readers. I'll keep posting to Abstract Heresies, but try to keep it more Lisp and computer language focused.

Thursday, January 16, 2020

Groups, semigroups, monoids, and computers

The day after I rant about mathematicians, I make a math post. “Do I contradict myself? Very well, then, I contradict myself, I am large, I contain multitudes.” — Walt Whitman

A group is a mathematical concept. It's pretty simple. It consists of a set, G, and an operation, *, which can be used to combine any two elements of G. What the set contains is not that important. It is the * operation we're interested in, and we can usually swap out G for another set without causing too many problems other than having to change the type signature of *. There are four axioms that * must obey
  • Closure—combining any two elements of G using * just gives you another element in G.
    Note that this means you can build an arbitrary binary tree of combinations: e.g.(* (* a b) (* (* c d) e))). These trees will always be like a tree of cons cells. In some sense, the closure axiom is equivalent to saying that all the elements of G have the same type and that the * operator operates on values of that type and produces values of that type. The closure axiom along with the binary operation means that we can reduce any tree of combinations to a single value.
  • Associativity(* (* a b) c) = (* a (* b c)) for any a, b, and c. This implies that you can take any arbitrary tree of combinations: e.g.(* (* a b) (* (* c d) e))) and simply flatten it into a list (* a b c d e), or given the flat sequence (* a b c d e) we can add parenthesis anywhere we like: (* a (* b c) d e). If we stop here and only have the closure and associativity axiom, we have what is called a “semigroup”. You can use the * operation to “fold” a semigroup down to single value, or to keep an accumulator and incrementally fold elements into the accumulator.
  • Identity element—There has to be an identity element id such that (* id x) = (* x id) = x for all x. It will be unique. If you see the identity object in a combination (* a b id c d), you can simply remove it: (* a b c d). The identity element also comes in handy as an initial value when you are folding a sequence. If you have some concept that would be a group except it doesn't have an identity element, then you can often just make one up and add it to the set G.
  • Inverse element—For every element in G there has to be another element, that when combined with the first, gives you the identity. So if a is an element in G, there has to be some other element, call it b, such that (* a b) = (* b a) = id. The inverse element is usually notated with a little -1: a-1. If you have an element in a combination right next to it's inverse: (* a x x-1 c), you can combine the element and it's inverse to get the identity: (* a id c), and then remove the identity: (* a c)
Frequently you run into something that obeys all the axioms but the inverse element axiom. This is called a monoid. A monoid is very much like a group except that you can get “stuck” when manipulating it if you run into one of the non-invertible elements because there's no inverse to “undo” it. There are certain things about monoids that are true only “if the appropriate inverses exist”. You run into that qualifier a lot when dealing with monoids. You don't need that qualifier if you are dealing with a group because they do exist by axiom. Or we could say that calling something a group is simply shorthand for adding “if the appropriate inverses exist” everywhere.

What does this have to do with computers? Consider the set of all subroutines with the operation of concatenation. It is closed — concatenating two subroutines gives you a third subroutine. It is associative — you just concatenate them linearly. There is an identity element, usually called no-op. And many, but not all, subroutines have inverses. So we have a monoid.

Consider the set of all strings with the operation of concatenation. It is closed, associative, the empty string is the identity element. It is a monoid.

Consider the set of functions whose input type is the same as the result type with the operation of composition. It is closed, associative, the identity function is the identity element. It is a monoid. If we consider only the subset of functions that also have inverses, we have a group. This particular monoid or group comes in especially handy because composition of functions is so useful.

Consider the set of invertible 2x2 matrices with integer components, a determinant of 1 or -1, and the operation of matrix multiply. It is closed, associative, there is an identity matrix, and I already said just consider the invertible ones. It forms a group. This group comes in handy for implementing arbitrary precision arithmetic. (Thanks to Bradley Lucier for the correction of the condition on the determinant. This makes the matrix continue to have integer components upon inversion, keeping things closed.)

The permutations of a list form a group. The integers under addition form a group.

These things are everywhere. And it isn't a coincidence. The concepts of a group, monoid, and semigroup are meant to capture the essence of what it is to have a foldable sequence of elements. (Can I complain about mathematicians here? They make up so much terminology and abstraction that it is virtually impossible to get at what they really mean. We're just talking about sequences of elements and trying to find some minimal axioms that you need to have to fold them, but try to find literature that actually says that's what we're doing is like trying to pull hen's teeth.)

So what good are groups, monoids, and semigroups? Aside from the obvious fact that foldable sequences are ubiquitous and really useful, that is. Not immediately apparent from the axioms is that in addition to folding a sequence, you can transform a sequence into a different, but equivalent one. If the appropriate inverses exist (there's that phrase), you can “unfold” some or all elements of a sequence. So by judicious folding and unfolding, you can transform a sequence.

Here's an unusual abstract example. Consider a pipeline which has a set of nodes and communicates values of the same type between the nodes. Values accumulate at the nodes until they are transmitted to the next node in the pipeline. We start with all the values in the initial node (on the right) and transmit them to the left:
(pipeline (node) (node) (node a b c))  ;; transmit the a
(pipeline (node) (node a) (node b c))  ;; transmit the b
(pipeline (node) (node a b) (node c))  ;; transmit the a
(pipeline (node a) (node b) (node c))  ;; transmit the c
(pipeline (node a) (node b c) (node))  ;; transmit the b
(pipeline (node a b) (node c) (node))  ;; transmit the c
(pipeline (node a b c) (node) (node))  ;; done
If the values we transmit are drawn from a group, we can replace each node with the group's * operator:
(* identity identity (* a b c))  ;; transmit the a
(* identity (* identity a) (* b c))  ;; transmit the b
(* identity (* a b) (* identity c))  ;; transmit the a
(* (* identity a) (* identity  b) (* identity c))  ;; transmit the c
(* (* identity a) (* b c) identity)  ;; transmit the b
(* (* a b) (* identity c) identity)  ;; transmit the c
(* (* a b c) identity identity)  ;; done
The astute reader will notice that all we're doing is making use of the associativity axiom and moving the parenthesis around so that the values seem to move between the different nodes. But we preserve the invariant that the “value” of the entire pipeline doesn't change as the values move. The * operator need not be concatenate, which would give simple queuing behavior, but can be any operator satisfying the axioms giving us much more interesting pipelines. One implementation of arbitrary precision arithmetic transmits Möbius transformations along just such a pipeline to refine the upper and lower limits of a computed approximation. In this implementation, the * operator is the composition of Möbius transformations.

Here's a more concrete example. If you have a series of nested functions: (f (g x)) and both f and g take and return the same type, rewrite it as ((compose f g) x) and use a little group theory on it.
(f (g x))
((compose f g) x)
;; or more explicitly
((fold-left compose identity (list f g)) x)
If the appropriate inverses exist, then there will be another function h such that (compose f g) is equal to (compose h f) essentially allowing you to “slide” g to the left “through” f. It is relatively easy to see that h must be equivalent to (compose f g f-1). Mathematicians say that h is conjugate to g. Conjugates always have a form like aba-1. By finding conjugates, you can take a sequence and slide the elements left and right through other elements. This also allows you to fold things out of order. (Or in the pipeline example, transmit items out of order.) If we were left folding into an accumulator, folding h before f is equivalent to folding g after f. Another way of looking at it is this. Suppose we're standing to the left of f and looking through the “lens” of f at g. h is what g “looks like” when viewed through f.

If we want, we can define slide such that (compose slide (compose f g)) is equivalent to (compose h f). slide is (compose h f g-1 f-1). (This isn't a generic slide sequence, it only works on (compose f g). It ought to be an identity because (compose f g) is equivalent to (compose h f).) I complained that mathematicians provided too few concrete examples, so here is a concrete example using list permutations:
> (reverse (rotate-left '(a b c d)))
(a d c b)

;; rewrite as explicit fold-left of compose
> ((fold-left compose identity (list reverse rotate-left)) '(a b c d))
(a d c b)

;; sliding rotate-left through reverse turns it into rotate-right
> ((fold-left compose identity (list rotate-right reverse)) '(a b c d))
(a d c b)

;; A sequence that when composed with (list reverse rotate-left) turns it into
;; (rotate-right reverse)
> (define slide 
    (fold-left compose identity (list rotate-right reverse rotate-right reverse)))

> ((fold-left compose identity (list slide reverse rotate-left)) '(a b c d))
(a d c b)

;; rewrite back to direct procedure calls
> (rotate-right (reverse '(a b c d)))
(a d c b)

;; and slide ought to be an identity
> ((fold-left compose identity (list slide)) '(a b c d))
(a b c d)

Or suppose you have (f (g x)), but for some reason you want(g (f x)) (which would, in general, be a different value unless f and g happen to commute). Again, rewrite (f (g x)) as ((compose f g) x) and apply a little group theory. If the appropriate inverses exist, there will be a function commute-fg such that (compose commute-fg (compose f g)) is equivalent to (compose g f). With a little thought, you can see that commute-fg is equivalent to (compose g f g-1 f-1). (Again, this isn't a generic commute, it only causes this specific f and g to commute.) commute-fg is called a commutator because it makes f and g commute. Commutators always have the form aba-1b-1. By finding commutators and inserting them in the right place, you can take a sequence and swap adjacent elements. Again, a concrete example with lists:
;; an illustration of what swap-first two does
> (swap-first-two '(a b c d))
(b a c d)

;; we're given
> (reverse (swap-first-two '(a b c d)))
(d c a b)

;; but we want, for some reason to reverse first
> (swap-first-two (reverse '(a b c d)))
(c d b a)

;; rewrite as fold-left of compose
> ((fold-left compose identity (list reverse swap-first-two)) '(a b c d))
(d c a b)

;; define our commutator
;; note that swap-first-two and reverse are their own inverses
> (define commute-fg
    (fold-left compose identity (list swap-first-two reverse swap-first-two reverse)))

;; make f and g commute
;; observe that it returns the desired result
> ((fold-left compose identity (list commute-fg reverse swap-first-two)) '(a b c d))
(c d b a)

There's two interesting things here. First, notice that in both examples I convert (f (g x)) to ((fold-left compose identity (list f g)) x) and then proceed to ignore x and just consider (fold-left compose identity (list f g)) as if x didn't exist. I've abstracted away the x. (Of course I have to eventually supply the x if I want an answer, but it only comes back at the last moment.) Second, notice that although slide and commute-fg are foldable sequences, I use them as if they were higher order functions operating on the foldable sequence (compose f g) to transform it, first into (compose h f), second into (compose g f). This second thing is a neat trick. We're taking a function that operates on lists and treating it as if it were a higher-order function that operates on functions. This is called the “action” of slide and commute-fg because it appears as if elements of the set G of our group can “act” directly on other elements.

Every element in the underlying set G of a group has an action associated with it which operates directly on other elements in G. This is an important concept in group theory. Now earlier I said that the actual elements of G don't matter much, so the action must be more closely tied to the operator *. And if we swap out G for another set we'll still have the same actions, they'll just be associated with the elements of the new set (in an isomorphic way). The actions are pretty abstract.

There's a lot more one could say about the actions. They are a rich source of interesting math. My brain is getting fatigued with all this abstraction, so I'll leave the topic be for now.

If group theory is about the essence of what it means to have a foldable sequence, then category theory is about the essence of composition. They offer two somewhat different approaches to similar material. What do you do with sequences but compose them? What comes from composition but a sequence? Many concepts in group theory carry over into category theory. Naturally a completely different set of terminology is used, but the concepts are there.

But that's enough group theory for today and category theory can wait until later posts.

Wednesday, January 15, 2020

Math is hard, let's go shopping

I find mathematics, with all it's weird terminology and abstraction and equations, hard to understand. That's kind of funny coming from someone like me who makes a living from a branch of mathematics. I find computers and programming to be rather easy to understand — probably because I've had a lot of practice. But computer science is just applied logic and programming is arguably just the study of the computable functions, so you'd think math would come naturally. It doesn't.

One problem I've found is that as much as mathematicians pride themselves on rigor, they tend to be a bit sloppy and leave out important details. Computer scientists don't leave out important details because then the programs won't run. It's true that too much detail can clutter things up, but leaving out the detail and relying on “context” just increases the intellectual burden on the reader.

I will give mathematician's credit for thinking about edge cases perhaps more than a computer scientist would. It can be easy to be a bit complacent with edge cases because the computer will likely do something even if you don't think too hard about what it ought to do. But a good computer scientist tries to reduce the number of edge cases or at least make them coherent with the non-edge cases.*

Mathematicians seem to take perverse pleasure in being obscure. Computer scientists strive to be as obvious as possible because like as not, they are the ones that have to revisit the code they wrote and don't want to have to remember what they were thinking at the time. It's just easier to spell things out explicitly and obviously so that you can get back up to speed quickly when you have to debug your own stupid code. Every time I pick up some literature on category theory, I get hit with a “Wall of Terminology” denser than the “Wall of Sound” on a Phil Spector recording. It's fundamentally simple stuff, but it is dressed up in pants so fancy one has a hard time extracting the plain meaning. What seems to be universal in category theory is my difficulty in getting past page 4.

I once read a mathematical paper that talked about an algorithm with three tuning parameters: α, β, and another α. No decent computer programmer would give the same name to two different variables. Which α was which was supposed to be “obvious” from the context. The brainpower needed to keep track of the different αs was absurd and a complete waste of effort when calling the variable something else, like γ would have done the trick.

And don't ask a mathematician to write computer code. That's the one time they'll leave out all the abstraction. Instead of a nice piece of abstract, functional code, you'll get a mess of imperative code that smashes and bashes its way to a solution with no explanation of how it got there. It's a lot easier to take some abstract, functional code and figure out a more optimal way, probably imperative way to do it than it is to take a more optimal imperative piece of code and figure out the abstract, functional meaning of it.

I've found it to be extremely helpful when a computer paper includes one or two concrete examples of what it is talking about. That way, if I try to go implement code that does what the paper suggests, there's some indication that I'm on the right track. I'm more confident that I understand the paper if I have working code that produces the exact same values the paper's authors got. It's harder to find concrete examples in a math paper, and it is easier to think you know what it says but be far off base if there aren't any examples.

Maybe I shouldn't blame mathematicians so much and look a little closer to home. Perhaps I should study harder instead of demanding to be spoon fed difficult concepts. But then I read Feynman, S&ICP, S&ICM, and Jaynes and discover that maybe I just need a simple explanation that makes sense to me.

Sturgeon's Revelation is “90% of everything is crap”. This is true of both mathematical papers and computer science papers.

*An old joke illustrates the importance of thinking of edge cases: A programmer implements a bar. The test engineer goes in and orders a beer, orders zero beers, orders 999999999 beers, orders -1 beers, orders a lizard, and declares the bar ready for release. The first customer comes in and asks to use the restroom. The bar catches fire and burns down.

Tuesday, January 14, 2020

Palindromes, redux, and the Sufficiently Smart Compiler

The Sufficiently Smart Compiler is mentioned by authors as shorthand for “a compiler that performs nearly all reasonable optimizations, but in particular this one I want”. Many attempts were made up through the 80's and maybe into the 90's to write a Sufficiently Smart Compiler that would perform all “reasonable” optimizations, and although many impressive results have been obtained, there always seem to be fairly obvious optimizations that remain unoptimized. These days it seems that people realize that there will be good compilers and some very good compilers, but never a Sufficiently Smart Compiler. Nonetheless, it is worth considering a Sufficiently Smart Compiler as a tool for thought experiments.

I was curious what would be necessary for a Sufficiently Smart Compiler to generate optimal code for the palindrome problem given the naive algorithm.

The naive algorithm is inspired by the axioms
  • A zero or one element string is a palindrome.
  • If the first char matches the last char, and the middle is a palindrome, the result is a palindrome.
and gives us this:
(define (palindrome1? string)
  (or (< (string-length string) 2)
      (and (char=? (string-ref string 0)
                   (string-ref string (- (string-length string) 1)))
           (palindrome1? (substring string 1 (- (string-length string) 1))))))

The higher performing algorithm is inspired by the idea of keeping two pointers to each end of a string and comparing the characters at the pointers. If the characters are the same, you move the pointers inward and when they meet, you have seen a palindrome. If at any point the characters differ, you don't have a palindrome:
(define (palindrome2? string)
  (define (scan front-pointer rear-pointer)
    (or (>= front-pointer rear-pointer)
        (and (char=? (string-ref string front-pointer)
                     (string-ref string rear-pointer))
             (scan (+ front-pointer 1) (- rear-pointer 1))))
  (scan 0 (- (string-length string) 1)))
As you can see, these really aren't very different to start with. Both algorithms are iterative and both work their way in from the outside of the string. There are basically two differences. First, access to the rear of the string is either by a rear pointer, or by using the string-length of the string and subtracting 1. Second, the iterative call either uses substring or moves the pointers closer together.

First, let's assume that our processor has can reference through an indexed offset. This would mean we could point at the element one beyond the rear-pointer and not incur overhead. This isn't an unreasonable assumption for a CISC architecture such as an x86, but would probably cause 1 instruction overhead on a RISC architecture. So the second algorithm becomes this:
(define (palindrome2? string)
  (define (scan front-pointer rear-pointer)
    (or (< (- rear-pointer front-pointer) 2)
        (and (char=? (string-ref string front-pointer)
                     (string-ref string (- rear-pointer 1)))
             (scan (+ front-pointer 1) (- rear-pointer 1)))))
  (scan 0 (string-length string)))

Now this next assumption is a bit more of a stretch. The implementation of palindrome1? uses substring on each iteration and that's going to result in a lot of string copying. If our implementation used “slices” instead of copying the string, then there will be a lot less copying going on:
(define (palindrome1? string)
  (or (< (- (slice-end string) (slice-start string)) 2)
      (and (char=? (string-ref string (slice-start string))
                   (string-ref string (- (slice-end string) 1)))
             (substring string (+ (slice-start string) 1) (- (slice-end string) 1))))))

It is not uncommon for a compiler to introduce internal procedures for looping, so we can do that.
(define (palindrome1? string)
  (define (scan slice)
    (or (< (- (slice-end slice) (slice-start slice)) 2)
        (and (char=? (slice-ref slice (slice-start slice))
                     (slice-ref slice (- (slice-end slice) 1)))
             (scan (subslice slice (+ (slice-start slice) 1) (- (slice-end slice) 1))))))
  (scan (make-slice 0 (string-length string))))

We'll enter fantasy land again and let our compiler be smart enough to “spread” the slice data structure into the argument list of scan. This is no doubt asking too much from our compiler, but the information is available and it could in theory be done:
(define (palindrome1? string)
  (define (scan slice-start slice-end)
    (or (< (- slice-end slice-start) 2)
        (and (char=? (slice-ref string slice-start)
                     (slice-ref string (- slice-end 1)))
             (scan (+ slice-start 1) (- slice-end 1)))))
  (scan 0 (string-length string)))

And now we have palindrome2? (modulo renaming).

This doesn't really prove anything. But with a couple of somewhat unlikely compiler tricks, the naive version could be transformed to the more optimized version. It suggests that a it would be surprising but not a complete shock for an ambitious compiler writer to attempt.

I wish someone would write that Sufficiently Smart Compiler.

Monday, January 13, 2020

Cons cells vs. Linked Lists

Cons cells and linked lists are the meat and potatoes of Lisp programming. Linked lists are the primary structure that everything operates on and cons cells are the Lego blocks they are made of. For an experienced Lisp programmer, cons cells just fade into the background. You know they are there as the glue holding everything together, but it is the linked list that you keep in mind. One could construct all sorts of weird trees, dags, and graphs out of cons cells, but in general you keep things in nice linear singly-linked lists terminated with a nice, full-stop NIL.

Cons cells are nearly the perfect concrete implementation of an abstract two-tuple. They are first-class objects: you can assign them to variables, stuff them in arrays, pass and return them as values, and check them for identity. They are orthogonal to other data types; only a cons-cell returns 't to consp. They are opaque — except for the defined operations of car and cdr, you cannot access the contents of a cons cell. And while they are usually implemented as adjacent memory locations, they hide their representation and there have been many Lisps that have used unusual concrete representations of cons cells like parallel arrays of the car and cdr parts or bit codes to omit the cdr altogether through “cdr coding”. All operations on cons cells can be reduced to the basic operations cons, consp, car, cdr, (setf car), and (setf cdr). (If we had immutable cons cells, we could even get rid of the last two, but then we'd want some other means for creating circular and semi-circular structure.*)

So I find it somewhat surprising that the standard linked list implementation in Lisp is a just a terrible example of an abstract data type. This no doubt happened because linked lists got standardized well before abstract data types were really understood.

The big problem with linked lists is that instead of being orthogonal to other data types, it is a subdomain of cons-cells. The representation of a singly linked list is completely exposed: it is a cons cell, without even a wrapper object to tell you if you are dealing with the list itself or its representation. It is only by common convention that certain cons cell structures are considered to represent linked lists. And it isn't immediately clear whether the representation is meant to be a pointer to the first pair of the list, or to the entire “spine” of the list. It is often treated both ways. There is little distinction between a list primitive and a cons cell primitive, which usually doesn't get you into trouble, except in those few cases where it can cause major confusion, like when you have to handle “improper” or “dotted” lists.

Lists are mutable because their representation is mutable and not hidden. It is possible to mutate the representation such that it no longer represents a list anymore, “magically” changing any list that includes the mutated structure into something else. This means either a lot of defensive copying must be done if lists are used as arguments or passed as values, or an unenforced convention to avoid mutation of list structure must be developed in the Lisp culture. We've been pretty good at the latter, even documenting when you can and when you cannot rely on lists being mutated by library functions, but there are always a few people who go against the grain for the sake of “efficiency” (or plain orneriness) and write code that is impossible to use because you cannot easily tell what might be mutated behind your back.

With any abstract data type, there are conceptually a pair of functions that are used to transport objects across the abstraction barrier. One, call it abs->rep, takes an abstract object and exposes its representation. It is usually provided automatically by the compiler and called upon entry to the object's methods. In Java, for example, it establishes bindings for the this pointer and the private and protected fields of the object so that the method can use them. The complimentary function, call it rep->abs takes the representation of an object and hides it in an opaque, abstract version for clients of the object to use. The clients have no way to manipulate the representation of the object because they only have access to opaque, abstract version. In Java, for example, the compiler automatically does this after object construction and when the this pointer is returned properly cast to the abstract data type. The this pointer and private and protected fields of the object go out of scope and are no longer accessible.

These functions are usually provided by the compiler and often have no real implementation. The compiler simply ensures that representation comes into scope when the method is called (conceptually calling abs->rep) and that the representation goes out of scope when the method returns (conceptually calling rep->abs). No actual code is generated or exists at run time. It's easy to forget this is happening because the compiler does all the work for you. You just toggle the little bit in your head about whether you are “inside” the object or “outside” the object. If you forget, you can just examine the lexical nesting to see if the representation is in scope.

In Lisp, however, for a singly linked list, not only are these functions omitted, they are completely fictitious. It is only in the programmers head that what was once considered a linked list is now to be considered a pointer to head cell of list (abs->rep) and only probably in the programmers head that the reverse (rep->abs) is happening on the way out. It doesn't matter much if he or she forgets this because the written code is the same either way. It only matters if he or she somewhere down the line uses a cons-cell operation where a list operation is actually what should be used. This can lead to common rookie mistakes like
  • Using cons where list is wanted, yielding (1 . 2) where (1 2) is desired. (The “unwanted dot” problem.)
  • Using list where cons is wanted, yielding (1 (2)) where (1 2) is desired. (The “too many parenthesis” problem.)
  • Confusion about whether ((1 2) 3 4) is meant to be a three-tuple of a list and two integers, or a two-tuple of two lists. (It's both, depending on the unwritten intent of the programmer.)
  • Using cons or list where append is wanted, yielding ((1 2) 3 4) or ((1 2) (3 4)) when (1 2 3 4) is desired. (Again, “too many parenthesis”.)
  • Use of (append ... (list <element>)) to “cons” to the “right end” of a list, leading to O(n2) algorithms rather than O(n).
Now don't get me wrong. I like Lisp and I like linked lists. And I'm not suggesting we avoid using them in favor of some other well-designed abstract data type. I just think they're an awful example of how to implement an abstract data type and perhaps that's why it is difficult for beginners to learn how to use them properly. It might also be worthwhile to implement a Lisp with proper (and immutable) abstract linked lists. It wouldn't make much difference to experienced programmers who are already used to applying the representation/abstraction interface in their heads, but it might make it easier for novices to manipulate linked list and cons cells (and keep them apart).

If you want to be completely contrary, consider Olin Shiver's suggestion: all objects — cons cells, strings, integers, null, etc. — are lists. It's just that every object other than a cons cell is a zero element dotted list. Now rather than being a subtype of cons cells, lists become a supertype of all objects. This viewpoint can probably be made coherent, but it does raise a lot of questions. Here are some that come to mind:
  • Is (length '(1 2 . 3)) the same as (length '(1 2 3))? If not, what is (length '(1 2 . 3))
  • Should lists retain their “dottedness” when passed through functions like memq or map? What is (memq 2 '(1 2 . 3))? What about (memq 3 '(1 2 . 3))?
  • What is (reverse '(1 2 . 3))? Is (compose reverse reverse) an identity?
This was extensively discussed on the SRFI-1 mailing list, so I won't rehash the discussion here. The questions I raised above, and many more, were raised and discussed. Eventually, it was decided that continuing to be backwards compatible was an important consideration. (Personally, I think the notion plays havoc with the group theoretic properties of lists, and that is enough to make it suspect.)

There is a good argument that “dotted” lists are rarely used and almost always a mistake, but they are built in to the grammar of Scheme as an indicator of “rest” arguments, so getting rid of them would require some other way to specify “rest” arguments. Racket takes things further by allowing doubly dotted lists to indicate infix notation: (a . < . b)

Just for kicks, I took things in the other direction and wrote some C# code that implements singly-linked lists as their own abstract data type using special, immutable cons cells that require that their CDR be either an existing singly-linked list or the empty list. “Dotted” lists are not a problem because you simply cannot construct one. The representation of a list is explicitly coded as a pointer to the head cons cell of the list. The code illustrates how the abstract list is turned into a the pointer to the cons cell when it is carried across the abstraction barrier and how it is turned back into an abstract list when carried back out. Again, I'm not suggesting anyone use the code, or take it as a serious proposal. (For one thing, it doesn't address what to do about circular lists, or the dotted lists in the Scheme grammar.) It was just a fun hack for illustrative purposes. It is available here for those interested.

*Many years back, Henry Baker said “C'mon, cons cells should just be immutable.” (if I am remembering the exact quote correctly). I agree with his sentiment. Combine immutable cons cells with “hash consing” and the appropriate equality primitives and you get directed acyclic graphs (and their space properties) “for free”. We'd either have to do without circular structure or use another means to achieve it. Since circular structure often leads to divergent programs I wouldn't consider it a great loss, but some may disagree. Perhaps they might be assuaged by a nice set of primitive procedures for creating and manipulating circular cons cell structure.

Sunday, January 12, 2020

Just for fun, transformations on lists

The mathematician in me likes to think about what happens to data and programs when you apply certain transformations to them. Here's a simple example. Imagine the function swap that simply makes a new cons cell by swapping the car and cdr of an existing cons cell:
(define (swap cell) (cons (cdr cell) (car cell)))

> (swap '(1 . 2))
(2 . 1)

> (swap (swap '(1 . 2)))
(1 . 2)
As we'd expect, two swaps are equivalent to no swaps at all. Indeed any even number of swaps are equivalent. Any odd number of swaps is equivalent to one swap.

What if we call swap on a list?
> (swap '(1 2 3))
((2 3) . 1)
That's odd looking. But swapping again returns it to normal
> (swap (swap '(1 2 3)))
(1 2 3)

But swap only swaps the top-level cell. Let's define deep-swap that descends into the car and cdr if possible:
(define (deep-swap cell)
  (cons (if (pair? (cdr cell)) (deep-swap (cdr cell)) (cdr cell))
        (if (pair? (car cell)) (deep-swap (car cell)) (car cell))))

> (deep-swap '((1 . 2) . (3 . 4)))
((4 . 3) 2 . 1)
Wait, what? Oh, the list printer is just interpreting the second cons cell as a part of a top-level list. We can see this by trying this:
> '((4 . 3) . (2 . 1))
((4 . 3) 2 . 1)
So we just have to be aware of list printer eliding the dots for us.

What if we call deep-swap on a list?
> (deep-swap '(1 2 3 4))
((((() . 4) . 3) . 2) . 1)
Fortunately, deep-swap, like swap, undoes itself.
> (deep-swap (deep-swap '(1 2 3 4)))
(1 2 3 4)
It's easy to see that swap and deep-swap should commute.
> (equal? (swap (deep-swap '((a . b) . (c . d))))
          (deep-swap (swap '((a . b) . (c . d)))))
Alternatively, define compose
;; Simple composition of two functions
(define (compose2 outer inner)
  (lambda (x) (outer (inner x))))

;; Composition of arbitrary number of functions
(define (compose f . fs)
  (if (null? fs)
      (compose2 f (apply compose fs))))

> (equal? ((compose swap deep-swap) '((a . b) . (c . d)))
          ((compose deep-swap swap) '((a . b) . (c . d))))
So you can just move all the swaps together and all the deep-swaps together, then remove pairs of each one.

swap and deep-swap don't have very complex behavior, so let's turn to lists. We can define rotate-left as follows:
(define (rotate-left l) (append (cdr l) (list (car l))))

> (rotate-left '(1 2 3 4))
(2 3 4 1)

> (rotate-left (rotate-left '(1 2 3 4)))
(3 4 1 2)
(This is horribly inefficient, so this is just for fun, not production code.) Now what happens when we combine rotate-left with reverse?
> (reverse (rotate-left (reverse '(1 2 3 4))))
(4 1 2 3)

(define rotate-right (compose reverse rotate-left reverse))

> (rotate-right '(1 2 3 4))
(4 1 2 3)
rotate-left becomes rotate-right when used “under” reverse. Of course rotate-left doesn't commute with reverse: (reverse (reverse (rotate-left '(1 2 3 4)))) the reverses cancel each other and we're left with a rotate-left.

We can define “deep” versions of reverse, rotate-left, and rotate-right:
(define (deeply f)
  (lambda (l)
    (if (list? l)
        (f (map (deeply f) l))

> ((deeply reverse) '((1 2 3) 4 5 (6 7 (8 9 10))))
(((10 9 8) 7 6) 5 4 (3 2 1))

> ((deeply rotate-left) '((1 2 3) 4 5 (6 7 (8 9 10))))
(4 5 (7 (9 10 8) 6) (2 3 1))

> ((deeply rotate-right) '((1 2 3) 4 5 (6 7 (8 9 10))))
(((10 8 9) 6 7) (3 1 2) 4 5)
Naturally, a (deeply rotate-left) will undo a (deeply rotate-right). You might suspect that the composition of (deeply reverse), (deeply rotate-left), and (deeply reverse) is equivalent to (deeply rotate-right), and you'd be right (I suspected as much, too, but it didn't seem so obvious, so I checked).

Notice that the deeper list structure has 3 elements each, but the topmost list structure has 4 elements, so 3 deep rotations is equivalent to one shallow rotation in the opposite direction, or (compose rotate-left (deeply rotate-left) (deeply rotate-left) (deeply rotate-left)) is an identity. In fact, the shallow rotate-left commutes freely with (compose (deeply-rotate left) (deeply rotate-left) (deeply rotate-left))
;; These are all equivalent identities
(compose rotate-left (deeply rotate-left) (deeply rotate-left) (deeply rotate-left))
(compose (deeply rotate-left) rotate-left (deeply rotate-left) (deeply rotate-left))
(compose (deeply rotate-left) (deeply rotate-left) rotate-left (deeply rotate-left))
(compose (deeply rotate-left) (deeply rotate-left) (deeply rotate-left) rotate-left)

Arbitrary application of these operators will scramble your list structure much like arbitrary rotations will scramble a Rubik's cube. The analogy is more than skin deep: group theory can be used to describe and analyze the combinatorics of both. Group theory tells us that operations of the form F-1GF are likely to be interesting. And indeed:
> ((compose rotate-right reverse rotate-left) '((1 2 3) 4 5 (6 7 (8 9 10))))
(4 (1 2 3) (6 7 (8 9 10)) 5)

(define involute (compose rotate-right reverse rotate-left))
swaps the outside elements with the inside ones. And if we compose a rotate-left with this, we find we've reversed only the last three elements in the list ((1 2 3) (6 7 (8 9 10)) 5 4).

Just using these operators, there seems to be no way to get to get to '((1 2 3) 5 4 (6 7 (8 9 10))) (at least I couldn't find one), so I defined another operator:
(define (call-on-tail f)
  (lambda (x)
    (cons (car x) (f (cdr x)))))
which leaves the head element alone while applying the transformation to the rest.
> ((compose rotate-left reverse (call-on-tail rotate-right) involute)
   '((1 2 3) 4 5 (6 7 (8 9 10))))
((1 2 3) 5 4 (6 7 (8 9 10)))

These functions can move elements up and down the list structure:
(define (leftmost c)
  (if (pair? c)
      (leftmost (car c))

(define (replace-leftmost c new-value)
  (if (pair? c)
      (cons (replace-leftmost (car c) new-value) (cdr c))

(define (rightmost c)
  (if (pair? c)
      (if (null? (cdr c))
          (rightmost (car c))
          (rightmost (cdr c)))

(define (replace-rightmost c new-value)
  (if (pair? c)
      (if (null? (cdr c))
          (cons (replace-rightmost (car c) new-value) (cdr c))
          (cons (car c) (replace-rightmost (cdr c) new-value)))

(define (swap-ends l)
  (replace-leftmost (replace-rightmost l (leftmost l)) (rightmost l)))

> (swap-ends '((1 2 3) 4 5 (6 7 (8 9 10))))
((10 2 3) 4 5 (6 7 (8 9 1)))

> ((compose involute swap-ends involute) '((1 2 3) 4 5 (6 7 (8 9 10))))
((1 2 3) 5 4 (6 7 (8 9 10)))

> ((deeply swap-ends) '((1 2 3) 4 5 (6 7 (8 9 10))))
((6 2 1) 4 5 (8 7 (10 9 3)))

> ((compose (deeply swap-ends)
            (deeply swap-ends)
            (deeply swap-ends)
            (deeply swap-ends)
            (deeply swap-ends)) '((1 2 3) 4 5 (6 7 (8 9 10))))
((1 2 3) 4 5 (6 7 (8 9 10)))

There's no real application for all this, except maybe to come up with some puzzles. It's just fun to noodle around with list transformations to see what you can come up with, and to practice your list manipulation skills. You really need a language with a REPL to play around like this. A parsimonious syntax like Lisp helps, too. It would have been a bit more difficult to fool around if I had to put the appropriate commas, curly braces, brackets, and semicolons in just right.

None of these operations work on circular lists, but you can imagine that rotations and reversals could work on fully circular lists, but I'm not sure how they'd make sense on lists with circular tails. It would be challenging to make them work, though. They also don't work on “dotted” lists — they throw an error when they run into the dotted item at the end of the list. But it is fairly easy to imagine how they might be made to work on a dotted list. It would be much less of a challenge to implement.

Saturday, January 11, 2020

Gendl / SBCL / Ubuntu / WSL / Windows and an experiment in live blogging

I'm helping David Cooper by trying to run a demo of his Gendl software. He's preparing a new release and I get to be the guinea pig. For fun, I'm live blogging as we go along just as an experiment.

I'm running SBCL 1.4.5debian under Ubuntu 18.04.3 under Windows Subsystem for Linux (WSL 1) under Windows 10 Home edition. I have to say that Ubuntu on WSL is remarkably stable and useful. It seems to act just like the Ubuntu I'm used to using and runs ELF executables without modification. The GNU tool chain just seems to work and I used apt-get install to fetch and install SBCL, which hasn't complained either. I've git cloned David's release and I'm now awaiting further instruction.

While waiting, I've installed slime and am running SBCL under Emacs 25.2.2. Quicklisp is used to install and start Gendl. This starts up a web server that provides the UI in another thread.

So far, this entire melange of software is working quite smoothly despite the oddball combination of the parts. Lisp has habit of tickling memory protection bugs and threading bugs. Unix isn't supposed to get along with Windows. Windows isn't known to get along with Unix very well, either, unless you isolate them from each other through a virtual machine. But WSL is doing its job acting as a go-between. I don't know the details of how it works, but I do know that you need to have some virtualization enabled, but it isn't a full virtual machine (Windows 10 Home edition doesn't support Hyper-V). In WSL, the Linux side can see the Windows disk as a mount point, but it doesn't seem that the Windows side can see the Linux disk. WSL gives you a bash shell in a window capable of running Emacs, or you can run an XWindows server like Xming under Windows if you want the full X experience. Performance seems reasonably snappy.

Well live blogging didn't work. It just felt rude to be typing and not paying full attention while someone was demonstrating some software that he had obviously worked very hard on. So I'll give a re-cap of what I understood from the demo. I'm no expert on CAD systems and most likely misunderstood important points, so take this as a novice's view. I asked David to help me correct this write-up. Not surprisingly, there's one important point I misunderstood, so I'll put in David's explanation.

Gendl is inspired by ICAD. Through use of CLOS and some macros, Gendl provides a DSL that allows you to design an object through a tree-like decomposition of its component pieces. Each component has properties, and these can be inherited by subcomponents (so, for example, when you paint a component a certain color, the color propagates down the component tree to the child components and they get painted, too).

(Here I messed up majorly.) In David's words
The technical term for the properties is “messages.” In many ways this is a message-passing object system, inspired by Flavors, which was inspired by SmallTalk (The ICAD System was built with Flavors, not CLOS).

Note there are two, orthogonal, concepts of "inheriting" going on. Normal class mixins provide one type of inheritance -- an object definition will inherit all the messages of its mixins. We usually call this "inheritance."

The passing of values from parent instance to child instance and other descendant instances in the instance tree is called “passing down” rather than inheritance, to avoid confusion with that other, different, inheritance. Messages can be passed down implicitly (by making them trickle-down-slots), or passed down explicitly by including them as keyword/value pairs in the object specification.

Another way to think of this is that the class (or “type”) of a given instance can inherit from multiple mixins, can contain multiple child object instances, but can have at most one Parent object in the instance tree (and it can have zero Parent objects if it's the root)


The mixin inheritance relationship is an “is a” relationship (essentially the same as normal CLOS inheritance).

The parent-child relationship is a “has a” relationship, and comes along with automatic demand-driven (i.e. lazy) dependency tracking.

The base component contains a co-ordinate system, so all components have a co-ordinate system relative to the root object. One primary module or application of Gendl is the web-based UI Geysr, that allows you to navigate the component tree, inspect components, change their properties, and visualize how they fit together. This module also provides an “exploded” view of the object if desired, where each subcomponent is rendered displaced a small distance from it's actual location. It can also render an exploded view of the assembly processs for the object.

The DSL provides a way of specifying a components location via constraint-like functional messages between components. This means that components can get resized when necessary, angles recomputed when necessary, and even the number of subassemblies recomputed dynamically when the object changes. In the “bench” example David showed me, the number of slats in the bench seat was recomputed dynamically from the seat width, the slat width, and constraints on the slat spacing. The user simply specified the perimeter of the seating area and Gendl figured out how many 2x4's you'd need to cover it. (Maybe I'm easily impressed, but I thought this was pretty neat.)

Another part of Gendl is the Process Planning module which computes a Manufacturing Bill of Processes and Raw Materials. This is much more interesting. Again using the DSL provided by CLOS, you define rules on how to build components from their constituent parts. Then Gendl, starting at the root node, infers a construction plan by recursive descent through the components and combining the rules. When it is done, you have the instructions for building the entire assembly. For the bench example, it started with purchase of sufficient 2x4s for the frame, seat, and back, then cutting the 2x4s to length, some at an angle for the reclining back, then fastening the cut pieces together, finally painting the result. The leaf nodes in the process planning tree represent the Bill of Materials for the object under construction. I thought this was pretty cool, too.

While I know nothing about CAD, I do know a little about computer languages, so I'll opine a little on the DSL. The basic operation in Gendl is define-object which is a macro that extends defclass. Objects have slots (fields) that hold values. One feature of Gendl is the ability to define “trickle down” slots whose values are available in the environment of subcomponents. Thus when you attach a new seat slat to the seat of the bench, it can use the trickle down value to match its paint color with that of the rest of the seat. This use of “environmental” properties isn't unique to Gendl, but it is worth note. David says, “Trickle-down slots are essentially a short-hand way of passing messages from a parent down to its descendants.

The recursive nature of Lisp matches well with the recursive decomposition of objects. The DSL makes it obvious to even the casual observer how the objects are composed. The value of the slots in the object can be defined as any Lisp expression, but drawing from the constraint-like language subset makes it obvious how the pieces relate to each other. You don't specify that the arm rests are 60 inches from the center (although you could), you specify that they butt up against the backrest. Given that constraint, you can move one subassembly and Gendl will automatically recompute the positions of the adjoining assemblies. While this is powerful, I suspect you need a fair amount of experience in setting up the constraints to do it so that it is useful.

Something I completely missed because it is natural to me and other Lisp hackers, but not to CAD designers, is the power of having an Emacs/Slime REPL in parallel with the web-based interface. The web-based interface gives you a natural point-and-click, drag-and-drop, UI with the instant visual feedback while at the same time you have the powerful DSL available in a REPL for experimenting directly with the object tree in a natural text-based way. (Sometimes people forget that text itself is highly abstract visual medium with thousands of years of development behind it.) In David's demo, he would move between the REPL and the UI extremely frequently, keeping both windows open at the same time, sometimes manipulating the object graphically, other times textually. In retrospect, this looks like a huge advantage, but at the time it seemed like an obvious way to use the system. I expect other jaded command-line hackers would have the same experience.

The rules for constructing components are written as CLOS methods that are specialized to the component they are constructing. It is rather obvious what rule applies for a given component. In addition it is obvious what to specialize on for constructing a component. The rule files are straightforward and reasonably terse.

Given the shortness of the demo (there was a lot to grasp), and my own inexperience, I don't think I could write an object description from scratch, but given an existing description I feel confident I could add a small subcomponent. I can see how this product would be useful in design and manufacturing and I think the learning curve doesn't look too steep if one is willing to start small before working up to something hugely complex.

Thanks to David for giving me the demo and for helping correct my blog post. I take credit for all the errors and misconceptions.

Thursday, January 9, 2020


All the cool kids are doing micro-services these days. They are either writing their new apps as a micro-service architecture, or are in the process of changing their legacy apps to micro-service architectures. The fact that everyone seems to be doing it surely means there are no drawbacks of any note, or that the advantages vastly outweigh them.

I see these advantages to a micro-service architecture:
  • Architectural impedance match — each architectural feature maps to a few obvious micro-services.  For example, if you are adding a feature that simply surfaces a data field to a user interface, you probably need to modify little more than the component that generates the UI.  If you are adding a richer service, you'll need to add a persistence component and a business logic component in addition to the UI component. But the reason there is such a good match between the architecture and the micro-services is simply because you draw circles around each architectural component in your design and declare it to be a micro-service. Basically, if it is worth drawing as a separate component, you consider it worth implementing it as a micro-service.

    This impedance match works well with Agile/SCRUM planning. At the beginning of each sprint, it is fairly easy to assign resources to handle the changes necessary to the existing components or to write new components. Each service is usually tiny enough that it doesn't exceed the capacity of a single team. If you can assign more than one team to a component, then the component is too large and needs to be broken into more micro-services. For a more complex feature, it is possible to plan to bring the micro-service online over a period of more than one sprint without adding too much to the risk of the component.
  • Deployment impedance match — each architectural feature maps to only a handful of obvious micro-services, each one usually implemented by a stand-alone "plug-in" program.  The micro-services are usually run in a separate container where they can crash and be restarted with little ill effect on the program at large - perhaps some functionality is temporarily lost while the micro-service is restarted, or maybe a hot backup is ready to take over.  The containers are typically something like "docker" and are monitored with "kubernetes" to ensure they are up and running.  Structuring the program as a set of micro-services facilitates this kind of of monitoring and restarting. This works well when you can distribute your micro-services over a fleet of virtual and physical machines.
  • Possible bug reduction — From experience, it seems that two 500 line programs have fewer bugs than a single 1000 line program. In theory, this would scale and a program made of dozens of hundred-line subprograms would have far fewer bugs than if that same program were written as a single subprogram with several thousands of lines of code. In practice, however, I think that it isn't so simple.
    Two bug-free programs might have emergent buggy behavior when they try to co-ordinate their behavior. Emergent bugs are very hard to find and fix.
  • Robustness — it is hard to completely knock down a micro-service architecture. No matter how many subprocesses you manage to disable, at least a few will still remain running. If we are satisfied with occasional missing pieces of our program, this is robustness of a sort.
  • Dependency injection — Back in the 80's, we'd limit the spread of complexity by structuring larger programs to only know about those smaller subprograms that they depended directly upon. This was eventually given a name and declared “a good thing”. You get dependency injection almost for free in a micro-services architecture if not least because the services won't even know about each other unless you tell them. You are virtually forced to follow a reasonable design principle because you have to do something to get the services to know about each other, and this minimal effort also happens to minimize unwanted information sharing.

It is a time-honored tradition in computer programming to take any idea that offers some advantage, give it a name, elevate it to a “principle” and use it to the exclusion of any prior good idea. “Micro-service architecture” is the latest fad, and it has so rapidly replaced “object-oriented programming” that you can often find both in the same job description. This is almost a contradiction because a micro-service architecture often cuts straight through the objects manipulated by the program. What is conceptually a single object might be represented in several completely different ways in various micro-services in the system. It might be reconstituted from a relational mapping by way of an ORM, serialized into JSON format for RPCs, and returned as part of a nested GraphQL query before having its guts evacuated and spread out through the Document Object Model for display. The identity of the object through the system is completely lost.

Despite the plethora of wonderful ideas that come from a micro-service architecture, there seem to be some downsides.
  • Ubiquitous process barriers — A process barrier is thrown up between every functional component of a program. This helps limit the effect of a crash, but greatly hinders cooperation and synchronization. Error handling and try/finally clauses don't easily work across process boundaries, and what before might be handled by a critical section of code now might require a distributed locking protocol.
  • Ubiquitous RPCs — all communication between the micro-services have to take place between the tin-cans and strings we call remote procedure calls. RPCs, besides being slow, introduce failure modes that normal procedure calls don't have. And if the programmer doesn't think hard about these new failure modes, the RPC will be far less robust than its direct counterpart.
  • Anti-object oriented — all objects in the problem domain of a micro-service architecture must be simple enough to marshal into JSON objects so they can be passed as arguments and returned as values. As noted in a previous essay, the more complex the object, the harderit is to marshal across a procedure boundary, and if fields are hard to marshal, methods are virtually impossible. There is a strong incentive to keep "objects" in JSON format as they are passed through the system and only parse the JSON strings as they are needed by the lowest layer. This is the antithesis of object-oriented design which is to try to make objects as similar to their real-world counterparts as possible and encapsulate behavior (methods) along with the data.
  • Non robustness — while it is hard to knock over an entire micro-architecture application, it can be all too easy to knock over parts of it. A program that is 90% functional 100% of the time is also 10% broken 100% of the time. If that 10% is an often used part of the program, it can appear very non robust. So what if the header and footer of each page (each its own micro-service, naturally) is up nearly 100% of the time when the body of each page is missing half its data. A colleague of mine noted that a certain micro-architecture he was working on seemed very fragile for something that is supposed to be so robust.
  • Verbosity — the frameworks used to support a micro-service architecture makes plain old Java look downright terse. Objects and methods have dozens of annotations describing how to marshal the objects and hook up the RPCs. Some frameworks make use of the redundancy of Java to aid in automatically constructing a micro-architecture system, but these often rely on reflection and automated code generation that introduces its own set of problems. And Java's habit of putting every class in a separate file means you often have to follow a tortuous trail through the code to find the path from the request coming in to the method that actually handles the logic that implements the request.

Despite these drawbacks, I expect to see a lot more micro-architectures in the future because they match up so well with “factory style” software production. It really makes it easier to manage teams of developers. I expect enthusiasm to be curbed a bit when it is noticed that rewriting a program as a set of micro-services doesn't really result in a dramatic improvement in the reliability of the program or a dramatic reduction in the complexity of a large body of code.

Wednesday, January 8, 2020

If 6 was 9

Now if 6 turned out to be 9
I don't mind, I don't mind
Alright, if all the hippies cut off all their hair
I don't care, I don't care
Dig, 'cos I got my own world to live through
And I ain't gonna copy you           -- Jimi Hendrix 
Fortunately, integers are immutable and usually implemented that way (with a few notable exceptions, mostly on early computers), so there would be no harm in Jimi copying 6 around whenever he felt like it. It would always remain 6 and never turn out to unexpectedly be 9. After all, in a computer there are hundreds of thousands of copies of the integer 1 and they agree because they are all the same and are all immutable.

There is no need for “defensive copies” of immutable data, either. This can improve the Ospace() of a program, sometimes dramatically. It also makes it much easier to encapsulate the representation of abstract data, so object-oriented code becomes easier to write correctly. Without mutators, there's half the code for each field in an object.

Cache coherency is trivial when your data is immutable: simply eject old data. No need for write-back, write-through, or cache snooping.

“Immutable by default” still hasn't taken over the mainstream, but “optionally immutable” has gained a lot of popularity over the past decade, especially as people have realized the advantages of immutable data in distributed systems.

Jimi probably wasn't thinking along these lines when he wrote that song, though.

Tuesday, January 7, 2020

Remotely like a procedure call

The moment you connect two computers together with a wire, you introduce a number of failure modes that are very hard to deal with: undeliverable messages, fragmentation of messages, duplication of messages, lost messages, out-of-order arrival, and untimely arrival come to mind. There are probably some I am forgetting. With a bit of engineering, you can eliminate some of these failure modes, or reduce the likelihood of them occurring, or trade the likelihood of one kind for another, but ultimately, when you send a network message, you cannot be sure it will reach its destination in a reasonable amount of time, if at all.

Usually, in a network, the computer that initiates the interaction is called the client, while the computer that performs the action is called the server. There are usually several computers along the way — routers, proxies, and switches — that facilitate the interaction, but their job is to act as a wire in the non erroneous case, return responses if needed, and to report errors back to the client if necessary and possible.

There are several taxonomies of network errors, but one useful way I have found to categorize them is by what sort of actions you might want to take to deal with them. If you are lucky, you don't have to deal with them at all. For instance, it might be reasonable to simply lose the occasional remote info logging message. Or if a web page fails to be delivered, it might be reasonable to rely on the recipient eventually hitting the reload button. This isn't much of a strategy, but it is easy to implement.

Many times, it is possible for the client to detect that the interaction failed. Perhaps the server returns an error code indicating the service is temporarily unavailable, or perhaps a router or switch along the way indicates that it cannot deliver a packet. This isn't the desired state of affairs, but at least the client can come to the conclusion that the interaction didn't take place. Depending on what is causing the failure, there are essentially only two options to proceeding: trying again and giving up. If the error is persistent or long-term, trying again is futile and your only option is to defer the operation indefinitely or abandon it altogether. A robust, distributed system has to be prepared to abandon interactions at any time and have a strategy for what do to recover from abandoned interactions. (For example, it could pop up a message box indicating a service is not available, or it could send an email saying that it will process the interaction at some undetermined time in the future. I suppose crashing is a strategy, but it stretches the definition of “robust”.)

Trying again, usually after a short wait, is often an appropriate strategy to deal with transient errors. For instance, an unreachable server may become reachable again after a routing change, or a service could get restarted and start serving again. It may not be possible to tell the difference between a transient error and a persistent one until you have retried it several times and have had no success. In this case you need to fall back to the persistent error strategy of deferring the operation indefinitely or abandoning it altogether. If retrying is part of strategy of handling transient errors, it becomes necessary for the server to deal with the possibility of handling the same message multiple times. It should, through use of timestamps, versioning, or other idempotent mechanisms, be able to ignore duplicate requests silently. Otherwise, the original error is handled through a retry only to cause the server to raise a duplicate request error.

An even less desired failure mode is when a message is lost or cannot be delivered, and this cannot be determined by the client. The message gets sent, but no reply is received. You are now left guessing whether it was the original message that was lost or the just the reply. The usual strategy is to eventually time-out and resend the message (essentially, you inject a transient failure after a pre-determined amount of time and then handle it like any other transient failure). Again, the server has to be designed to see a message a multiple number of times and to handle duplicates silently and gracefully. Also again, the situation may become persistent and the persistent fallback strategy has to be used.

No one size fits all, so network systems come with libraries for implementing various strategies for error recovery. A robust, distributed system will have been designed with certain failure modes in mind and have an engineered solution for running despite these errors occurring once in while. These involve designing APIs that can handle stale data through versioning timestamps and a reconciliation process, idempotent APIs, temporarily saving state locally, designating “ultimate sources of truth” for information, and designing protocols that reach “eventual consistency” given enough time. There's no panacea, though — no programming technique or trick that simply results in a robust, distributed system. A robust, distributed system has to be designed to be robust despite being distributed and you need some experienced people involved in the design and architecture of such a system at the beginning so that it will be robust once it is implemented. There is actual non-trivial work involved.

(It amuses me to see that some people understand part of the problem without understanding much about the underlying tools. I've seen packet timeout and retry mechanism bolted on to TCP streams, which already have this built in to their abstraction.)

Enter the remote procedure call. If nothing goes wrong, a network interaction that involves a return receipt bears a remote resemblance to a procedure call. The syntax can be made similar to a procedure call, it's just that the call begins over here, the computation happens over there, and the return value is sent back over here. The major difference being that it is significantly slower (and just forget about tail recursion. Although there's nothing in theory that would prevent a properly tail-recursive RPC, I've never seen one.)

RPCs are a weak, leaky abstraction. If everything goes well, the network interaction does indeed appear as if it were a (slow) procedure call, but if there are problems in the network interaction, they aren't hidden from the caller. You can encounter persistent errors, in which case the caller has to be prepared to either defer the call indefinitely or abandon it altogether. You can encounter transient errors, which suggests attempting to retry the call, but it may not be clear whether an error is transient or persistent until you've retried and failed several times. If retrying is part of the error recover strategy, then the callee has to be prepared to receive duplicate requests and discard them or handle them in some other (presumably idempotent) manner. In short, none of the errors that arise from network interactions are abstracted away by an RPC. Furthermore, network errors are often turned into RPC exceptions. This seems natural but it makes it difficult to separate the strategies for handling network error from the strategies of handling other exceptions.

A featureful RPC mechanism can aid in figuring out what happened in an error case, and can provide various recovery mechanisms, but it is still up to the programmer to determine what the appropriate recovery strategy is for each error condition.

RPCs are by their nature a late-bound mechanism. There has to be machinery in place to resolve the server that will handle the network interaction, route messages, and possibly proxy objects. All too frequently, this machinery simply doesn't work because sysadmins thoughtlessly apply over broad firewall rules that prevent the appropriate name resolution and proxying from occurring. To get around this problem some systems have resorted to using HTTP or HTTPS as the underlying RPC mechanism. These protocols are rarely blocked by firewalls as it would render it impossible for sysadmins to watch videos.

HTTP and HTTPS were designed as protocols for transferring web pages, not for general remote procedure calls. For example, errors are handled by returning a 3-digit code and small blurb of text. Shoehorning a featureful RPC mechanism into web interactions is difficult, so we are left with shuffling JSON objects over HTTP connections as our poor-man's approximation to a well-designed RPC mechanism.

RPCs can be a useful, if limited abstraction. If you are using them, you need to be careful because they don't abstract away the problems of distributed systems and it becomes all to easy to build a fragile distributed system rather than a robust one.

Addendum: It seems that I'm not the only one having problems commenting on things. Arthur Gleckler tried to write a comment but it simply got eaten and returned him to the posting page. I don't know what John Cowan is doing right, but he appears to be able to post comments just fine. Google has this unfortunate habit of letting bit rot set into their products. Then, once enough of it stops working, they kill the product or try to replace it with something no one wants. Anyway, here's what Arthur had to say:
Speaking of tail-recursive RPCs, when Google was redesigning its RPC system, I tried to get the new designers to implement a simple tail-recursive capability so that client A could call server B and receive a response from server C. That could have made a lot of common patterns at Google faster, since it was common for A to call B, which called C, which called D, and so on, e.g. to try different mechanisms for handling a particular search query. Unfortunately, I wasn't persuasive. Frankly, it's hard for people who haven't spent much time with tail-recursive languages to see the benefit, and these were hard-core C++ hackers.