Wednesday, February 19, 2020

Stupid pattern matching tricks

There are a few pattern matching constructs in Common Lisp. For instance, destructuring-bind matches list structure against a tree of variable names and binds the variables accordingly. Macros can destructure their argument list. Even functions have simple keyword matching. These constructs don't give access to their pattern matchers as first-class objects, but perhaps you want that. You can construct a simple pattern matcher by wrapping one of these constructs in the appropriate macro.

We'll want the result of our pattern match to be an alist mapping symbols to the objects they matched with. First, we'll need a function that takes a pattern and returns a list of the variables in the pattern. flatten will work nicely for destructuring-bind:
(defun flatten (pattern)
  (cond ((null pattern) '())
 ((symbolp pattern) (list pattern))
 ((consp pattern) (append (flatten (car pattern))
     (flatten (cdr pattern))))
 (t (error "Not a pattern"))))

CL-USER> (flatten '((a b . c) d e . f))
(A B C D E F)
Then we want to generate code that will make an alist:
CL-USER> `(list ,@(map 'list (lambda (var)
           `(cons ',var ,var))
               (flatten '((a b . c) d e . f))))
Finally, we wrap a call to destructuring-bind with a macro:
CL-USER> (defmacro destructuring-pattern-matcher (pattern)
           `(lambda (form)
              (destructuring-bind ,pattern form
                (list ,@(map 'list (lambda (var)
                              `(cons ',var ,var))
                     (flatten pattern))))))

CL-USER> (destructuring-pattern-matcher ((a b . c) d e . f))
#<FUNCTION (LAMBDA (FORM)) {10027B143B}>
destructuring-pattern-matcher returns a pattern matcher as a first-class procedure we can call on a pattern to get an alist of bindings:
CL-USER> (defvar *matcher* (destructuring-pattern-matcher ((a b . c) d e . f)))

CL-USER> (funcall *matcher* '((1 2 3 4) 5 6 7 8))
((A . 1) (B . 2) (C 3 4) (D . 5) (E . 6) (F 7 8))

We can use this trick to get at the destructuring pattern match done by defmacro. First, we need a function that takes a macro lambda list and returns a list of the variables it binds. I won't reproduce the function here, it is too large, but here's a sample call:
CL-USER> (macro-lambda-list-variables 
            '((foo bar &optional (baz 'default baz-supplied-p) . more) quux
              &rest rest
              &key ((:key key-variable) 'key-default key-supplied-p) key2
              &aux (auxvar 'auxvalue)))
If we were matching the list '(1 e) against the pattern (a b &optional c), we'd want to generate code something like this:
             (LIST 'LIST
                   (LIST 'CONS ''C (LIST 'QUOTE C)))))
  (MACRO 1 E))
We'll do this in stages:
(defun make-macro-pattern-matcher-body (pattern)
    ,@(map 'list (lambda (var)
     `(list 'cons '',var `',,var))
    (macro-lambda-list-variables pattern))))

(defun make-macro-pattern-matcher (pattern)
  (let ((body (make-macro-pattern-matcher-body pattern)))
    (lambda (form)
      `(macrolet ((macro ,pattern
  (macro ,@form)))))

(defmacro macro-pattern-matcher (pattern)
  (let ((matcher  (make-macro-pattern-matcher pattern)))
    `(lambda (form)
       (eval (funcall ',matcher form)))))
Now we can make a pattern matcher that works like the macro destructuring facility:
CL-USER> (setq *matcher* 
       ((foo bar &optional (baz 'default baz-supplied-p) . more) quux
               &rest rest
               &key ((:key key-variable) 'key-default key-supplied-p) key2
               &aux (auxvar 'auxvalue))))

CL-USER> (funcall *matcher* '((1 2 3 4) 5 :key 6 :key2 7))
((FOO . 1)
 (BAR . 2)
 (BAZ . 3)
 (MORE 4)
 (QUUX . 5)
 (REST :KEY 6 :KEY2 7)
 (KEY2 . 7)
You can do a similar trick with regular lambda lists, but while they have keywords, they don't destructure.

You have to be careful when writing the expansion for the binding alist. Too much quoting and you end up with the names rather than their values in the output:
((foo . foo)
 (bar . bar)
not enough, you end up with the values of the values in the output:
CL-USER> (defvar e 22)

CL-USER> (funcall *matcher* '((1 2 e) 5))
((FOO . 1)
 (BAR . 2)
 (BAZ . 22) ; Wrong! Should be 'Eetc…)

Wednesday, February 12, 2020

A polygot program puzzle

Can you come up with an expression that evaluates to the symbol 'scheme in a Scheme system, but evaluates to the symbol 'common-lisp in a Common Lisp system?

Anaphoric if

An anaphoric if expression binds the identifier “it” to the value of the conditional in the scope of the consequent
(aif (find-broken-computer)
     (fix it))
I have two objections to anaphoric macros. The first is that the binding of “it” isn't obvious, the second is the inflexibility of the variable name. You are kind of stuck with “it”. What if you wanted to use “it” to mean something else? Maybe you have an IT department to fix your computers
(let ((it (find-department "IT")))
  (aif (find-broken-computer)
       (tell it (fix it))))   ; name conflict
or maybe you want to nest two anaphoric conditionals
(aif (find-broken-computer)
     (aif (find-broken-component it)
          (repair it)
          (replace it)))
In this case, I want to replace the computer if I cannot repair the broken component, but the inner binding of “it” shadows the outer binding and makes it inaccessible.

The solution is pretty obvious if you think about it (though sometimes it takes me a while to see the obvious). Just replace the conditional test form with a binding form
(aif (it (find-broken-computer))
     (fix it))
This makes it obvious we are binding the variable “it” to the value returned by (find-broken-computer), and it lets us choose the name we bind. If we want to nest these, it would look like this
(aif (computer (find-broken-computer))
     (aif (component (find-broken-component computer))
          (repair component)
          (replace computer)))
But I'm not sure if this is so much more concise and clear than simply using a let expression that it is worth adding this syntax to the language. It's one more thing the reader of the code has to be prepared to encounter.

A slightly different approach would move the binding form closer to where it is used. Note that there is no point in binding a name in the alternative clause to the conditional because it will always have the value nil.
(aif (find-broken-computer)
     (it (fix it)))
and instead of using a let binding, I could use a lambda binding
(aif (find-broken-computer)
     (λ (it) (fix it)))
aif no longer needs to be a macro but can be an ordinary function, which might be handy if your language doesn't have macros
(defun aif (it consequent &optional alternative)
  (if it
      (funcall consequent it)
      (if alternative
          (funcall alternative)

(aif (find-broken-computer)
     (λ (computer)
        (aif (find-broken-component computer)
             (λ (component) (fix component))
             (λ () (replace computer)))))
The explicit lambdas make it obvious what is being bound and what the scope of the binding is, but they do add a bit of visual noise.

Instead of using anaphoric if, I just write the slightly more verbose
(let ((computer (find-broken-computer)))
  (if computer
      (let ((component (find-broken-component)))
        (if component
            (repair component)
            (replace computer)))))
The binding is obvious, and I get to choose the variable being bound; both problems solved. I don't see a compelling reason to use the anaphoric version.


Hexstream suggests that “No discussion of anaphoric macros can be complete without at least mentioning anaphoric-variants:” I wouldn't want to be incomplete, so consider it mentioned.

Tuesday, February 11, 2020

Macro pitfalls

Macros are a unique source of power in Common Lisp, but there are some pitfalls to watch out for.

A compiler macro is special macro that is expanded only by the compiler. The interpreter doesn't expand the macro and simply evaluates the form like a normal function call. If you aren't careful when writing a compiler macro, the interpreted and compiled forms may not evaluate the same and that's probably not what you want. Here we abuse this effect
(defun foo (x) 'interpreted)

(define-compiler-macro foo (x) ''compiled)

CL-USER> (foo)

CL-USER> ((lambda () (foo)))
That might be unexpected. It appears that in this implementation (SBCL) the compiler is called on lambda expressions when they are evaluated.

Like all macros, a compiler macro is given the unevaluated source code of the arguments, not the value. We can see that in this example
(defun foo (l r)
  (format t "~%Foo")
  (list r l))

(define-compile-macro foo (l r) 
     (format t "~%Foo")
     (list ,r ,l)))

CL-USER> (foo (progn (format t "~%First") 'l) (progn (format t "~%Second") 'r))

(r l)

CL-USER> ((lambda () (foo (progn (format t "~%First") 'l) (progn (format t "~%Second") 'r))))

(r l)
When interpreted, the arguments are evaluated left to right before the function is entered. When compiled, the arguments end up being evaluated right to left and after the function is entered.

Unless you really want this — and shame on you if you do — you have to be careful when writing your macro to preserve the left-to-right, call-by-value semantics that are probably expected. The easiest way to do this is to write the expansion so that it just substitutes the body of the function. Something like this
(define-compiler-macro foo (l r)
  `((lambda (l r)
      (format t "~%Foo")
      (list r l))

CL-USER> (foo (progn (format t "~%First") 'l) (progn (format t "~%Second") 'r))

(r l)
Or you could use a let expression with the same effect
(define-compiler-macro foo (l r)
  `(let ((l ,l)
         (r ,r))
     (format t "~%Foo")
     (list r l)))
The version with the lambda expression doesn't even require putting a block of let bindings at the front. You just plop down the original argument list and body after the lambda, but both forms are equivalent.

The problem with doing this is that you have probably disabled the ability of the compiler to optimize the expression. You are forcing the compiler to ensure that the arguments are evaluated in left-to-right order before the body. A Sufficiently Smart compiler might be able to provide some optimizations anyway. If your compiler is not Sufficiently Smart, you can take matters in to your own hands and substitute the arguments at the point they are used. Just be aware that you might be surprising people by changing the semantics at the call site.

Funny semantics isn't just a problem with compiler macros. Regular macros have to be written with care as well or you may surprise users when they write code they think are normal function calls. Compiler macros just have the unique property that they can change the semantics between interpreted and compiled code.

You can see a related effect when using symbol macros. A symbol macro substitutes a piece of code that computes a value. If we write
CL-USER> (let ((l (progn (format t "~%First") 'l))
               (r (progn (format t "~%Second") 'r)))
           (format t "~%Let body")
           (list r l))

Let body
(r l)
we get the standard left-to-right, call-by-value evaluation. But we can mimic normal-order reduction by substituting the code for l and r before evaluating the body of the let by use of symbol-macrolet*
CL-USER> (symbol-macrolet ((l (progn (format t "~%First") 'l))
                           (r (progn (format t "~%Second") 'r)))
           (format t "~%Symbol-macrolet body")
           (list r l))

Symbol-macrolet body
(r l)
If one of the arguments to a macro is a block of code, for instance the &body argument, then you probably want to avoid accidental variable capture.
(defmacro capturing-macro (&body body)
  `(let ((temp 'captured))
     (format t "~%Macro body binds temp to ~S" temp)

(let ((temp 'lexical))
     (format t "~%Temp is ~s" temp)))

Macro body binds temp to CAPTURED
The lexical binding of temp is shadowed by the binding introduced by capturing-macro. This is probably unintended (except in the case of anamorphic macros, where capture is intended). Instead, you can ensure lexical scoping is maintained by closing over the body before introducing any new bindings
(defmacro non-capturing-macro (&body body)
  `(let ((temp 'captured)
         (body (lambda () ,@body)))
     (format t "~%Macro body binds temp to ~S" temp)
     (funcall body)))

(let ((temp 'lexical))
    (format t "~%Temp is ~s" temp)))

Macro body binds temp to CAPTURED
In this case, even a fairly naive compiler ought to be able to inline the call to body because it is simply a lexically apparent code block.

Inadvertent capture can happen in other direction as well if the macro caller shadows a binding used by the macro.
(flet ((funcall (x) (format t "~%Unexpected")))
  (let ((temp 'lexical))
      (list temp))))

Macro body binds temp to CAPTURED
Here the caller shadowed funcall and the code the macro introduced ends up inadvertently calling it. This doesn't happen often in practice because people rarely shadow the top-level functions a macro depends upon, and that is good because there isn't an easy way to solve this reverse capture problem (other than don't do that).

The “hygienic” macro system in Scheme solves both kinds of accidental capture by appropriately renaming variables. There is a price, however. You either have to forego direct code manipulation and use a special pattern matching language, or write code that explicitly keeps track of the environment where the variables are bound. For simple macros, the pattern matching language is adequate, but for more complex macros, neither option is appealing.

*Macrolet rhymes with Chevrolet, naturally.

Sunday, February 9, 2020

Four ways to use macros

The way I see it, there are about four five basic ways to use macros in Common Lisp.

First are macros that circumvent the regular call-by-value semantics. These might evaluate a subform at macro expansion time, treat a subform as a place (an l-value) rather than a value, or otherwise treat a subform as something other than a runtime function call. For example, if incf fully evaluated its argument, it could perform the increment on the value, but it couldn't put the value back where it got it. Another example is the check-type macro. You use it like this:
(defun foo (x)
  (check-type foo (integer 1 *) "a positive integer")
  (bar (- x 1)))
The check-type macro has to be a macro because it treats foo as a place (it will allow you to proceed by modifying foo), and it treats its second argument as a type specifier.

Second are macros that introduce new syntax to the language. Examples are cond, case, do, dotimes, defun, defvar, etc. These treat their arguments specially or have special clauses that don't act like ordinary function calls.
CL-USER> (macroexpand-all '(do ((i 0 (+ i 1))
                                (j 1 (* j 2)))
                               ((> j 65536) nil)
                             (format t "~%~2d ~5d" i j)))

  (COMMON-LISP:LET ((I 0) (J 1))
      (GO #:G748)
      (TAGBODY (FORMAT T "~%~2d ~5d" I J))
      (COMMON-LISP:LET* ((#:NEW1 (+ I 1)) (#:NEW1 (* J 2)))
        (SETQ I #:NEW1)
        (SETQ J #:NEW1)
      (IF (> J 65536)
          (GO #:G747))

Third are macros that implement tiny languages within Common Lisp. The loop macro is a good example. It looks like this
(loop for i from 1 to (compute-top-value)
      while (not (unacceptable i))
      collect (square i)
      do (format t "Working on ~D now" i)
      when (evenp i)
        do (format t "~D is a non-odd number" i) 
      finally (format t "About to exit!"))
It works like a little compiler. It parses the loop clauses and generates a Lisp form that carries them out
             (TYPE (AND REAL NUMBER) I))
        (WHEN (> I #:LOOP-LIMIT-744) (GO SB-LOOP::END-LOOP))
         (#:LOOP-LIST-HEAD-745 #:LOOP-LIST-TAIL-746) (LIST (SQUARE I)))
        (FORMAT T "Working on ~D now" I)
        (IF (EVENP I)
            (FORMAT T "~D is a non-odd number" I))
        (SB-LOOP::LOOP-DESETQ I (1+ I))
        (FORMAT T "About to exit!")

Fourth are macros that run code in a special context. The with-… macros and when and unless fall in this category. These macros take ordinary Lisp code and wrap it with other code that establishes a context or tests a conditional.
CL-USER> (macroexpand '(when (condition) (foo) (bar)))

    (PROGN (FOO) (BAR)))

CL-USER> (macroexpand '(with-open-file (foo "~/.bashrc" :if-does-not-exist :create)
                         (print (read-line foo))

(LET ((FOO (OPEN "~/.bashrc" :IF-DOES-NOT-EXIST :CREATE)) (#:G751 T))
    (WHEN FOO (CLOSE FOO :ABORT #:G751))))

These aren't hard and fast categories, many macros can be thought of as in more than one category. All macros work by syntactic transformation and most treat at least one of their subforms as something other than a call-by-value form, for instance. There are also the occasional macros that have the look and feel of a standard function calls. The series package appears to allow you to manipulate series through standard function calls, but works by clever macroexpansion into iterative code.

I find it useful to think of macros in these four different ways, but I'm sure that others have their own categorizations that they find useful.


An anonymous reader asked, “What about code walking/analysis/optimization?”. I really overlooked that. I think Richard Waters's series package would be a good example. It takes ordinary functional programs that can operate on series of data (coinductively defined data or codata), and turns it into the equivalent iterative construct that operates on one element at a time. It does this by clever macros that walk the code, analyse it, and rewrite it to a more optimal form
CL-USER> (sb-cltl2:macroexpand-all '(let ((x (scan-range :from 0 :below 10)))
                               (collect (choose-if #'evenp x))))

                   (#:LASTCONS-751 (LIST NIL))
                   (#:LST-752 #:LASTCONS-751))
           (TYPE CONS #:LASTCONS-751)
           (TYPE LIST #:LST-752))
    (SETQ X (+ X (COERCE 1 'NUMBER)))
    (IF (NOT (< X 10))
        (GO SERIES::END))
    (IF (NOT (EVENP X))
        (GO #:LL-756))
    (GO #:LL-756)
  (CDR #:LST-752)))
As you can see, the series code does a major rewrite of the original lisp code. An astute reader will notice that the let form has to have been redefined to do dataflow analysis of it's bindings and body. Thanks to anonymous for the suggestion.

Addendum 2: Symbol macros

A comment by Paul F. Dietz got me thinking about symbols and it occurred to me that symbol macros deserve their own category as well. Symbol macros appear to be an ordinary symbolic references to variables, but they expand to some code that computes a value. For instance, if foo were a symbol macro that expanded to (car bar), then using it in a form such as (+ foo 7) would expand to (+ (car bar) 7). A symbol macro is a two-edged sword. It is a very useful abstraction for providing a name to a computed value, but they also can fool the user of such a macro into thinking that a simple variable reference is happening when some more complex computation could be happening.

I think that makes seven ways and counting.

Thursday, February 6, 2020


There are times when you are faced with a complex piece of control flow
try {
    if (condition())
        ... block 1 ...
        switch (someValue())
          case CASE_A:
            ... block 2 ...

          case CASE_B:
            ... block 3 ...

            ... block 4 ...
} catch (SomeException someException) {
    ... block 5 ...
and you want to abstract the control flow — all the conditionals, switches, and try/catches — away from the code that does the work — the various blocks. In fact, here I've abstracted it away by putting "... block n ..." in place of the blocks of code.

If I were writing this in Scheme or Common Lisp, I'd consider using continuation passing style. I'd write a function dispatch-on-foo that would perform the dispatch, but then invoke one of several first-class procedures passed in as arguments
(defun dispatch-on-foo (foo bar case1 case2 case3 case4 case5)
   (if (... complex conditional ...) 
       (funcall case1)
       (handler-case (case some-value
                       ((case-a) (funcall case2))
                       ((case-b) (funcall case3))
                       (t (funcall case4)))
         (error (condition) (funcall case5)))))
At the call site, I'd write
(dispatch-on-foo <arg1> <arg2>
  (lambda ()
     ... block 1 ...)
  (lambda ()
     ... block 2 ...)
  (lambda ()
     ... block 3 ...)
  (lambda ()
     ... block 4 ...)
  (lambda ()
     ... block 5 ...))
This is a win when the complexity of the dispatch is enough that you don't want to replicate it at every call site. Notice how the nested blocks of code have been pulled up to the same level and linearized. Granted, you've cluttered up the call site with lambda expressions, but as Steele pointed out, you can think of these as anonymous go tags: dispatch-on-foo in essence will end up doing a jump to one of these tags and execute the block there, skipping and ignoring the other blocks. Once you get used to thinking in this way, the lambdas disappear just like the parens do for a seasoned Lisp hacker. They just look like jump targets or case labels, and the call site looks a lot like a case expression. It is a bit more powerful than an ordinary case expression because you could arrange for dispatch-on-foo to funcall the appropriate closure on an argument (and have the lambda expression take an argument of course).

You could do something analagous with Java 8's lambdas, but on the rare occasion I've wanted to do something similar in Java 7. The problem is that Java 7 doesn't have lambda expressions. The solution is to change these anonymous lambdas into named callback methods. First we define a generic interface with our callbacks:
interface DispatchOnFooCases<T> {
    T caseOne (void);
    T caseTwo (void);
    T caseThree (void);
    ... etc. ...
then we define the dispatch method:
<T> T dispatchOnFoo (FooClass foo, BarClass bar, DispatchOnFooCases dispatchOnFooCases)
    try {
        if (conditional())
            return dispatchOnFooCases.caseOne();
            switch (someValue()) {
              case CASE_A:
                return dispatchOnFooCases.caseTwo();

              case CASE_B:
                return dispatchOnFooCases.caseThree();

                return dispatchOnFooCases.caseFour();
    } catch (SomeException someException) {
        return dispatchOnFooCases.CaseFive();
finally, at the call site, we write this:
    int value =
        dispatchOnFoo<int> (foo, bar,
            new DispatchOnFooCases<int> ()
                int caseOne (void)
                    ... block 1 ...
                    return aValue;

                int caseTwo (void)
                    ... block 2 ...
                    return aDifferentValue;

                ... etc. ...
The good news is that we've accomplished our goal of abstracting the complex conditional dispatch from the code that does the real work — the method bodies at the call site.

There is, unfortunately, a fair amount of bad news. First, if you thought lambda expressions introduced clutter, then this is a serious amount of clutter. Between @Overrides, type declarations, interfaces, and methods, there is just a lot of extra stuff you have to type. It still might be worth the clutter if the dispatch conditions are complex enough. They just need to be that much more complex to justify all this machinery. (We've actually done the work the compiler would do to allocate and pass a “multi-closure”.) There are cases where this pays off, though.

The second piece of bad news is that Java is not (in general) tail recursive. This means that the call to dispatchOnFoo and the callback to one of the cases both introduce a new stack frame. So although the case methods run in the same lexical environment as where they are defined, they are running two stack frames deeper. This won't make much of a difference unless you try to loop by recursively calling the code. In that case, you need to be very careful to limit the amount of recursion or you will overflow the stack. It is best to avoid recursion as much as possible in the bodies of the cases.

You probably won't need to resort to this doing this. It can be a case of the cure being worse than the disease. The complexity of introducing callbacks can exceed the complexity of the conditional you are trying to abstract. But this is an interesting way to abstract a very complex conditional and can come in handy when you can justify using it. I have actually used this technique in production code to separate some complex control flow from the code that did the actual work.