Saturday, June 1, 2024

Roll Your Own Syntax

Unlike most languages, Lisp represents its programs as data structures. A Lisp program is a set of nested lists. We can look at a Lisp program as a tree, with each nested list as a node in the tree. The first element of each list indicates the kind of node it is. For instance, a sublist beginning with LET binds local variables, a sublist beginning with IF is a conditional, and so on.

In most languages, it difficult or impossible to add new node types to the syntax tree. The syntax is wired into the language parser and if you even can add new syntax, you have to carefully modify the parser to recognize it. In Lisp, adding new node types is quite easy: you just mention them.

To give an example, suppose you wanted to add a new node to the syntax tree called COMMENT, which would have a string and a subexpression as components. You'd use it like this:

(comment "Avoid fencepost error" (+ x 1))

Here's how you could define the semantics of a COMMENT node in Lisp:

(defmacro comment (string expr)
  expr)

That's it. You can now insert arbitrary COMMENT nodes into your Lisp programs.

Compare that to what you would have to do in a language like Java to add a new kind of node to the syntax tree. You'd have to modify the parser, the lexer, the AST classes, and probably a bunch of other stuff. It's non trivial.

For a more complex example, consider adding transactions to a language. In a language like Java, you'd have to modify the parser to recognize the new syntax, and then you'd have to modify the compiler to generate the correct code. In Lisp, you can just define a new node type:

(defmacro with-transaction (body)
  <complex macro elided>)

And then use it like this:

(with-transaction
  (do-something)
  (do-something-else))

Now obviously you should put some thought into doing this. Adding dozens of random new node types to the language would be a bad idea: readers of the code wouldn't be expecting them. But in some cases, a new node type can be just what is called for to abstract out complexity or boilerplate. Lisp gives you that option.

1 comment:

Scott L. Burson said...

I was going to blog about this soon — you beat me to it :-)

But there's more to be said. Suppose Java exposed its compiler internals so as to encourage users to add syntactic constructs, and suppose lots of people started doing that. Sooner or later, there would arise a situation where two packages define syntax that conflicts in some way: maybe they use the same keyword to introduce it, or maybe they both define the same infix operator, perhaps with different precedence. You wouldn't be able to import both package's syntactic extensions into the same source file; there would be no way to resolve the conflict.

Lisp syntax fixes this problem in two ways. First, "keywords" — macro names, in Lisp parlance — are namespaced: they come from a specific (Lisp) package, and although you can't import two symbols with the same name into one client package, you can always refer to one or both with an explicit package prefix.

The second fix is, of course, Lisp's fully-parenthesized syntax, which removes any possibility of ambiguity as to the right end of a syntactic construct. Operator precedence, in particular, is no longer needed.

The way I like to put it is that Lisp syntax makes syntactic extension modular: different extensions written by different people can be used together, with no possibility of a conflict that keeps the combination from working.