Thursday, June 7, 2007

Domain-specific Languages in Lisp

On 6/7/07, Grant Rettke wrote:
>
> That said, when I think of a DSL think about letting folks write
> "programs" like:
>
> "trade 100 shares(x) when (time < 20:00) and timingisright()"
>
> When I think about syntax transformation in Lisp I think primarily
> about language features.

In order to talk about domain-specific languages you need a definition of what a language is. Semi-formally, a computer language is a system of syntax and semantics that let you describe a computational process. It is characterised by these features:

  1. Primitive constructs, provided ab-initio in the language.
  2. Means of combination to create complex elements from the primitives.
  3. Means of abstraction to control the resulting complexity.

So a domain-specific language would have primitives that are specific to the domain in question, means of combination that may model the natural combinations in the domain, and means of abstraction that model the natural abstractions in the domain.

To bring this back down to earth, let's consider your `trading language':
"trade 100 shares(x) when (time < 20:00) and timingisright()"

One of the first things I notice about this is that it has some special syntax. There are three approaches to designing programming language syntax. The first is to develop a good understanding of programming language grammars and parsers and then carefully construct a grammar that can be parsed by an LALR(1), or LL(k) parser. The second approach is to `wing it' and make up some ad-hoc syntax involving curly braces, semicolons, and other random punctuation. The third approach is to `punt' and use the parser at hand.

I'm guessing that you took approach number two. That's fine because you aren't actually proposing a language, but rather you are creating a topic of discussion.

I am continually amazed at the fact that many so-called language designers opt for option 2. This leads to strange and bizarre syntax that can be very hard to parse and has ambiguities, `holes' (constructs that *ought* to be expressable but the logical syntax does something different), and `nonsense' (constructs that *are* parseable, but have no logical meaning). Languages such as C++ and Python have these sorts of problems.

Few people choose option 1 for a domain-specific language. It hardly seems worth the effort for a `tiny' language.

Unfortunately, few people choose option 3. Part of the reason is that the parser for the implementation language is not usually separable from the compiler.

For Lisp and Scheme hackers, though, option 3 is a no-brainer. You call `read' and you get back something very close to the AST for your domain specific language. The `drawback' is that your DSL will have a fully parenthesized prefix syntax. Of course Lisp and Scheme hackers don't consider this a drawback at all.

So let's change your DSL slightly to make life easier for us:

 (when (and (< time 20:00)
            (timing-is-right))
     (trade (make-shares 100 x)))

When implementing a DSL, you have several strategies: you could write and interpreter for it, you could compile it to machine code, or you could compile it to a different high-level language. Compiling to machine code is unattractive because it is hard to debug, you have to be concerned with linking, binary formats, stack layouts, etc. etc.

Interpretation is a popular choice, but there is an interesting drawback. Almost all DSLs have generic language features. You probably want integers and strings and vectors. You probably want subroutines and variables. A good chunk of your interpreter will be implementing these generic features and only a small part will be doing the special DSL stuff.

Compiling to a different high-level language has a number of advantages. It is easier to read and debug the compiled code, you can make the `primitives' in your DSL be rather high-level constructs in your target language.

Lisp has a leg up on this process. You can compile your DSL into Lisp and dynamically link it to the running Lisp image. You can `steal' the bulk of the generic part of your DSL from the existing Lisp: DSL variables become Lisp variables. DSL expressions become Lisp expressions where possible. Your `compiler' is mostly the identity function, and a handful of macros cover the rest.

You can use the means of combination and means of abstraction as provided by Lisp. This saves you a lot of work in designing the language, and it saves the user a lot of work in learning your language (*if* he already knows lisp, that is).

The real `lightweight' DSLs in Lisp look just like Lisp. They *are* Lisp. The `middleweight' DSLs do a bit more heavy processing in the macro expansion phase, but are *mostly* lisp. `Heavyweight' DSLs, where the language semantics are quite different from lisp, can nonetheless benefit from Lisp syntax. They'll *look* like Lisp, but act a bit differently (FrTime is a good example).

You might even say that nearly all Lisp programs are `flyweight' DSLs.

7 comments:

Robert said...

I'm not entirely sure that I understand your point about the need to learn all about LL(1), LALR, etc. I have done this with some DSLs I have built, but most of the time the beauty of lisp is that I am working with s-expressions, which are essentially parse tree elements. That means that I can safely and happily ignore these parsing issues. Indeed, this is part of the reason why I can be productive using lisp. If I needed to do a lot of fussing about parsing, then I wouldn't have a big leg up on people building DSLs using Java or C....

Joe Marshall said...

I think you did get my point. The original poster on plt-scheme was wondering if Lisp and Scheme are so good for DSLs, why hasn't he seen any good examples? I was pointing out that Lispers tend to punt on the fancy syntax and so the DSL looks like Lisp.

grant rettke said...

Thank again for that reply, it was very helpful.

Anonymous said...

I know this is an old post, but I have recently become interested in creating tiny languages/DSLs to help my non-programmer colleagues automate some of their work. The problem I have with this post is maybe I don't understand what a DSL is. You say "Lispers tend to punt on the fancy syntax and so the DSL looks like Lisp." You could replace "Lisp" with any language couldn't you? If so, what then is the benefit of Lisp over any other language? What this approach to a DSL means to me is the non-programmer has to become familiar with the syntax of the language of choice. If that is the case, then I believe a language with a more natural language flow will be more familiar than Lisp. For example if implementing the "DSL" in python it could look like:

```
if timingisright() and time < T('20:00'):
trade_shares(100, x)
```

compared to the Lisp example:

```
(when (and (< time 20:00)
(timing-is-right))
(trade (make-shares 100 x)))
```

Neither makes me feel great when what I want is the non-programmer to write something like:

```
trade 100 shares of x before 20:00 when timing is right
```

Do I have my understanding of what a DSL is wrong?

Joe Marshall said...

I don't think you are wrong. I was pointing out that if you don't mind the surface syntax being Lisp like (Polish prefix notation) that you can just use READ to parse your language, and if your language constructs are mostly like Lisp that you can just use macros to implement (compile) the few extra constructs you want to add.

I think the surface syntax is a different aspect of domain specific languages. I see an advantage to having the surface syntax look like Lisp: I write code for Lisp programmers, and I don't want them to have to learn a funny new syntax just for the domain specific parts of the program. I want them to already know 99% of the language already. But that's not necessarily an approach for everyone.

I also see an advantage to using recursive descent evaluation of the abstract syntax. It is a tried and true method, and most modern languages follow some variant of recursive descent semantics, so it will be familiar to programmers.

> You could replace "Lisp" with any language couldn't you?

I don't think so. Lisp is unique in that all the language constructs are in Polish prefix notation, and macros give you the ability to manipulate the abstract syntax before evaluation. Most other languages only allow you to extend parts of the language, like the set of functions that may be called. You cannot, for example, write a new construct in Java that has the syntax of a try {} finally {} statement. Languages with "decorators" or "annotations" get you a lot of power in this regard, although the scoping and semantics of decorators and annotations tend to be very limited compared to the rest of the language.

> ... what I want is the non-programmer to write something like:
>
> ```
> trade 100 shares of x before 20:00 when timing is right
> ```

I would consider this to be more of a heavyweight DSL. The surface syntax is such that you would need to define parsing rules and precedence, and the semantics don't seem to be particularly recursive, so your abstract syntax isn't very tree-like. You wouldn't get much of an advantage by pulling Lisp like constructs into your DSL, and you probably don't even want to.

But consider this idea:
(for-each (lambda (stock)
(trade 100 shares of stock
before (- (end-of-trading) (minutes 5))))
portfolio)
In this case, I have the full power of Lisp as sub-expressions and super-expressions of my language, and I only have to implement a macro called "trade". Yes, I know this isn't what you asked for, nor is it what you desired. But I do consider this to be a lightweight domain specific language: it is pretty obvious that it is a program that describes a stock trading strategy (a poor one). It would be harder to implement this in, say Java:
for (final stock in portfolio) {
trade 100 shares of stock before endOfTrading - minutes(5);
}

Unknown said...

I must admit I'm feeling confronted with a completely different set of problems coming with that initial supposed-to-be-a-DSL example:

Even though Grantt seems to be pretty confident that he knows what his example is supposed to do, this is not all that clear to me, and I would like to ask him to show me some more example expressions from this DSL which might give me a better overview of its intended expressive power.

Yes, I know, this example has been made up only for the sake of this discussion - but it's got more than only just a tiny bit of inner structure, and I just can't tackle that immediately. I'm not a lisper like Joe, im a Tcl guy, which means that while I appreciate Joe's theoretical approach ... (after all, searching for quotes like his very valuable initial answer in order to gain a better understanding of what I'm doing - well, doing in fact without having an all-encompassing theory like his in mind - is what brought me here quite some time ago; and, maybe a struggle to find some arguments that I might use to sell this approach to some colleagues) ... I'll rather take a more practical approach, working my way through the words of my DSL, preferably from the left to the right, when trying to put an example like this into action.

So, these are my gnawing issues (you might, maybe, say, I'm questioning the semantics more than the syntax - but I consider myself too much a practitioner to use a word like "semantics" in exactly the defined ways only ;) :

(1) It is pretty unusual to see the "trade" command combined with a "when" this way ... Joe's separation gets this much clearer. How am I supposed to understand this? Is this the way Grantt's language handles a combination similar to Joe's? Or is the "when" just another parameter to "trade" that's not getting special treatment by the general parser? (I've done things like this in my tested and heavily used DSLs, just to pull out that parameter somewhere deep down in the algorithm and evaluate it then. If so: will the parameters to "when" have to be evaluated before "trade" is called or later? So, maybe this tiny language is supposed to even have closures? Wow!) Is this appended "when" a special language feature which can be applied in some special cases only? Or, maybe, is this altogether a "trade-when" command in a language adhering to some SQL-like twisted BNF?

(2) Is this an imperative command which does its thing only once, exactly when it's called - or is it meant to register with some kind of reactor ... so it will be brought to fire exactly when timingisright()? And, if so: once? repeatedly? Now, combine that with that closure uncertainty I've mentioned before. Wow, more pretty advanced language/runtime features for a supposedly simple language. Lisp and Tcl each have their way to clarify stuff like that - in our example here, I'm only left wondering.

(3) I see an algol-like call syntax in timingisright() and shares(x) (even though the latter seems more like object construction to me). The parantheses around the comparison operator seem to clarify precedence rather than actually "call" (?) the "when". Did I get thet right? BUT BEHOLD: "trade" itself doesn't use any parentheses, so it must be a different beast than the "function calls" we see on the same line. Does this, maybe, support my SQL-BNF-theory? By the way ... the "x" is obviously some kind of variable ... so, yes, we've very obviously been bitten by the "generic language features" Joe mentioned.

... continued

Unknown said...

continuing ...

(4) There are some more, mostly smaller issues remaining which I don't want to delve into too much. For example, my mind clearly sees the intended connection between the "100" and the "shares" - but I just cannot come up with a clearly defined language construct that would justify that for my parser. Finally maybe: How intelligent is "trade"? Does it know perfecly when to buy and when to sell, magically - or could there be more possible parameters, enabling me to specify more precisely WHAT I would like to happen? (I don't want address timingisright() because I consider that as being kept vague intentionally.)

I don't want to split hairs. But I firmly believe that asking back these questions should bring someone like Grantt to recognize all the fuzziness that comes with his pretty shallow example. That's a quite complicated expression he gives us there, so there seemed to be a need for some serious expressiveness in the desired DSL. Otherwise, if there was only this one intended thing to do, he could easily have made this example read "readMyMindAndDoWhatIWant" - after all, reading his mind is no big deal if there's only that one thing he can intend ... or is it? I also firmly believe that trying to deal with that fuzziness should lead those Grantts to discover at least a considerable part of the more practical parts of Joe's explanations by themselves - which might pin down these insights more firmly than having heard third hand.

Please let us try to leave the land of fuzzy maybes and step out on a bright, clear spot. I, myself, am fed up with that fuzzy maybes'n'shoulds I get whenever I try to propagate Tcl for, among others, exactly the same DSL-building reasons: "So what's the big deal? Python should be able to do exactly the same thing." And I agree: "Yeah, you're right. Python really SHOULD be able to do the same. But the difference is: I don't have the slightest clue HOW. Could you please show me?" - But no one ever cares to try.

I admit that I highly estimate Lisp. Reading about Lisp techniques tought me a lot about how to use Tcl wisely - the same way my Pascal knowldge tought me long ago how to use C in a structured way. I even have that phantasies that lisp must be superior to Tcl in several ways (Maybe this puts me into maybe-land, too). But I have no real working experience with lisp, and I wouldn't know how to get anyone else to use it (in Tcl, there's at least that UNIX shell analogy which should work for most everyone). I've become pretty proficient and fast in Tcl, so why sacrifice that? And also, I haven't seen any Lisp variant yet that comes with all that pretty "batteries included" that Tcl has, and which I'm relying on heavily in my projects.

So far, I havent heard of any other language that might be able to treat DSLs similarly, save for these two - and I'd like to call Tcl the obvious choice for down-to-earthers like me who see and and appreciate the purity of Lisp, and are willing to take an advice or two from the Lisp guy, now and then.

Cheers, Thomas