Friday, February 5, 2010

A Wacky Idea

All computer languages should come with a fully parenthesized API to their syntax tree.

The semantics of other languages are nasty enough. Having to deal with quirky syntax is just rubbing salt in the wound.

Furthermore, the language reader and pretty-printer should be break-out modules that can simply be imported and linked. I'm getting sick of writing parsers just because I can't trivially hijack an existing one without bringing in the entire compiler.

Wacky idea #2: A standalone syntax-rules preprocessor could be used on the parenthesized syntax tree. This would give the language a nice hygienic (lexically scoped) macro system with no work on the language implementor's part. (The preprocessor would have to know what the native binding forms are and how to distinguish identifiers from keywords.)

10 comments:

jfm3 said...

I'm working on it ;)

http://jfm3-repl.blogspot.com/2010/01/sketch-of-rosebud-syntax.html

icosahedron said...

My current is to write a good lisp/scheme interpreter/compiler and then play with syntax on top of it.

I think Scheme could be the common denominator between most dynamic languages.

Jrm said...

Don't lets limit ourselves to dynamic languages! Fully parenthesized syntax to a static language is better than the punctuation soup we currently have.

John Clements said...

GRRR! YOUR POST MAKES ME ANGRY. And sad. because it's so obviously true. Except that if we take your suggestion, then *the terrorists have won.*

Also, XML is a HUGE red herring in this arena; I can't believe how hard it is to use XML as an interchange format for abstract syntax trees.

Are you listening, XML? No, of course you're not.

John Clements said...

Also, your macro idea leaves open the mechanism by which the new macros are expressed in the concrete syntax of the existing language, unless your point is that only the people using the parenthesized syntax should get to use the macros.

Faré said...

For macros to work properly, you'll want to

1- know all the binding forms

2- have a different kind of macros for each non-terminal.

3- when the user defines a macro, let him define new such non-terminals (and thus their associated macro namespace).

All this means that you actually need a pretty rich mapping of the language semantics, despite the apparent simplicity of the SEXP syntax.

Jrm said...

Doing macros as a hack in this way wouldn't be a complete, turnkey solution, but it'd allow you to hack together *something*. I'm not aware of any research in reverse grammar manipulation, but I bet you could tag macros in such a way that you could annotate the syntactic class. Then you could say "This macro Foo...bar should parse just like a try...finally." And let the system back-patch the parser. It'd be a lot better than a poke in the eye with asharp stick (or the C preprcessor, which is almost as painful. ) This is derived from the next WAcky idea to appear tomorrow.

Free Thinker said...

This idea is right on, but with many qualifications, some of which have already been expressed. First, we need a typed scheme as an IR, perhaps with some 'Any' type to express the semantics of dynamic languages. This strategy also allows for "soft typing" on dynamic languages to be expressed in the s-expr IR.

Second, the range of required datatypes in the IR is larger than what a Scheme implementation typically provides. Algebraic datatypes and unions come to mind, and then we should consider parametric polymorphism, "punning" function names via ad hoc type and/or arity overloading and so forth.

Control constructs in the source language need to be represented with sufficient specificity to be efficiently compiled from the s-expr IR. case statements over a character and set of character constants come to mind, as these often warrant a jump-table-lookup based implementation, not a typical (cond (x eq #\c)....) style implementation.

Conversely, the semantics of Scheme or Lisp's 'and' and 'or' control constructs are pretty unique to Scheme/Lisp. Returning returning #t from (or 1 #f) is very different from returning 1 from the same expression!

Next, I think it would be wise to simply remove all overloaded operators from the s-expr IR. We don't likely want an '+' operator overloaded over a total numeric tower in a good IR.

Finally, the s-expr IR has to cope with some set of name management, module, template, instantiation, etc. mechanisms. Some of these are easier than others by themselves, but when combined, evaluation (expansion, whatever) order issues come into play. I can picture a set of individual, simple, if clumsy, mechanisms to handle all the individual mechanisms of this sort, and then a combination/ordering language that explicitly reflects the semantics of the source language.

It would be interesting to take a family of languages - say C, Python, Haskell, R6RS Scheme - and try to methodically work out a single s-expr based IR that can accurately capture the semantics of these languages.

Oh, a final note. If the IR will be used to develop sharable language tools (a who-calls database generator, a debugger, etc.), the IR will have to account for relating s-expr expressions back to file and line information from the source texts. I believe PLT Scheme's implementation handles this elegantly through macro expansions and the like. Might be worth a peek.

Good luck!

Michael said...

I don't know. In a non-Lispy language, manipulating the AST is far more trouble than it is worth. By the time I had finished writing a medium complexity macro in, say, C# or Java, using the API I would have been better off porting (or writing bindings for) whatever I needed to Scheme or CL.

HilbertAstronaut said...

Thus was written: "All computer languages should come with a fully parenthesized API to their syntax tree.

The semantics of other languages are nasty enough. Having to deal with quirky syntax is just rubbing salt in the wound."

The more pressing problem is deficient language interoperability. Why should I have to learn somebody's pet language just to call their library?