Tuesday, April 30, 2024

Statements and Expressions

In some languages, like Lisp and OCaml, every language construct is an expression that returns a value. Other languages, like Java or Python, have two kinds of language constructs: expressions, which combine compositionally and which have return values, and statements, which combine sequentially and which have no return values and thus must operate by side effect. Having statements in your language needlessly makes things more complicated, but language designers seem to want to go much further and add complexity that just seems capricious.

You cannot usually use a statement in a context where an expression is expected because there is no return value. You can use an expression where a statement is expected by simply discarding the return value. This means there are two kinds of contexts. A sane designer would provide a way to switch between statement and expression contexts, but language designers typically omit these.

Language constructs, such as binding or iteration, must be provided as statements or expressions or both. Language designers seem to randomly decide which constructs are statements and which are expressions based on whims and ease of compiler implementation. If you need to use a construct, but aren’t in the right kind of context, you need to switch contexts, but without a way to do this, you may have to rewrite code. For example, if you have a subexpression that could raise an exception that you want to handle, you’ll have to rewrite the containing expression as a series of statements.

A sane language designer would use the same syntax for the same construct in both expression and statement form, but typically the construct will have a very different syntax. Consider sequential statements in C. They are terminated by semicolons. But sequential expressions are separated by commas. Conditional statements use if/else, but conditional expressions use ?:. When refactoring, you cannot simply move code between contexts, you have to change the syntax as well.

Writing a function adds a new expression to the language. But function bodies aren’t expressions, they are statements. This automatically destroys referential transparency because you cannot substitute the function body (statements) at the call site (expression context).

try/catch is a nightmare. Usually you only get try/catch as a statement, not an expression, so you cannot use exception handling in a subexpression. You cannot get a value out of a try/catch without a side effect. Throw, on the other hand, throws a value, so it could throw to an expression context, except that catch is a statement.

Having two kinds of language constructs and two different contexts in which you can use some of them but not others, and different syntaxes depending on the context just makes it that much more difficult to write programs. You have to keep track of which context is current and which syntax to use and constantly switch back and forth as you write a program. It would be easy for a compiler to introduce the necessary temporaries and rewrite the control flow so that you could use all language constructs in either context, but language designers don’t bother and leave it up to the programmer to do it manually.

And all this mess can be avoided by simply making everything be an expression that returns a value.

3 comments:

Anonymous said...

What about (values) though? A bit of a gray area in the standard, no?

fadrian said...

Even worse, it makes generating code harder, as well, because one has to keep track of the context and be ready to generate different code for each. Not only does this make writing translators into the base language more difficult, it also reduces the utility of features like macro systems because writing artifacts using them become more more complex and difficult, too.

Joe Marshall said...

(values) is less of a problem in Lisp because returning no values to a context that is expecting a value just gets NIL.

Code generation is a pain in the butt. You cannot query which context you are expanding into, so you need to write two macros, one for expansion into a values expecting context, one for expansion into a statement context.

Continuation-passing style is also a problem because you need a values producing continuation in one context and a non-values producing continuation in the other, but the compiler offers no way to tell you which one you need.