Monday, March 31, 2025

Avoiding Stringly Typed Code

It can be tempting to implement certain objects by their printed representation. This is especially true when you call out to other programs, passing the parameters as command-line arguments and reading the result back from the stdout stream. If an object is implemented by its printed representation, then serializing and deserializing it across program boundaries is trivial.

Objects implemented by their printed representation are jokingly referred to as “stringly typed”. The type information is lost, so it is possible to pass strings representing objects of the wrong type and get nonsense answers. There are no useful predicates on arbitrary strings, so you cannot do type checking or type dispatch. This becomes a big problem for objects created by other utilities: when you call out to a bash script, you usually get the response back as a stream or a string.
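To make the failure mode concrete, here is a hypothetical stringly typed helper in Common Lisp. The name `short-hash` and the sample values are invented for illustration. Its intent is to abbreviate a git commit hash, but the only type its argument can be checked against is `string`, so a branch name slips through and produces a plausible-looking but meaningless answer instead of an error.

```lisp
;;; Hypothetical stringly typed helper.  It is meant to abbreviate a git
;;; commit hash, but all its argument can be checked against is STRING.
(defun short-hash (commit)
  "Return the first 7 characters of COMMIT."
  (check-type commit string)            ; the only check a bare string allows
  (subseq commit 0 (min 7 (length commit))))
```

Both `(short-hash "3f2a9c1e8b7d")` and `(short-hash "feature/login")` succeed, returning `"3f2a9c1"` and `"feature"` respectively. The second result is not a commit at all, and nothing in the program can tell.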

The solution? Slap a type on it right away. For any kind of string we get back from another program, we at least define a CLOS class with a single slot that holds the string. I define two Lisp bindings for any program implemented by a shell script. The one with a % prefix takes and returns strings. The one without the % takes and returns Lisp objects: it marshals its arguments to strings, calls the % version, and unmarshals the string it gets back into a Lisp object. The % version obviously cannot do type checking, but the non-% entry point can and does enforce the runtime type.
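Here is a minimal sketch of that pattern, assuming a shell tool that maps a branch name to a commit hash. The classes `branch-name` and `commit-hash`, the functions `%rev-parse`, `rev-parse`, and `describe-ref`, and the tool itself are all hypothetical. `%rev-parse` is stubbed out so the sketch runs on its own; in real code it would run the script (for instance with `uiop:run-program` and `:output :string`) and return its stdout.

```lisp
;;; Hypothetical wrapper classes: one CLOS class per kind of string that
;;; crosses the program boundary, each with a single string slot.
(defclass branch-name ()
  ((text :initarg :text :reader branch-name-text)))

(defclass commit-hash ()
  ((text :initarg :text :reader commit-hash-text)))

(defun %rev-parse (branch-string)
  "String in, string out.  A stub standing in for a shell script that
would print the commit hash for BRANCH-STRING on stdout."
  (concatenate 'string "hash-of-" branch-string))

(defun rev-parse (branch)
  "Typed entry point: takes a BRANCH-NAME and returns a COMMIT-HASH."
  (check-type branch branch-name)
  ;; Marshal the Lisp object to a string, call the % version, and wrap
  ;; the raw result in its own class immediately.
  (make-instance 'commit-hash
                 :text (%rev-parse (branch-name-text branch))))

;;; With real types we can dispatch on what a value is, not on its text.
(defgeneric describe-ref (ref))

(defmethod describe-ref ((ref branch-name))
  (format nil "branch ~A" (branch-name-text ref)))

(defmethod describe-ref ((ref commit-hash))
  (format nil "commit ~A" (commit-hash-text ref)))
```

Calling `(describe-ref (rev-parse (make-instance 'branch-name :text "main")))` returns `"commit hash-of-main"`, while `(rev-parse "main")` signals a `type-error` from `check-type` rather than quietly treating an arbitrary string as a branch.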

1 comment:

nytpu said...

In a parallel vein, I'd argue that you should avoid marshaling and unmarshaling unless you actually need to work with the underlying structure of the data, because it's needless extra work and can lead to errors. For example, damn near everyone transcodes or reinterprets filenames as UTF-8, even though on *nix filenames are 0x2F-delimited arbitrary bytestrings, and on Windows filenames may be invalid UTF-16. More generally, with any data read from an external source, unless you need to look at the contents of the string, just keep it as-is so it stays intact all the way from reading to re-writing.

Of course, if you are keeping the data untouched, you should still encapsulate it in another type that indicates the contents are opaque, rather than just passing around raw strings or byte arrays.
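The commenter's suggestion can be sketched the same way: a class that holds raw octets and deliberately offers no decoding operation. This is a minimal Common Lisp sketch, not anyone's actual implementation; `opaque-bytes`, `make-opaque-bytes`, and `opaque-bytes=` are hypothetical names, and the only operations are copying bytes in, comparing them, and handing them back unchanged.

```lisp
;;; Hypothetical opaque wrapper: holds octets we never decode, such as a
;;; Unix filename, which is arbitrary bytes delimited by #x2F.
(defclass opaque-bytes ()
  ((octets :initarg :octets :reader opaque-octets
           :type (simple-array (unsigned-byte 8) (*)))))

(defun make-opaque-bytes (octets)
  "Copy OCTETS, a sequence of bytes, into a fresh OPAQUE-BYTES without
interpreting them as text in any encoding."
  (make-instance 'opaque-bytes
                 :octets (make-array (length octets)
                                     :element-type '(unsigned-byte 8)
                                     :initial-contents octets)))

(defun opaque-bytes= (a b)
  "Compare two OPAQUE-BYTES octet by octet, without decoding either."
  (check-type a opaque-bytes)
  (check-type b opaque-bytes)
  (equalp (opaque-octets a) (opaque-octets b)))

(defmethod print-object ((object opaque-bytes) stream)
  ;; Print only the size, so printing never transcodes the contents.
  (print-unreadable-object (object stream :type t)
    (format stream "~D octets" (length (opaque-octets object)))))
```

For example, `(make-opaque-bytes '(#x66 #x6F #xFF #x2E #x74 #x78 #x74))` holds the bytes of `fo`, a lone `#xFF` that is not valid UTF-8, and `.txt`. Because nothing ever decodes them, the same seven octets come back out of `opaque-octets` unchanged.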