Monday, February 14, 2022

Symbols vs. Strings

Most popular computer languages don't have symbols as a data type. You can make do with a string, or encode your symbol as a small integer or enum. It's a hack, but simple enough and common enough that it doesn't need a second thought.

Lisp has symbols, though, so if you are using a string as a stand-in you should give it a second thought. In Lisp, symbols and strings have different roles. Strings are composite objects, symbols are atomic. String operations are concerned with the contents of the string, symbol operations are concerned with the identity of the symbol.

I saw a Common Lisp Tic Tac Toe program that represented the players' marks with the strings "X" and "O". I could see no reason why it couldn't have used the symbols 'X and 'O. Or the keywords :X and :O. Or how about the Unicode symbols '✗ and '◯?

Symbolic processing is the raison d'ĂȘtre of Lisp. It's a little absurd to represent symbols as strings in Lisp.

Saturday, February 5, 2022

Imperative vs. Declarative

I saw this recently:

    token = request.headers.get('Authorization')
    token = token.split(' ')[1]

From an imperative programming point of view, there is nothing wrong with this. We assign to the token variable the contents of the Authorization header, then we assign to the token variable the second substring after splitting along spaces. After executing these two statements, token will contain the desired value.

From a declarative programming point of view, this is terrible. The first statement binds the name token to the Authorization header, but the second statement contradicts the first by changing the binding of token to mean something else. Thinking declaratively, we'd much prefer either

    header = request.headers.get('Authorization')
    token = header.split(' ')[1]
or
    token = request.headers.get('Authorization').split(' ')[1]

The first option avoids the contradition and reassignment by simply using a separate variable. The item we get from request.headers isn't a token, it's a header, so we should name the variable appropriately. The token is a separate quantity that we compute from the header and the code directly reflects that.

The second option just avoids the intermediate variable and lets the compiler choose how to deal with it. Again, there is no contradiction or reassignment, token receives its final value when it is bound.

The problem is this: in the original pair of statements, the first statement

    token = request.headers.get('Authorization')
while imperatively a valid command, is declaratively a lie. It says that token is literally the Authorization header, but it isn't. The second statement patches things up by fixing the value of token to be what it ought. It seems poor practice to deliberately put these sorts of lies in our programs. We can make the first statement true by simply renaming the variable.