Friday, April 30, 2010

Importance of terminology

An upcoming study of first-year computer science students has an interesting observation about terminology. It seems that their understanding of the computer science vocabulary is much worse than one would expect. You cannot blame the teachers because I'm sure they were unaware of the problem. You cannot blame the students for being `dumb' because nearly all of them could identify a `function body'. Yet fewer than one in three could tell you what is meant by a `defined name' or a `predicate'. Really! The results surprised me, but in thinking about it, I have a couple of personal anecdotes that highlight the problem.

Every year or so there used to be a heated debate on one or more usenet groups (remember usenet?) between the static and dynamic type factions. (I won't argue either side.) One year, however, I encountered someone who challenged me about my viewpoint. He said that he knew many people who held a similar viewpoint to mine, and he had a lot of respect for them. They weren't idiots. But he could not understand their rationale. His viewpoint made perfect logical sense to him, so how could another intelligent person disagree so violently? He didn't want to argue, he wanted to understand how anyone could find the alternative viewpoint to be as logical. (He assumed it must be so, or there wouldn't be so many respectable adherents to it.)

We discussed it back and forth for a while in a non-heated way and it dawned on me that the fundamental problem is simply one of terminology. `Static typers' and `dynamic typers' have completely different understandings of the word `type'. They cannot agree on the simplest things because they each think the other knows what he means by `type'.

For the sake of clarity, I'll lay out the definitions:
  • Type1 — an equivalence class for data. Two objects are of the ‘same type’ if operations on one could be (or could have been) performed on the other with analagous results. For example, ‘integer’ is a type because objects like ‘3’ and ‘42’ can both be added, subtracted, multiplied, and divided.
  • Type2 — a purely syntactic label that can be assigned to a code fragment. There are rules for assigning such labels to primitive fragments and rules for composing types as the code fragments are composed. Certain compositions are disallowed. A well-typed program has no disallowed compositions, and a type-checking program can verify this. By clever construction of the rules, certain useful runtime properties can be proven.
I think you can see that these two definitions have almost nothing to do with each other.

Once I understood this, I stopped using the word ‘type’ and started talking about ‘tags’ — manifest metadata attached to all objects in memory that describe what each object is. ‘Static typers’ understand the utility and benefit such things, they don't understand why anyone would want to forego compile-time ‘type checks’. They don't mean the annoying task of decorating every variable and procedure in the system with a ‘type specifier’, but rather having the compiler notice that your code said to vector-ref the result of a floating point divide and that most likely is not what you intended. Notwithstanding the contrarians who complain that maybe they did want to vector-ref a float, every Lisp hacker I know of appreciates getting early notification of these kind of errors (provided that false positives are very rare and it remains unnecessary to kowtow to the compiler).

Next anecdote later...