Friday, April 30, 2010

Importance of terminology

An upcoming study of first-year computer science students has an interesting observation about terminology. It seems that their understanding of the computer science vocabulary is much worse than one would expect. You cannot blame the teachers because I'm sure they were unaware of the problem. You cannot blame the students for being `dumb' because nearly all of them could identify a `function body'. Yet fewer than one in three could tell you what is meant by a `defined name' or a `predicate'. Really! The results surprised me, but in thinking about it, I have a couple of personal anecdotes that highlight the problem.

Every year or so there used to be a heated debate on one or more usenet groups (remember usenet?) between the static and dynamic type factions. (I won't argue either side.) One year, however, I encountered someone who challenged me about my viewpoint. He said that he knew many people who held a similar viewpoint to mine, and he had a lot of respect for them. They weren't idiots. But he could not understand their rationale. His viewpoint made perfect logical sense to him, so how could another intelligent person disagree so violently? He didn't want to argue, he wanted to understand how anyone could find the alternative viewpoint to be as logical. (He assumed it must be so, or there wouldn't be so many respectable adherents to it.)

We discussed it back and forth for a while in a non-heated way and it dawned on me that the fundamental problem is simply one of terminology. `Static typers' and `dynamic typers' have completely different understandings of the word `type'. They cannot agree on the simplest things because they each think the other knows what he means by `type'.

For the sake of clarity, I'll lay out the definitions:
  • Type1 — an equivalence class for data. Two objects are of the ‘same type’ if operations on one could be (or could have been) performed on the other with analagous results. For example, ‘integer’ is a type because objects like ‘3’ and ‘42’ can both be added, subtracted, multiplied, and divided.
  • Type2 — a purely syntactic label that can be assigned to a code fragment. There are rules for assigning such labels to primitive fragments and rules for composing types as the code fragments are composed. Certain compositions are disallowed. A well-typed program has no disallowed compositions, and a type-checking program can verify this. By clever construction of the rules, certain useful runtime properties can be proven.
I think you can see that these two definitions have almost nothing to do with each other.

Once I understood this, I stopped using the word ‘type’ and started talking about ‘tags’ — manifest metadata attached to all objects in memory that describe what each object is. ‘Static typers’ understand the utility and benefit such things, they don't understand why anyone would want to forego compile-time ‘type checks’. They don't mean the annoying task of decorating every variable and procedure in the system with a ‘type specifier’, but rather having the compiler notice that your code said to vector-ref the result of a floating point divide and that most likely is not what you intended. Notwithstanding the contrarians who complain that maybe they did want to vector-ref a float, every Lisp hacker I know of appreciates getting early notification of these kind of errors (provided that false positives are very rare and it remains unnecessary to kowtow to the compiler).

Next anecdote later...

6 comments:

John Cowan said...

As a Type1 sort of person, I certainly wouldn't complain about a compiler that did modest type inference of that sort. The kind of inference that actually burns my country ass is exemplified by simple containers: you can put any object you want into one, but when you take it out, you have to inform the compiler of the Type2 of the object before you can do anything with it (with perhaps a few exceptions). To escape this straitjacket, simple containers are replaced by parametrically polymorphic families of containers, and pretty soon you're going down the buttered slide faster and faster towards Turing-complete proof systems, where proving your program correct is as hard as writing it correctly in the first place.

Ruben Berenguel said...

I have always a hard time understanding what 'type' is. And waiting for your next story :)

Ruben

Graham Fawcett said...

As a Type1 person Monday through Friday and a Type2 person on
weekends, I have trouble with your depiction of a Type2 as a "purely
syntactic label". As you wrote, a well-typed program can be proven to
have certain useful properties: therefore, well-typedness carries
meaning, and so it must be more than a syntactic property.

Arcane Sentiment said...

"Syntactic" is another annoyingly multiply-defined term. :) In this context, it means type_2 is a property of an expression, rather than of a value or an entire program or something. (Nothing to do with textual representations of programs.)

It seems to me that these bitter type-system arguments have become less common in the last few years. Am I just not reading the right fora (e.g. Usenet)? Do other people see this too?

Graham Fawcett said...

@Arcane: "It seems to me that these bitter type-system arguments have become less common in the last few years. Am I just not reading the right fora (e.g. Usenet)? Do other people see this too?"

I think we've just graduated to finer arguments, like the difference between syntax and semantics. :)

I'm not a linguist nor a philosopher (though our co-commenter John is a bit of both!) but I would maintain that syntax never carries meaning: that's what fundamentally distinguishes it from semantics. Since types_2 carry meaning (although not runtime meaning), they aren't syntax.

Having said that, I know what you're saying and I agree. If only there were a more precise language in which we could discuss these things -- English certainly ain't it!

Fred Blasdel said...

This distinction was far better elucidated in the classic What To Know Before Debating Type Systems

The only nomenclatural mistake he makes is using "type" to mean "type-checking", but almost everyone does that so it's forgivable.