Cons cells are nearly the perfect concrete implementation of an abstract two-tuple. They are first-class objects: you can assign them to variables, stuff them in arrays, pass and return them as values, and check them for identity. They are orthogonal to other data types; only a cons-cell returns
consp. They are opaque — except for the defined operations of
cdr, you cannot access the contents of a cons cell. And while they are usually implemented as adjacent memory locations, they hide their representation and there have been many Lisps that have used unusual concrete representations of cons cells like parallel arrays of the
cdrparts or bit codes to omit the
cdraltogether through “cdr coding”. All operations on cons cells can be reduced to the basic operations
(setf car), and
(setf cdr). (If we had immutable cons cells, we could even get rid of the last two, but then we'd want some other means for creating circular and semi-circular structure.*)
So I find it somewhat surprising that the standard linked list implementation in Lisp is a just a terrible example of an abstract data type. This no doubt happened because linked lists got standardized well before abstract data types were really understood.
The big problem with linked lists is that instead of being orthogonal to other data types, it is a subdomain of cons-cells. The representation of a singly linked list is completely exposed: it is a cons cell, without even a wrapper object to tell you if you are dealing with the list itself or its representation. It is only by common convention that certain cons cell structures are considered to represent linked lists. And it isn't immediately clear whether the representation is meant to be a pointer to the first pair of the list, or to the entire “spine” of the list. It is often treated both ways. There is little distinction between a list primitive and a cons cell primitive, which usually doesn't get you into trouble, except in those few cases where it can cause major confusion, like when you have to handle “improper” or “dotted” lists.
Lists are mutable because their representation is mutable and not hidden. It is possible to mutate the representation such that it no longer represents a list anymore, “magically” changing any list that includes the mutated structure into something else. This means either a lot of defensive copying must be done if lists are used as arguments or passed as values, or an unenforced convention to avoid mutation of list structure must be developed in the Lisp culture. We've been pretty good at the latter, even documenting when you can and when you cannot rely on lists being mutated by library functions, but there are always a few people who go against the grain for the sake of “efficiency” (or plain orneriness) and write code that is impossible to use because you cannot easily tell what might be mutated behind your back.
With any abstract data type, there are conceptually a pair of functions that are used to transport objects across the abstraction barrier. One, call it
abs->rep, takes an abstract object and exposes its representation. It is usually provided automatically by the compiler and called upon entry to the object's methods. In Java, for example, it establishes bindings for the
thispointer and the private and protected fields of the object so that the method can use them. The complimentary function, call it
rep->abstakes the representation of an object and hides it in an opaque, abstract version for clients of the object to use. The clients have no way to manipulate the representation of the object because they only have access to opaque, abstract version. In Java, for example, the compiler automatically does this after object construction and when the
thispointer is returned properly cast to the abstract data type. The
thispointer and private and protected fields of the object go out of scope and are no longer accessible.
These functions are usually provided by the compiler and often have no real implementation. The compiler simply ensures that representation comes into scope when the method is called (conceptually calling
abs->rep) and that the representation goes out of scope when the method returns (conceptually calling
rep->abs). No actual code is generated or exists at run time. It's easy to forget this is happening because the compiler does all the work for you. You just toggle the little bit in your head about whether you are “inside” the object or “outside” the object. If you forget, you can just examine the lexical nesting to see if the representation is in scope.
In Lisp, however, for a singly linked list, not only are these functions omitted, they are completely fictitious. It is only in the programmers head that what was once considered a linked list is now to be considered a pointer to head cell of list (
abs->rep) and only probably in the programmers head that the reverse (
rep->abs) is happening on the way out. It doesn't matter much if he or she forgets this because the written code is the same either way. It only matters if he or she somewhere down the line uses a cons-cell operation where a list operation is actually what should be used. This can lead to common rookie mistakes like
listis wanted, yielding
(1 . 2)where
(1 2)is desired. (The “unwanted dot” problem.)
consis wanted, yielding
(1 2)is desired. (The “too many parenthesis” problem.)
- Confusion about whether
((1 2) 3 4)is meant to be a three-tuple of a list and two integers, or a two-tuple of two lists. (It's both, depending on the unwritten intent of the programmer.)
appendis wanted, yielding
((1 2) 3 4)or
((1 2) (3 4))when
(1 2 3 4)is desired. (Again, “too many parenthesis”.)
- Use of
(append ... (list <element>))to “
cons” to the “right end” of a list, leading to
O(n2)algorithms rather than
If you want to be completely contrary, consider Olin Shiver's suggestion: all objects — cons cells, strings, integers, null, etc. — are lists. It's just that every object other than a cons cell is a zero element dotted list. Now rather than being a subtype of cons cells, lists become a supertype of all objects. This viewpoint can probably be made coherent, but it does raise a lot of questions. Here are some that come to mind:
(length '(1 2 . 3))the same as
(length '(1 2 3))? If not, what is
(length '(1 2 . 3))
- Should lists retain their “dottedness” when passed through functions like
map? What is
(memq 2 '(1 2 . 3))? What about
(memq 3 '(1 2 . 3))?
- What is
(reverse '(1 2 . 3))? Is
(compose reverse reverse)an identity?
There is a good argument that “dotted” lists are rarely used and almost always a mistake, but they are built in to the grammar of Scheme as an indicator of “rest” arguments, so getting rid of them would require some other way to specify “rest” arguments. Racket takes things further by allowing doubly dotted lists to indicate infix notation:
(a . < . b)
Just for kicks, I took things in the other direction and wrote some
C#code that implements singly-linked lists as their own abstract data type using special, immutable cons cells that require that their
CDRbe either an existing singly-linked list or the empty list. “Dotted” lists are not a problem because you simply cannot construct one. The representation of a list is explicitly coded as a pointer to the head cons cell of the list. The code illustrates how the abstract list is turned into a the pointer to the cons cell when it is carried across the abstraction barrier and how it is turned back into an abstract list when carried back out. Again, I'm not suggesting anyone use the code, or take it as a serious proposal. (For one thing, it doesn't address what to do about circular lists, or the dotted lists in the Scheme grammar.) It was just a fun hack for illustrative purposes. It is available here for those interested.
*Many years back, Henry Baker said “C'mon, cons cells should just be immutable.” (if I am remembering the exact quote correctly). I agree with his sentiment. Combine immutable cons cells with “hash consing” and the appropriate equality primitives and you get directed acyclic graphs (and their space properties) “for free”. We'd either have to do without circular structure or use another means to achieve it. Since circular structure often leads to divergent programs I wouldn't consider it a great loss, but some may disagree. Perhaps they might be assuaged by a nice set of primitive procedures for creating and manipulating circular cons cell structure.