Abstract Heresies

Friday, January 28, 2011

A day in the life

I seem to have a career as a professional software engineer. Don't get me wrong, I love doing it, but somehow my avocation became my vocation. C'est la guerre.

Last night I was thinking about what my job really entails and it is a bit different from what I usually think and tell people. The exciting part is taking a poorly formed concept, refining it to a very detailed and well-formed formal concept, translating the ideas to software, and making a tool or service that people can use. There is a tedious part, too. Frankly, I don't find it too monotonous because it involves a lot of puzzle solving and finding creative solutions, but it is necessary work that doesn't seem logically necessary, and that tends to dampen my enthusiasm for doing it.

I spend a lot of time just “plumbing”. There is a lot of infrastructure where I work. In all the projects I've worked on, the vast majority of the code has already been written. So the very first phase of software development is identifying the infrastructure subcomponents that the project should depend upon. This is pretty easy to do. Any basic project will always have these components:

A web server front-end with the following properties:
- Automated load balancing.
- Automated failover.
- High availability.
- Resistance against DOS attacks.
Persistent storage or a database with these properties:
- Geographically separate replicas.
- Regular backups.
- Tools for manual editing and revision of stored data.
Software attendants to keep it running ‘unattended’.

My current project involves selling things (like all businesses need to do on occasion). There is infrastructure to handle purchases, refunds, deal with fraud, etc. We'll also need some more infrastructure to handle user authorization and authentication. And I didn't mention the infrastructure needed to deliver the purchased items. When it all comes down to it, there isn't a whole lot of logic involved in the project itself. Almost all the functionality comes from the underlying infrastructure. Our team spends a lot of time plugging it together. That doesn't mean that it is easy or that it doesn't involve any creativity.

I've found on most projects I work on that there is always some piece of software that I intend to use but I have only very vague ideas about how it might work. My task might be to hook it up.

Of course I spend several days thoroughly reading and understanding the copious documentation.

This is what I really do: I know that the end user is going to be sitting in front of his web browser and at some point click on something. This will eventually trigger a method call that performs the requested operation. Let's assume the user wants to buy something and that the mouse click the user made was the last one before the purchase (the one that asks “Ok to charge your credit card?”). What can I deduce?

There ought to be a method that ultimately handles the ‘purchase’.
Since buying something must be a very common operation, I expect there to be some easy way to charge the user's credit card.
1. I first look for the ‘quick start’ guide or the FAQ.
2. If I can't find an answer there, I look at the table of contents of the detailed documentation.
3. A lot of the time there isn't any documentation or if there is, I can't find it. The end result is the same. If there is no documentation, then I think whether there are similar projects that might want to use the same infrastructure in the same way. I'll eventually find some code that actually works.
4. Now comes a trick. I sketch out in my head a few possible protocol designs. There should be a way, given a user and an item, to “begin” the process of purchasing. There should be a way to determine if the process has completed. There should be a way to determine the outcome of the process. Each of these mechanisms should exist as separate calls because we might have to re-do any part of the process if the computer crashes during the purchase. But since the usual case will involve rapid turnaround and no crashes, there should be an abstraction that handles the entire process from end to end. I poke around in the docs and in the source code to see if these things exist.
5. Now I analyze what I've found and make a guess at the sophistication of the design and the programmer. If I see an API that involves continuation-passing style or a variant (Java callbacks are poor-man's CPS), then I expect a high level of sophistication from the API. I expect there to be a ‘standard’ way of purchasing something and I expect that all the edge cases are handled in a reasonable way. Why? Anyone that successfully uses a complex concept like continuation-passing style has most likely applied a lot of serious thought to the API.
6. On the other hand, if I see an API that involves a lot of methods that return void and must be called in the right order or they won't work, or an API with ambiguous methods like checkPurchase() and validatePurchase() (which one do I use?), or one where the arguments must be strings that will then be parsed to something that makes more sense, or APIs that throw only a single catch-all error (or just return null on success or maybe null on failure), then I expect that I'm going to have to think hard about all the edge cases myself and I'm going to have to wade through a lot of examples to determine the right ‘magic’ Why? People that write APIs like this aren't thinking. They code up the first solution that comes into their head and when they are done, they call it an API. It is likely that they either didn't think there were edge cases, or it didn't occur to them that they should abstract them away.
So I finally determine what methods need to be called. I look at the argument lists of the methods and I look at the arguments passed in to the servlet. These are generally isomorphic in some way, but it is typically the case that the servlet is being passed a string representation and the underlying method needs an object. If the API isn't completely idiotic, it will provide parsers and unparsers for the data that represent major concepts. So the servlet might get a string that represents a user and a string that represents what he wants to purchase. Now I have to figure out how to call the method.
1. If the API is well designed, there might even be a method that takes the strings directly from the servlet and does all the work including checking inventory and scheduling order fulfillment. This is the rare case.
2. If the API is pretty good, then I'll probably have to make the appropriate calls to parse the servlet arguments to actual data, handle errors, and validate the input. Then, with the parsed arguments in hand I can call the purchase method.
3. If the API is lame, then after the argument parsing I'll have to check the database to see if there is inventory and somehow reserve it. Then I'll run the purchase methods (which will not be unified into one abstraction, but consist of five methods that need to be called in the right order). Then I'll have to see if the purchase succeeded, and if not, return the reserved item to inventory, or if it did, arrange for the item to be delivered.

So how do I know how to use the API if it is undocumented? The simple answer is this: look for methods that have the same type signature as the objects that are at hand in the servlet. There probably won't be very many. Now look at the names of the methods and see if any make sense. The easyPurchase(User user, Item item) seems a lot more likely than the getReceipt(User user, Collection items). A really good API uses names that practically shout out their purpose.

Finally I try out the code and see if it works. If it does, I call it day. If not, then it is time to twiddle, tweak, and frob things. If the API is good, then I'll get informative error messages that tell me what I did wrong. Something along the lines of “Error: user id is an email address, did you forget to call parseUser()?” instantly tell me what I'm doing wrong. Something along the lines of “null pointer exception” followed by a stack trace filled with methods I never heard of before now tell me nothing whatsoever.

This is work. It isn't extremely difficult, but it does require practice, skill, and imagination. It also isn't really engineering. It is more like plumbing. Get this stuff here and put it over there, and you might need an adapter fitting.

Monday, January 24, 2011

A bit of history

The computers of the 50s were very primitive machines. The IBM 704, for example, had 4096 words of ferrite core memory (later expandable up to 32768 words) with an access time of 17 milliseconds. The Central Processing Unit, which was not a monolithic silicon chip, but a cabinet full of vacuum tubes, could execute 40 thousand instructions per second (twenty-five microsecond basic execution cycle).

It cost $50,000 a month just to rent one. (That's more than $4 million a year in today's dollars). And this was one of the most advanced machines available at the time. These days, the level 1 cache control would put this kind of power to shame.

The computer languages at the time were bizarre to say the least. Of course no one at all had any experience programming, let alone designing languages to make programming easier. The conventional wisdom at the time was that ‘information processing’ required an approach that differed fundamentally from ‘numeric calculation,’ so people tried out all sorts of weird ideas. One idea was string processing. The COMIT language was one of the earliest string processing languages that was based around the idea of rewrite rules. It is sort of like sed on acid:

(8             PROGRAM TO BRING IN A NUMBER OF CONSTITUENTS AND PRINT OUT-
               ALL OF THE N FACTORIAL PERMUTATIONS OF THEM)
                       (PROBLEM SUGGESTED BY B. ULVESTAD)

INPUT                                $ = Q + 1 + N       //*RSA1              INPUT
*                                    N = QL + QR/.1                               *
NUMBER Q + $ + $1 +  QL +  $ + QR  + N = 1 + 2 + 4 + 3 + 5 + 7/.*6 + 6/.I1   NUMBER
*              Q + QL + $ + N + $ + QR = 3 + 2 + 6/.1 + 4 + 5                     *
PRINT                          $1 + QL = 2 + 1          //*WAB2               PRINT
*                 QL + $ + $1 + QR + N = 2 + Q + 3 + 1 + 5 + 4                    *
PERMUTE $1 + Q + $1 + $ + QL + $ + QR + N = 3 + 2 + 4 + 1 + 5 + 6 + 7 + 8/.D1     *
TEST               QL + $ + QR + $/.G0 = 1 + 3 + 2 + 4                       NUMBER
MOVE               $1 + Q + $ + QR + 1 = 2 + 1 + 3 + 5 + 4                  PERMUTE

Yngve, Victor H. 1958. “A programming language for mechanical translation.” Mechanical Translation 5 (1). 25-41

The information processing language, IPL, on the other hand is more like assembly code with linked list support:

            COMMENTS                TYPE  NAME  PQ  SYMB   LINK

Type-9 card                          9
                                     1
Type-2 cards define regions.         2     A                1
The B-Region consists of             2     B                1
 one cell, named "B0" or just "B".   2     I                1
                                     2     L                2
The R-region has 2 cells,            2     R                2
 named "R0" and "R1".                2     v                1
                                     2     (                1
                                     2     )                1
Type-1 cards are for comments.       1
                                     1
Type-5, Q=1 says "DATA FOLLOWS."     5           1

L1---(AI(A v B))                          LI        0
                                                    (
                                                    A
                                                    I
                                                    (
                                                    A
                                                    v
                                                    B
                                                    )
                                                    )        0

Type-5, Q=blank says "ROUTINES       5
 FOLLOW."                            1
                                     1
R1 prints the list L1.               1
                                     1
Q=3 says "TRACE THIS ROUTINE."            R1     13 L1
                                                    J151    0

P or Q = 0 can be left blank.

Type-5, SYMB=R1 says execute R1      5              R1

Correspondence between Hugh S. Kelley and Chuck Baker.

These languages differ from INTERCAL in that their designers were serious.

A notable exception was the language ALGOL.

begin integer i; real y; real array a[0:10];
   real procedure f(t); real t; value t;
     f := sqrt(abs(t)) + 5*t^3;
   for i := 0 step 1 until 10 do read(a[i]);
   for i := 10 step -1 until 0 do
     begin y := f(a[i]);
           if y > 400 then write(i, "TOO LARGE")
           else write(i,y);
     end
   end

(taken from Wikipedia article on the Trabb Pardo-Knuth algorithm)

ALGOL doesn't look unusual to the modern eye. That in itself is testimony to the fact that ALGOL was years ahead of its time. As you can imagine, though, it was well beyond the capability of most computers to compile and run an ALGOL program.

When McCarthy started the “Advice Taker” project, he proposed a new language LISP that would be based around the linked-list paradigm of IPL and the λ-calculus of Alonzo Church. One of the goals of the language was to deliberately move away from the Turing-inspired mechanistic model of computation (Turing machine) and towards a more mathematical way of programming based upon recursion. (Incidentally, ALGOL was designed to support recursion, but the very first ALGOL implementations punted because it was too complicated to support and users didn't know how to use it.)

Wednesday, January 19, 2011

Fifty Years Ago

The character-reading functions make it possible to read characters one by one from data records. The choice of input medium is determined by sense switch 1, though at a future time mode-set facilities may be added so that input may be taken from any medium. This description will be given in terms of hollerith cards [sic] as input; the procedure for tape is completely analogous. However, the tape records must be 72 characters long.

Paul Abrahams, Artificial Intelligence Project
RLE and MIT Computation Center Memo 22a
Character-Handling Facilities in the LISP System

In the months following McCarthy's initial description of ‘A Symbol Manipulating Language’, Steve Russell translated McCarthy's eval into IBM 704 machine code and the first LISP was born. By 1961 the implementation of Lisp 1.5 was well under way and a number of new libraries were being written.

Hollerith cards were the main input medium in those days. They were a step up from punched paper tape, though. In fact, punched cards were pretty much the only input medium until online input became available.

Paul Abrahams, personal correspondence

Lisp had read, eval, and print, but the computer itself was shared by the MIT Community. You would submit your card deck for processing and come back some time later to get the printout. The REPL was run in batch mode.

Tuesday, January 18, 2011

Answers for Faré

Faré said...
Problem is that indeed you need to do something about those slots without keyword initializers, slots that are computed from other slots (e.g. hash values), slots that are initialized from a counter (e.g. OID), slots that are actually plumbing from a larger structure (e.g. indexes back into other objects), etc.

You would think so, but it turns out you don't.

When an object is created for the very first time, you must have already written code to deal with slots without keyword initializers, slots computed from other slots, slots that are initialized from a counter, slots that are plumbing, etc. or you wouldn't have been able to create the object. When the object is re-instantiated from the persistent store, there is no reason you cannot simply perform those operations (mutatis mutandis) again.

But what if these operations are not idempotent? Actually, we already know they are not. Each time we call the constructor, we ought to be getting a brand new object, so we don't want the operations to be idempotent. But note that the object is constructed exactly once per ‘session’ — the constructor is never called twice without dropping all references to the constructed object in-between the calls. Therefore, it is not possible to ever observe two separate calls to the constructor. (By ‘observe’ I mean, “You cannot write an extensional program that returns 0 if exactly one call to the constructor occurs, but returns 1 otherwise.“)

Certainly one could, through reflection and other mechanisms write code that intensionally exposes the implementation, but one can always write code that deliberately breaks an abstraction barrier. Although it seems far too easy to break this abstraction barrier by something as mundane as computing a slot value from other slots, in practice it turns out that you won't want to. Again, this can be derived from an information-theoretic argument. At the point of invoking the constructor, the program has provided all the information necessary to correctly initialize the new instance. Any other information used is, by definition, ‘implicit’.

Let us assume, for a brief moment, that the implementation depends critically upon the implicit part of the initialization process. I simply argue that critical dependence upon implicit information is a serious bug, regardless of whether persistence is brought into the picture or not. If the implicit information is designed to be implicit, then the onus is upon the designer to hide the implicit dependency well enough that the client code need not be aware of it.

So let us assume the opposite: the implementation does not depend cricitally upon the implicit part of the initialization process. Well, if there is no critical implicit dependence, there are no bugs that occur because of a critical implicit dependence.

The approach still works, but is actually very low-level, and calls for a higher-level interface of some sort, lest your whole system keep a very low-level feel.

Not at all. In ChangeSafe, the abstraction of a versioned object is built directly upon the persistent object abstraction by layering yet another set of MOP customizations on the existing ones that implement persistence. There is no higher-level interface beyond the normal CLOS interface, yet there is very little code that needs to have explicit calls to the persistence API.

I doubt these arguments will persuade anyone, so I suggest trying to implement things this way and seeing for yourself.

And this reminds me of another thing. Earlier you asked why ‘pointer swizzling’ was performed on reading rather than upon loading of the persistent object. I mentioned a few reasons, but I forgot one really important one: it allows you to build persistent circular structure without needing mutable objects.

Thursday, January 13, 2011

Some Bells and Whistles

There are a few bells and whistles that I added to my persistent store. The first is multiple extents (that is, more than one persistent store open and participating in a transaction). This is fairly easy to add. The only tricky part is ensuring that a transaction that involves multiple extents either fully commits or aborts across all participating stores. The way I chose to implement this is by choosing a single particular persistent store to be the transaction master. The commit record in the transaction master is the one that holds the authoritative record of whether the entire transaction has committed.

The commit proceeds in two phases. In the first phase, each non-master store performs a ‘conditional commit’. This involves flushing the pending writes to the backing store and writing a special commit record that refers to the transaction master. Once each non-master store has written the special commit record, the commit record is written to the master store.

If nothing unusual happens, processing may continue as in the normal case. The interesting thing happens when the non-master store is first opened. If the most recent commit record in a store is a ‘conditional commit’, then it is necessary to check the transaction master to see if it committed. Typically, the relationship between the different stores is that there is one master and one or more ‘satellite’ stores. Therefore, it is likely that the transaction master is already open and within a transaction when the satellite store is opened, so the master commit record will be easily at hand. If not, then the code recursively opens the master store to determine if the conditional commit of the satellite is valid. This adds a little amount of bookkeeping to the commit code, but it isn't particularly complex.

Nested Transactions and Multi-version Concurrency Control (MVCC)

My original plan had been to avoid the complexities of nested transactions and MVCC. I designed this as a simple store, and I figured that a good number of applications would not need nested transactions. I discovered almost immediately that I was wrong. If nested transactions are disallowed and a procedure needs to start a transaction, it has to determine whether an existing transaction is in progress. If it is not, then the procedure needs to begin the transaction and ensure it is committed when it exits. On the other hand, if there is an existing transaction, the procedure should ‘join’ that transaction and let whoever started the transaction decide whether or not to commit. This isn't so bad if you can segregate your procedures into ‘top-level’ (ones that always start and commit a transaction) and ‘subroutines’ (ones that never start and commit a transaction, but expect there to be one in place before being called). Unfortunately, recursive procedures cannot be in both categories simultaneously. A possible workaround is to have duplicate code tailored for each use case, but then the programmer would have to remember which version to call under which circumstances, or we'd have to avoid using recursion. Since this is clearly unacceptable, I bit the bullet and implemented nested transactions.

With the current design, nested transactions turn out to be pretty simple. When a non-nested transactions starts, it gets a copy of the most recent object-map which it uses for OID resolution. When a nested transaction starts, it gets a copy of the current object-map from the immediately enclosing transaction. The nested transaction gets to see the intermediate state of the enclosing transaction. If the nested transaction commits, it does not write a commit record to the store. Instead, the object-map from the nested transaction is used as the object-map of the enclosing transaction when control is returned. Thus the enclosing transaction gets to see the state that the nested transaction committed. When the enclosing transaction commits, the object-map (which includes the nested transaction updates) is written to the store and committed.

If the enclosing transaction aborts, the nested transaction state is simply dropped on the floor and becomes inaccessible. The interesting question is how to handle the case where the nested transaction aborts. In this case, we return control to the enclosing transaction, but we discard the object-map from the nested transaction. The enclosing transaction therefore sees no effect at all from the nested transaction. The enclosing transaction has the opportunity to itself abort (in which case all the intermediate state is discarded), to commit (in which case only the effect of the enclosing transaction is made permanent), or to perhaps continue processing and begin more nested transactions.

In effect, nested transactions act like software transactional memory.

At this point, one can see how MVCC can be easily implemented. Each concurrent thread runs its own transactions with its own object-map. Since all OID resolution is performed against the object-map associated with the current (thread local) transaction, each concurrent transaction sees only its view of the persistent store. (This is a necessary condition, but it is not sufficient. We have to ensure that committed transactions can be given a serial order to preserve the isolation between transactions. We do this by recording in each transaction the OIDs of every persistent object read and written within the transaction. When the transaction attempts to commit, the set of reads and writes is compared against the set of objects read and written by any other parallel transaction that has committed since this one started. If there is an overlap, we abort. That is, we use optimistic locking.)

(This should answer Fare's question about why OID resolution is performed on each read rather than being done just once when the store is opened.)

These features involve adding a good chunk of bookkeeping code, but the code is straightforward, albeit tedious. Good testing code is vital to ensuring that this all works as it should.

Friday, January 7, 2011

Schema evolution finessed

Schema evolution is the bête noire of the object-oriented database world. It is probably the most labor intensive and error prone activity that an end-user is likely to attempt. The root problem is that the metadata about persistent objects is implicitly replicated in several ways.

The big advantage of object-oriented databases is that they unify the query language with the ‘business logic’ language. You access and navigate persistent objects in the exact same way you access and navigate the ordinary data structures in the language. The obvious benefit is that the mechanisms of persistence and querying are hidden behind the object abstraction. But this means that the definition of the data structures are performing double duty. Data structure definitions are used by the compiler to generate the appropriate primitive operations in the code, but they also are needed by the database component to generate the persistent representation of objects. If the data definition changes, objects that conform to the old definition will no longer work.

Programmers generally expect that a recompile is all that is needed to change how a program handles data structures. But if the data definition is used by the database for object storage and retrieval then changing the data definition may change the persistent representation and make it impossible to retrieve data. (Or worse, it could silently retrieve the data incorrectly.)

It turns out that if persistent objects are immutable then there is a way to finesse the issue of schema evolution.

In CLOS, objects are created and initialized when the programmer invokes make-instance. There is a well-defined series of operations that take place and the resulting object is returned to the caller. The important thing here is the intent of the programmer. The arguments given to make-instance contain the specific information that the programmer has determined is needed to correctly initialize the object. The returned instance will depend on the argument values and any default values that are supplied in the class definition. From an information theoretic view, the argument list to make-instance and the resulting instance ought to contain the same information. So rather than attempting to persist the resulting instance, we instead persist the argument list to make-instance.

The immediate objection to this is that there is no way to know what on earth make-instance constructed! It could be the case that make-instance decides to return NIL. There could be a very complex initialization protocol that defaults certain slot values and computes other slot values from the initargs. This doesn't matter. The programmer is working at the abstraction level where the entire process is kicked off by the invocation of make-instance, and thus his intent is simply to create an instance of an object that is parameterized by the given initargs.

When we retrieve an object from the database, we don't attempt to retrieve the representation of the constructed object, we instead retrieve the arguments that were handed to make-instance and simply invoke make-instance again to re-create the object.

Now suppose that we have a database where we are modeling automobiles. Let us assume that we have already stored thousands of automobile instances and that each instance contains, say, the number of doors and the color of the automobile. At some point we decide that we no longer care about the number of doors, but we do want to know the make and model of the automobile. We change the definition of the automobile class in the program. When we invoke make-instance, we now have a different set of initargs. However, the legacy objects in the database are reconstructed by calling make-instance with the old argument lists. This is easily dealt with. The older argument lists will have initargs describing the number of doors and the color of the car. We arrange for make-instance to ignore any initargs that are unused, and we require the programmer to supply ‘reasonable defaults’ for initargs that are not supplied. Thus when retrieving an older object, the count of the number of doors will be ignored, and a default value will be supplied for the make and model.

Note that we no longer require a schema to describe the layout of objects in the database.

There are drawbacks to this strategy. First, we are making the assumption that all the relevant information for object creation is supplied in the call to make-instance. A perverse programmer can easily bypass this assumption. However, it doesn't seem unreasonable to require that programmers either use the object creation protocol in the intended manner, or to specifically arrange for auxiliary helper objects to be created that hold any missing relevant information. This is meant as a tool for programmers, not as a fool-proof system. (Incidentally, the full-blown system I wrote allows the programmer to declare certain slots as transient. This allows a programmer to cache information needed for the runtime without forcing the information to be persisted.) Second, we give up the ability to side effect persistent objects. (As I have noted in previous posts, we can mimic side effects reasonably easy.)

These disadvantages are minor compared to the difficulties of schema evolution.

I'm happy to answer questions about this.

Tuesday, January 4, 2011

EQL Specializers

Pascal Costanza asked an interesting question: Why are EQL specializers convenient for smooth integration with object instantiation?

My original answer was simply that when you want to customize make-instance for a particular class of objects that you end up writing methods like this:

  (defmethod make-instance ((class (eql (find-class 'foo))) ...

This is doable without EQL specializers, but it would be kludgy.

While thinking about this some more, though, it occurred to me that EQL specializers are probably a necessary element for reflective programming.

When you reflectively program the object system, you operate at a meta-level where abstractions such as classes are reified as concrete instances of a meta-class. At the reflected level, object instances are no longer the data being manipulated. Instead, object classes are used to represent sets of instances, and the classes themselves are represented by instances of meta-class objects.

When you cross abstraction levels in this way, you need a set of level-shifting primitives. One of the most important of these is quotation. Quotation is level-shifting primitive that lifts (injects) an instance from beneath the abstraction level to the level itself.

In CLOS, EQL specializers are the quotation mechanism for method dispatch.