Monday, December 20, 2010

Side effects

Before continuing on with transactions, I want to describe how side effects are implemented. Objects in the persistent store are immutable, so we cannot actually modify an existing object. We can, however, allocate a brand new object that differs from the original only in that the ‘side-effected’ fields hold new values. We enter this new object in the object-map, but instead of assigning it a fresh OID, we re-use the OID of the original object.

The object-map, as you may recall, is also a pure functional object. Adding an entry does not mutate any part of the object-map but instead makes a new object-map that shares as much structure as possible with the old one. Any OID dereferencing that happens under the old object-map will continue to see the original object with its original contents. Dereferencing an OID under the new object-map will see the new version of the object with the ‘modified’ field. We mimic side effects by creating a whole new world where everything is identical to the original except for the thing that appears to change.
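To make this concrete, here is a minimal Python sketch of ‘mutation’ by path copying in an immutable binary search tree (illustrative only; the real implementation is the Scheme code shown in the earlier posts, and all names here are invented):

```python
class Node:
    """Immutable binary-search-tree node mapping an OID to a value."""
    __slots__ = ('oid', 'value', 'left', 'right')
    def __init__(self, oid, value, left=None, right=None):
        self.oid, self.value, self.left, self.right = oid, value, left, right

def insert(node, oid, value):
    """Return a new root with oid bound to value, sharing untouched subtrees."""
    if node is None:
        return Node(oid, value)
    if oid < node.oid:
        return Node(node.oid, node.value, insert(node.left, oid, value), node.right)
    if oid > node.oid:
        return Node(node.oid, node.value, node.left, insert(node.right, oid, value))
    return Node(oid, value, node.left, node.right)  # same OID: a new version

def lookup(node, oid):
    while node is not None:
        if oid == node.oid:
            return node.value
        node = node.left if oid < node.oid else node.right
    raise KeyError(oid)

old_world = insert(insert(insert(None, 2, 'beta'), 1, 'alpha'), 3, 'gamma')
new_world = insert(old_world, 3, 'GAMMA')      # the 'side effect'

assert lookup(old_world, 3) == 'gamma'         # the old world is unchanged
assert lookup(new_world, 3) == 'GAMMA'         # the new world sees the change
assert new_world.left is old_world.left        # the untouched subtree is shared
```

Only the nodes on the path from the root to the changed entry are copied; everything off that path is shared, which is where the logarithmic cost comes from.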

Normally, one would expect that creating a new world every time you wish to mutate an object would be prohibitively expensive. But since the objects and the object-map itself are immutable, we can re-use them freely, which makes it reasonably efficient. Creating a new world takes time proportional to the logarithm of the number of objects persistently stored, so even a huge number of objects adds very little time and space overhead to simulated mutation.

You may think this is an awfully roundabout way to do something simple. After all, why not do the traditional thing of simply maintaining a snapshot of the current consistent state of the world and actually mutating it? This would require modifying the logging strategy slightly. A special mutation record would have to be written to the log to echo the mutation being performed in memory. If the machine crashes, the log could be read forward to re-do the mutations that were in progress at crash time. Alternatively, it could be read backwards to undo the mutations that ought to be undone if the transaction didn't commit.

The rationale for this roundabout solution is simplicity. It is hard to make transactions work correctly every single time given the fact that a hard failure can occur at any moment at all. Use Google to search for “transaction bug” and take a look at how easy it is to make subtle mistakes. By making it impossible to mutate existing persistent store records and indexes, we guarantee that bugs in the transaction layer simply cannot affect earlier, successful transactions. Cleanup and recovery of the persistent store remain as easy and trivial as before: we simply find the most recent valid commit record. There are no undo or redo logs to process. The transaction algorithm can be proven correct by inspection.

Once we have ‘mutation’, we need to ensure that we can maintain consistency when we use it. We want the persistent store to behave as if the mutations had some serial order to them. We augment each transaction object with some more bookkeeping information: each transaction must record which objects it dereferences, allocates, and ‘mutates’ before it commits. We also add a transaction manager object which examines transactions before they are committed to determine whether a serial ordering can still be computed. If a single thread performs transactions one at a time, a serial ordering is guaranteed. But if multiple threads run transactions in parallel, we have to make sure that the transactions run in isolation from one another, and that their sets of locations read and written do not intersect. If the transaction manager finds a conflict, it aborts one of the transactions.
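As a sketch of this bookkeeping, here is a toy transaction manager in Python (invented names; the real store is in Lisp). It records read and write sets and, at commit time, checks them against the write sets of transactions that committed in the meantime:

```python
class Transaction:
    """Bookkeeping for one transaction: the sets of OIDs it touches."""
    def __init__(self, start_tick):
        self.start_tick = start_tick   # world version when we began
        self.read_set = set()          # OIDs dereferenced
        self.write_set = set()         # OIDs allocated or 'mutated'

class TransactionManager:
    """Optimistic: run freely, check for conflicts only at commit time."""
    def __init__(self):
        self.tick = 0
        self.history = []              # (commit_tick, write_set) of committed txns

    def begin(self):
        return Transaction(self.tick)

    def try_commit(self, txn):
        # A serial order still exists unless a transaction that committed
        # after we began wrote an OID that we read or wrote.
        for commit_tick, writes in self.history:
            if commit_tick > txn.start_tick and writes & (txn.read_set | txn.write_set):
                return False           # conflict: abort this transaction
        self.tick += 1
        self.history.append((self.tick, frozenset(txn.write_set)))
        return True

mgr = TransactionManager()
t1, t2 = mgr.begin(), mgr.begin()
t1.write_set.add(42)                   # t1 'mutates' OID 42
t2.read_set.add(42)                    # t2 read OID 42 concurrently
assert mgr.try_commit(t1) is True
assert mgr.try_commit(t2) is False     # t2 saw data that t1 rewrote
```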

Again, we note that this implementation handles conflicts easily because it is very simple. Aborting a transaction is easier than committing it: you simply discard the transaction and never write a commit record for it. Allowing transactions to proceed and only checking for conflicts at commit time is an ‘optimistic’ strategy. In many situations an optimistic concurrency strategy can perform better than a ‘pessimistic’ one. With a slight change it is easy to switch to a pessimistic model. In a pessimistic model, conflicting transactions are detected when a field is mutated: if the mutated field has been read within another transaction, then that transaction may be working with stale data if it commits before this one. A pessimistic model is slightly simpler to understand because you check the validity of each operation just before performing it. An optimistic model, however, performs very well when concurrent changes to the exact same objects are rare.

While trying to bring this code back to life I encountered two bugs in MIT/GNU Scheme. The first was a bug in a rare corner case of structure definition. The second was caused by the fact that the floating point layout changed between the 32 and 64 bit releases. I've fixed these, but I have a more interesting problem.

The MIT Scheme object system, SOS, is derived from many of the ideas in Tiny CLOS. It is missing, however, a fully functional MOP. In addition, method dispatch is done only on the types of arguments. This makes it tricky to do ‘EQL specializers’. EQL specializers are necessary to smoothly integrate object instantiation. You can work around this limitation in some ways, but it would be much better if the implementation had this as a built-in feature.

Friday, December 17, 2010

Transactions, part I

Faré has persuaded me to post actual working and tested code. It seems that enough has changed since I switched computers that the code which used to work no longer does. (The problem has nothing to do with my code — a shared library referred to by Scheme seems to have moved, and Scheme came crashing down.) I'll get things working and show the live code. Until then, we'll have to make do with illustrative code fragments.

Given the index structure I described over the past two days, you can easily see that we have solved the problem of internal and external references. We use OIDs as a sort of ‘pointer’, and we modify the access mechanisms to hide the redirection. An analogous modification to make-instance gives us aggregate objects by initializing the instance slots with the appropriate OIDs.

The next thing to deal with is transactions. These are very easy to implement using this design. We have provided no mechanism to rewrite any data in the backing store; it is opened in append-only mode. When we augment the object-map to associate new OIDs with new objects, we construct new object-map nodes and append them to the backing store, but we do not modify any existing object-map nodes. If we were to ignore the new object-map and re-use the old one, we would not be able to access the new objects. The entire object graph would appear to us as if time had stopped at the point when the old object-map was created. This is the key to implementing transactions.

When we begin a transaction, we create a transaction object and copy the most recent object-map root node to it. As we create objects we update the transaction object with augmented object-map. If the transaction fails, we simply discard it. The next transaction is initialized with either the new object-map (if the prior transaction committed) or the old object-map (if the prior transaction aborted). There is no need to ‘undo’ anything because persistent objects cannot be mutated.
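In miniature, the begin/commit/abort protocol is just swapping a root pointer. A Python sketch follows (a plain dict stands in for the immutable object-map, so the structure sharing is not shown, and the names are invented):

```python
class PersistentStore:
    def __init__(self):
        self.object_map = {}           # stand-in for the immutable object-map root

class Txn:
    """Begin by capturing the current object-map root; commit publishes
    the augmented root; abort simply discards it."""
    def __init__(self, pstore):
        self.pstore = pstore
        self.object_map = pstore.object_map   # snapshot of the root

    def put(self, oid, value):
        # A real implementation shares structure; copying a dict mimics
        # the 'new world' semantics without the efficiency.
        self.object_map = {**self.object_map, oid: value}

    def commit(self):
        self.pstore.object_map = self.object_map   # publish the new root

    def abort(self):
        pass   # nothing to undo: the store still points at the old root

store = PersistentStore()
t = Txn(store); t.put(1, 'first'); t.abort()
assert store.object_map == {}                  # the abort left no trace
t = Txn(store); t.put(1, 'first'); t.commit()
assert store.object_map == {1: 'first'}
```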

When the computer shuts down (either by a crash or planned outage) and is restarted, we use the information in the backing store to restore the state of the program. We need to find the most recent, but valid, root of the object-map. Since we cannot plan crashes, we might find objects and object-map nodes at the tail of the backing store that were written by a transaction that was in progress when the crash happened. We want to ignore these because they represent intermediate state. The trick is this: when a transaction commits we append a special ‘commit record’ to the backing store. The commit record will contain the location (within the store) where the root node of the object-map is stored. When the process restarts, it scans the backing store to find the latest valid commit record and it initializes from there. Any objects that were allocated after that commit record are simply ignored.
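Here is a Python sketch of that recovery scan. The record format is invented for illustration (length-prefixed, CRC-protected records), not the format my store actually uses; recovery keeps the last intact commit record and ignores everything after it:

```python
import zlib

def append_record(log, payload):
    """Append a length-prefixed, CRC-protected record; return (offset, new_log)."""
    offset = len(log)
    header = len(payload).to_bytes(4, 'big') + zlib.crc32(payload).to_bytes(4, 'big')
    return offset, log + header + payload

def last_valid_commit(log):
    """Scan forward, remembering the most recent intact commit record."""
    pos, latest = 0, None
    while pos + 8 <= len(log):
        size = int.from_bytes(log[pos:pos + 4], 'big')
        crc = int.from_bytes(log[pos + 4:pos + 8], 'big')
        payload = log[pos + 8:pos + 8 + size]
        if len(payload) < size or zlib.crc32(payload) != crc:
            break                      # torn tail: a crash happened mid-write
        if payload.startswith(b'COMMIT '):
            latest = payload
        pos += 8 + size
    return latest

log = b''
_, log = append_record(log, b'object 1')
_, log = append_record(log, b'COMMIT object-map@0')
_, log = append_record(log, b'object 2')   # written by an in-flight transaction...
log += b'\x00\x00\x00\x09partial'          # ...torn off by a crash
assert last_valid_commit(log) == b'COMMIT object-map@0'
```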

If transactions usually commit and the computer rarely crashes, then the most recent commit record will usually be the very last record in the backing store. If the computer crashes a lot, or if transactions are usually aborted, then it is likely that the backing store would accumulate abandoned objects. One could develop a way of reclaiming that storage.

Here is an actual commit record:
 (:record-format 0.4)
 (:timestamp 3331389370)
 (:object-map-offset 5971200)
 (:previous-record-offset 5970432)
 (:reason "Update user context.")
The commit record is bracketed by start and end markers (elided here) so we can identify incomplete commit records. The object-map is at location 5971200 in the backing store, and the previous commit record is at location 5970432. (For extra safety, the commit record should contain a CRC or hash of the object-map. It should probably contain a CRC or hash of itself as well, to make it self-validating. Or you can punt.)

The :reason is just a string. It is not used for anything, but it is extremely important nonetheless. The :reason gives us an audit trail in the event of a catastrophic failure. This is part of the “belt and suspenders” safety mechanism. The backing store may be kept for a long time, perhaps even many years. Although we will protect the data with multiple mechanisms to reduce the probability of simultaneous failures, our last line of defense is to manually inspect the backing store records. The :reason field will make this much easier.

Ok, now for some code to illustrate. The actual code has a number of bells and whistles, and I'll show them later, but it is best to start with a stripped down version.
(defconstant *transaction-dispositions*
  '(:running :committing :aborting :committed :aborted))

;; running = transaction should commit upon return,
;;           but abort if a throw occurs
;; aborting = transaction is still running, but it must abort upon exit
;; committing = transaction is still running, but it must commit
;;              upon exit, even if throwing
;; aborted = transaction is over, it aborted
;; committed = transaction is over, it committed
(deftype transaction-disposition () `(member ,@*transaction-dispositions*))

(eval-when (:load-toplevel :compile-toplevel :execute)
  (defvar *current-transaction*)
  (setf (documentation '*current-transaction* 'variable)
        "The current transaction.  Not bound if we are not in a transaction."))

;;; Transactions
(defclass transaction ()
  ((disposition :accessor transaction/disposition
                :initform :running
                :type transaction-disposition)
   (reason      :accessor transaction/reason
                :initarg :reason
                :initform "Unknown"
                :type string)
   (object-map :accessor transaction/object-map
               :initarg  :object-map)))
The code for running a transaction is conceptually simple, but the bells and whistles obscure the simplicity. I'll present a stripped-down version first and then show the full version in a later post.
(defun call-with-transaction (pstore reason receiver)
  (let ((transaction   nil)
        (normal-return nil)
        (abort-reason  "Non-local exit from call-with-transaction."))
    (unwind-protect
        (multiple-value-prog1
            (progn
              ;; This is the only place that transactions are created.
              (setq transaction
                    (make-instance 'transaction
                                   :object-map (persistent-store/object-map pstore)
                                   :reason reason))
              (let ((*current-transaction* transaction))
                (funcall receiver transaction)))
          (setq normal-return t))

      ;; If the transaction is null, we never even got started.
      (unless (null transaction)
        (ecase (transaction/disposition transaction)
          ;; If we return normally and no one decided to commit
          ;; or abort before now, we commit.
          (:running (if normal-return
                        (transaction/do-commit transaction)
                        (progn
                          (setf (transaction/reason transaction) abort-reason)
                          (transaction/do-abort transaction))))
          (:committing (transaction/do-commit transaction))
          (:aborting   (transaction/do-abort  transaction)))))))

;; Aborting is simple.
(defun transaction/do-abort (transaction)
  "Perform the necessary work to abort the transaction."
  (assert (or (eq (transaction/disposition transaction) :aborting)
              (eq (transaction/disposition transaction) :running)))
  (setf (transaction/disposition transaction) :aborted))

;; Committing is simple, too.  Just write a commit record.
(defun transaction/do-commit (transaction)
  "Perform the necessary work to commit the transaction."
  (assert (or (eq (transaction/disposition transaction) :committing)
              (eq (transaction/disposition transaction) :running)))
  (let* ((object-map (transaction/object-map transaction))
         (pstore (object-map/pstore object-map)))
    (persistent-store/commit pstore
                             :object-map object-map
                             :reason (transaction/reason transaction))
    (setf (transaction/disposition transaction) :committed)))
More posts to follow...

Wednesday, December 15, 2010

More persistence

In my last post, I described how to map OIDs to persistent objects by means of a functional, weight-balanced, binary tree. In this post, I will describe how to implement aggregate objects. I'm going to start with a high-level view and drill down to the details. I'm afraid at this point I'm going to switch to Common Lisp and CLOS. Don't worry, though, if you just pretend that defclass is a synonym for define-structure, you won't be far off.

Here's a simple CLOS class definition:
(defclass test-class ()
  ((name :initarg :name
         :initform 'zippy)))
Objects of class test-class have a single field which is called name. This class definition adds a method to make-instance that works as follows:
  ;; make an instance where the name is the symbol GRIFFY 
  (defvar test-instance-1 (make-instance 'test-class :name 'griffy))

  ;; alternatively, use the default name of ZIPPY
  (defvar test-instance-2 (make-instance 'test-class))
In addition, we can use slot-value to extract the name:
  (slot-value test-instance-1 'name)
  => GRIFFY

  (slot-value test-instance-2 'name)
  => ZIPPY
And that's that. (And you thought CLOS was complicated.)

A persistent class is defined as follows:
(defclass test-class ()
  ((name :initarg :name
         :initform 'zippy))
  (:metaclass persistent-standard-class)
  (:schema-version 0))
The :metaclass specifier tells CLOS that we want instances to be persistent. The :schema-version is a bit of magic that protects against the unlikely but nasty possibility that, as a class definition evolves over time, a slot name is accidentally re-used. In such a case, the code will need to disambiguate which class definition was intended, and it can use the :schema-version to do so. You can safely ignore the details by following one simple rule: whenever you modify the slots of a persistent class, increment the :schema-version.

So what is different? On the surface, practically nothing. The calls to make-instance and slot-value work just as before. There are four changes that happen ‘under the covers’:
  1. The object returned by make-instance will be a persistent object.
  2. Two additional slots are allocated. The first contains the OID of the instance itself, the second contains the persistent store where the instance was allocated.
  3. The name slot in the object will contain the OID of the persistent name rather than a Lisp object. make-instance is modified to extract the OIDs of its arguments to initialize this slot.
  4. slot-value must be modified to dereference the OID that is actually in the slot.
These modifications are easily made using the CLOS MOP.
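The same four changes can be mimicked in miniature in Python (a sketch with invented names such as `Store` and `PersistentSlot`, not the CLOS code): the instance slot physically holds an OID, and attribute access dereferences it through the containing store.

```python
class Store(dict):
    """Toy persistent store: a dict from OID to value."""
    def intern(self, value):
        oid = len(self)
        self[oid] = value
        return oid

class PersistentSlot:
    """Descriptor: the slot physically holds an OID, but reads
    dereference it, like the modified slot-value."""
    def __init__(self, name):
        self.name = '_oid_' + name
    def __get__(self, obj, objtype=None):
        return obj.pstore[getattr(obj, self.name)]   # change 4: deref the OID
    def __set__(self, obj, value):
        # change 3: store the OID of the value, not the value itself
        setattr(obj, self.name, obj.pstore.intern(value))

class TestClass:
    name = PersistentSlot('name')
    def __init__(self, pstore, name='zippy'):
        self.pstore = pstore   # change 2: remember the containing store
        self.name = name

store = Store()
t = TestClass(store, 'griffy')
assert t._oid_name == 0       # the slot holds an OID...
assert t.name == 'griffy'     # ...but reading it yields the value
```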

We use the meta-object protocol to override slot-value-using-class as follows:
(defmethod clos:slot-value-using-class ((class persistent-standard-class)
                                        (object persistent-standard-object)
                                        slot)
  ;; The slot actually holds an OID; dereference it in the
  ;; persistent store where the object was allocated.
  (persistent-object/find
     (persistent-standard-object/pstore object)
     (call-next-method)))

The modifications to make-instance are more complex. We actually don't modify make-instance directly, but instead we modify clos:shared-initialize (which is invoked by make-instance as part of the instance allocation protocol).
;; Override the primary shared-initialize method
;; in order to deal with persistent slots.
(defmethod clos:shared-initialize ((instance persistent-standard-object) slot-names
                                   &rest initargs
                                   &key persistent-store oid
                                   &allow-other-keys)
  ;; (The OID keyword is used when reloading an existing object;
  ;; it is ignored in this stripped-down version.)
  (declare (ignore oid))
  ;; We have to wrap the initargs and initforms
  ;; in persistent-objects and create an initializer
  ;; for this object.
  (let* ((class (class-of instance))
         (pstore (or persistent-store *default-persistent-store*))
         (init-plist (compute-persistent-slot-initargs class pstore initargs))
         (oid (persistent-object/save
                (make-initializer class
                                  (class-schema-version class)
                                  init-plist)
                pstore)))
    (apply #'call-next-method instance slot-names (nconc init-plist initargs))
    (setf (persistent-standard-object/oid instance) oid)
    instance))

(defun compute-persistent-slot-initargs (class persistent-store initargs)
  "Scan over the persistent effective slots in CLASS,
   determine the value to be assigned to each slot, either
   from the initargs or from the initfunction, then
   using the persistent-initarg as a key, construct a
   plist for use in the persistent initializer and in
   the inner call to shared-initialize."
  (let ((result nil))
    (iterate (((slot-initargs slot-initfunction slot-persistent-initarg)
               (map-fn '(values t t symbol)
                       (scan-class-persistent-effective-slots class))))
      (let ((initial-value (slot-initial-value initargs slot-initargs slot-initfunction)))
        (unless (eq initial-value (clos::slot-unbound-value))
          (push slot-persistent-initarg result)
          (push (slot-value->persistent-node persistent-store initial-value) result))))
    (nreverse result)))
In compute-persistent-slot-initargs, you can see the call to slot-value->persistent-node that extracts the OID from the initarg. In shared-initialize, the call to persistent-object/save writes an initialization record to the store and then allocates and initializes the transient, in-memory version of the object.

These are the essential changes to the MOP to implement the basic persistence mechanism.

More details to come in the next post.

Tuesday, December 14, 2010

Oh yeah, about that persistent store....

In April of last year I was describing a simple persistent object store. I never finished. Mea culpa. I was thinking about it again and I realized why: the next step in the implementation is a really big one. We're going to solve five unrelated problems in one fell swoop:
  1. Transactions — we will add the ability to roll-back partially completed operations.
  2. Aggregate objects — up to now, we only allow primitive objects to be persisted. We will add the mechanism to allow aggregate structures to be stored and retrieved.
  3. Side effects — the backing store is opened in append mode, so mutation of existing objects is not possible. Since we only allow primitive immutable objects to be stored, this is not an issue. Once we have aggregate objects, however, we will want to simulate the ability to side-effect these objects. The transaction system will have to know how to undo these effects if the transaction aborts.
  4. Schema migration — Persistent objects live a long time. Over time, we will usually want to change which fields an aggregate object contains. Removing a field is easy, but adding a field or renaming an existing one can be tricky because legacy objects will be missing it. Furthermore, if the aggregate object model is rich (for example, if it supports inheritance), some changes to the model might require making extensive changes to the existing persistent objects.
  5. Internal and external references — up to now, an object cannot refer to another object. There is also no way to refer to a particular persistent object other than by directly using it. For example, although we could store two different strings in the persistent store, and we can recover both, we cannot identify which one is ‘first’ and which one is ‘second’.
I haven't figured a way to truly separate these five different features, so this post is going to be a long and complex one.

We need to introduce object identifiers (OIDs). An OID is an integer that refers to a unique persistent object. Every persistent object in a particular persistent store is given a different OID (we do not require that OIDs are universally unique).

At this point, let me illustrate the code.
(define-structure (object-map-entry
                   (type list)
                   (conc-name object-map-entry/)
                   (constructor %make-object-map-entry
                                (object-id address cached-value)))
  (object-id #f read-only #t)
  (address #f read-only #t)

  ;; *** A slot holding the cached value greatly improves
  ;;     performance!
  (cached-value #f read-only #t))

Each object-map-entry associates an object-id with an address. The address is the location within the backing store where the object has been serialized. The cached-value not only speeds up object retrieval, it ensures that every dereference of an OID yields the same EQ? result. Extrinsic identity — the existence of a universal ‘identically equal’ predicate — is a key assumption of Lisp, so it is important to avoid violating this assumption.
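The role of cached-value can be shown with a small Python sketch (invented names): deserialize at most once per entry, so repeated dereferences of the same OID return the identical object.

```python
class ObjectMapEntry:
    def __init__(self, object_id, address):
        self.object_id = object_id
        self.address = address        # location in the backing store
        self.cached_value = None      # filled in on first dereference

def dereference(entry, read_from_store):
    """Deserialize at most once, so every deref of an OID yields the
    identical object.  (A real version needs a sentinel so that a
    stored None is not re-read; omitted for brevity.)"""
    if entry.cached_value is None:
        entry.cached_value = read_from_store(entry.address)
    return entry.cached_value

entry = ObjectMapEntry(7, address=1024)
deserialize = lambda addr: ['fresh', 'list']   # new object on every read
a = dereference(entry, deserialize)
b = dereference(entry, deserialize)
assert a is b                                      # EQ?-style identity holds
assert deserialize(1024) is not deserialize(1024)  # without the cache it would not
```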

We will be maintaining a data structure that maps OIDs to objects. This data structure will itself need to be persistent. Over time, we expect the ratio of new objects to old objects to decrease, so a data structure that supports incremental update will be much more efficient than one that requires wholesale replacement. A weight-balanced binary tree is one way of implementing this map.

A tree has two kinds of nodes: leaf nodes, which hold a single object-map-entry, and branch nodes, which hold sub-trees in addition to an object-map-entry. Object lookup requires descending into the tree to find the object-map-entry with the correct OID.
(define (persistent-tree/descend object-id current-tree best-tree)
  (cond ((null? current-tree) best-tree)
        ((< object-id
            (object-map-entry/object-id (persistent-tree/object-map-entry current-tree)))
         (persistent-tree/descend object-id
                                  (persistent-tree/left-child current-tree) best-tree))
        (else
         (persistent-tree/descend object-id
                                  (persistent-tree/right-child current-tree) current-tree))))

(define (persistent-tree/find-entry root object-id)
  ;; Returns the object-map-entry for OBJECT-ID, or #f if it is not present.
  (let ((best-tree (persistent-tree/descend object-id root '())))
    (if (null? best-tree)
        #f
        (let ((entry (persistent-tree/object-map-entry best-tree)))
          (if (< (object-map-entry/object-id entry) object-id)
              #f
              entry)))))
Adding an entry to the OID tree is much more complex. Binary trees work best when they are balanced, so when we add a node to the tree we may want to rebalance the tree. This may require creating new internal nodes. In addition, we want our tree to be persistent, so any new nodes must be written to the backing store. However, we want the tree to be efficient — adding a node and rebalancing the tree should not require too much computation and disk traffic.

This particular implementation uses a ‘functional binary tree’. Updates to the tree do not modify existing data structures. Instead, new nodes are allocated and as much of the existing tree as possible is re-used. (See Stephen Adams, ‘Implementing Sets Efficiently in a Functional Language’, for the details of the algorithms.) Here is the code:
(define-structure (persistent-tree
                   (conc-name persistent-tree/)
                   (constructor make-persistent-tree (left-child-address
                                                      right-child-address
                                                      %weight
                                                      %address
                                                      left-child
                                                      right-child
                                                      object-map-entry)))

  ;; Address of left child of the tree.
  ;; This field is persistent.
  (left-child-address #f read-only #t)

  ;; Address of right child of the tree.
  ;; This field is persistent.
  (right-child-address #f read-only #t)

  ;; Weight of this tree.
  ;; This field is persistent.
  (%weight #f read-only #t)

  ;; Where the persistent version is in the durable store.
  ;; This field is transient and reconstructed upon deserialization.
  (%address #f read-only #t)

  ;; Cached left child.
  ;; A transient copy of the deserialized left child.
  (left-child #f read-only #t)

  ;; Cached right child.
  ;; A transient copy of the deserialized right child.
  (right-child #f read-only #t)

  ;; The object map entry stored at the root of this tree.
  (object-map-entry #f read-only #t))

(define (persistent-tree/weight tree)
  (if (null? tree) 0 (persistent-tree/%weight tree)))

(define (persistent-tree/address tree)
  (if (null? tree) 0 (persistent-tree/%address tree)))

(define (persistent-tree/make-leaf durable-store entry)
  (let* ((weight 1)
         ;; Serialize the persistent information from the
         ;; object-map-entry (the object-id and the address).
         (address (primitive-serialize-leaf durable-store
                                            (object-map-entry/object-id entry)
                                            (object-map-entry/address entry))))
    (make-persistent-tree 0 0 weight address '() '() entry)))

(define (primitive-serialize-leaf durable-store object-id entry-address)
  ;; Append a record to the durable store and return its address.
  (durable-store/write-record durable-store
    (lambda (oport)
      (write object-id oport)
      (write-char #\space oport)
      (write entry-address oport))))

(define (persistent-tree/make-branch durable-store left-child right-child entry)
  (if (and (null? left-child)
           (null? right-child))
      (persistent-tree/make-leaf durable-store entry)
      (let* ((weight (+ 1
                        (persistent-tree/weight left-child)
                        (persistent-tree/weight right-child)))
             (left-child-address (persistent-tree/address left-child))
             (right-child-address (persistent-tree/address right-child))
             ;; Serialize the addresses of the
             ;; left and right child, and the persistent information
             ;; from the object-map-entry (the object-id and the address).
             (address
              (primitive-serialize-branch durable-store
                                          left-child-address
                                          right-child-address
                                          (object-map-entry/object-id entry)
                                          (object-map-entry/address entry))))
        (make-persistent-tree left-child-address
                              right-child-address
                              weight
                              address
                              left-child
                              right-child
                              entry))))

(define (primitive-serialize-branch durable-store
                                    left-child-address
                                    right-child-address
                                    object-id
                                    entry-address)
  ;; Append a record to the durable store and return its address.
  (durable-store/write-record durable-store
    (lambda (output-port)
      (write left-child-address output-port)
      (write-char #\space output-port)
      (write right-child-address output-port)
      (write-char #\space output-port)
      (write object-id output-port)
      (write-char #\space output-port)
      (write entry-address output-port))))

(define (persistent-tree/add durable-store root new-entry)
  (if (null? root)
      (persistent-tree/make-leaf durable-store new-entry)
      (let ((root-entry (persistent-tree/object-map-entry root))
            (left-child (persistent-tree/left-child root))
            (right-child (persistent-tree/right-child root)))
        (cond ((< (object-map-entry/object-id new-entry)
                  (object-map-entry/object-id root-entry))
               (persistent-tree/t-join
                 durable-store
                 (persistent-tree/add durable-store left-child new-entry)
                 right-child
                 root-entry))
              ((< (object-map-entry/object-id root-entry)
                  (object-map-entry/object-id new-entry))
               (persistent-tree/t-join
                 durable-store
                 left-child
                 (persistent-tree/add durable-store right-child new-entry)
                 root-entry))
              (else
               ;; Equal object-ids: the new entry shadows the old one.
               (persistent-tree/make-branch durable-store
                                            left-child right-child
                                            new-entry))))))

(define (persistent-tree/t-join durable-store left-child right-child entry)
  (let ((l.n (persistent-tree/weight left-child))
        (r.n (persistent-tree/weight right-child)))
    (cond ((< (+ l.n r.n) 2)
           (persistent-tree/make-branch durable-store left-child right-child entry))

          ((> r.n (* 5 l.n))
           (persistent-tree/l-join durable-store left-child right-child entry))

          ((> l.n (* 5 r.n))
           (persistent-tree/r-join durable-store left-child right-child entry))

          (else
           (persistent-tree/make-branch durable-store left-child right-child entry)))))

(define (persistent-tree/l-join durable-store left-child right-child entry)
  (if (< (persistent-tree/weight (persistent-tree/left-child right-child))
         (persistent-tree/weight (persistent-tree/right-child right-child)))
      (persistent-tree/single-l durable-store left-child right-child entry)
      (persistent-tree/double-l durable-store left-child right-child entry)))

(define (persistent-tree/single-l durable-store x r entry)
  (persistent-tree/make-branch
    durable-store
    (persistent-tree/make-branch durable-store
                                 x (persistent-tree/left-child r) entry)
    (persistent-tree/right-child r)
    (persistent-tree/object-map-entry r)))

(define (persistent-tree/double-l durable-store x r entry)
  (let ((r.l (persistent-tree/left-child r)))
    (persistent-tree/make-branch
      durable-store
      (persistent-tree/make-branch durable-store
                                   x
                                   (persistent-tree/left-child r.l)
                                   entry)
      (persistent-tree/make-branch durable-store
                                   (persistent-tree/right-child r.l)
                                   (persistent-tree/right-child r)
                                   (persistent-tree/object-map-entry r))
      (persistent-tree/object-map-entry r.l))))

(define (persistent-tree/r-join durable-store left-child right-child entry)
  (if (< (persistent-tree/weight (persistent-tree/right-child left-child))
         (persistent-tree/weight (persistent-tree/left-child left-child)))
      (persistent-tree/single-r durable-store left-child right-child entry)
      (persistent-tree/double-r durable-store left-child right-child entry)))

(define (persistent-tree/single-r durable-store l z entry)
  (persistent-tree/make-branch
    durable-store
    (persistent-tree/left-child l)
    (persistent-tree/make-branch durable-store
                                 (persistent-tree/right-child l)
                                 z
                                 entry)
    (persistent-tree/object-map-entry l)))

(define (persistent-tree/double-r durable-store l z entry)
  (let ((l.r (persistent-tree/right-child l)))
    (persistent-tree/make-branch
      durable-store
      (persistent-tree/make-branch durable-store
                                   (persistent-tree/left-child l)
                                   (persistent-tree/left-child l.r)
                                   (persistent-tree/object-map-entry l))
      (persistent-tree/make-branch durable-store
                                   (persistent-tree/right-child l.r)
                                   z
                                   entry)
      (persistent-tree/object-map-entry l.r))))

In the next post, I will describe how we use this structure to implement transactions, simulate side-effects, and turn schema migration into a non-issue.