Monday, December 20, 2010

Side effects

Before continuing on with transactions I want to describe how side effects are implemented. Objects in the persistent store are immutable, so we cannot actually modify an existing object. We can, however, allocate a brand new object that differs from the old one only in that the ‘side-effected’ fields have different values from the fields in the original object. We'll enter this new object in the object-map, but instead of assigning it a fresh OID, we'll re-use the OID from the original object.

The object-map, as you may recall, is also a pure functional object. Adding an entry does not mutate any part of the object-map but instead makes a new object-map that shares as much structure as possible with the old one. Any OID dereferencing that happens under the old object-map will continue to see the original object with its original contents. Dereferencing an OID under the new object-map will see the new version of the object with the ‘modified’ field. We mimic side effects by creating a whole new world where everything is identical to the original except for the thing that appears to change.
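
In outline, a simulated mutation might look like the sketch below. The helper names are hypothetical (copy-object-replacing-slot and object-map/add are illustrative, not the actual API); the point is that nothing is overwritten, so code holding the old object-map never observes the change.
(defun simulate-slot-write (object-map oid old-object slot-name new-value)
  ;; Allocate a brand-new object that differs from OLD-OBJECT only in
  ;; SLOT-NAME, then enter it in a *new* object-map under the *same* OID.
  ;; The old object-map is untouched and still maps OID to OLD-OBJECT.
  (let ((new-object (copy-object-replacing-slot old-object slot-name new-value)))
    ;; Returns the new object-map; the old one remains valid.
    (object-map/add object-map oid new-object)))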

Normally, one would expect that creating a new world every time you wish to mutate an object would be prohibitively expensive. But since the objects and the object-map itself are immutable, we can re-use them freely, which makes the operation reasonably efficient: creating a new world takes time proportional to the logarithm of the number of objects persistently stored. Even a huge store adds very little time and space overhead; a store holding a billion objects needs only about thirty new object-map nodes to simulate a mutation.

You may think this is an awfully roundabout way to do something simple. After all, why not do the traditional thing of simply maintaining a snapshot of the current consistent state of the world and actually mutating it? This would require modifying the logging strategy slightly. A special mutation record would have to be written to the log to echo the mutation being performed in memory. If the machine crashes, the log could be read forward to re-do the mutations that were in progress at crash time. Alternatively, it could be read backwards to undo the mutations that ought to be undone if the transaction didn't commit.

The rationale for this roundabout solution is simplicity. It is hard to make transactions work correctly every single time given that a hard failure can occur at any moment. Use Google to search for “transaction bug” and take a look at how easy it is to make subtle mistakes. By making it impossible to mutate existing persistent store records and indexes, we guarantee that bugs in the transaction layer simply cannot affect earlier, successful transactions. Cleanup and recovery of the persistent store remain as trivial as before: we simply find the most recent valid commit record. There are no undo or redo logs to process. The transaction algorithm can be proven correct by inspection.

Once we have ‘mutation’, we'll need to ensure that we can maintain consistency when we use it. We want the persistent store to behave as if the mutations had some serial order. We augment each transaction object with some more bookkeeping information: each transaction must record which objects it dereferences, allocates, and ‘mutates’ before it commits. We'll also add a transaction manager object that examines transactions before they are committed to determine whether a serial ordering can still be computed. If a single thread performs transactions, a serial ordering is virtually guaranteed. But if multiple concurrent threads run transactions in parallel, we have to make sure that each transaction runs in isolation and that its sets of locations read and written do not conflict with those of other transactions. If the transaction manager finds a conflict, it aborts one of the conflicting transactions.
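
As a rough sketch (the read-set and write-set accessors are illustrative, not the actual implementation), the commit-time conflict check might look like this:
;; Illustrative only: assume each transaction records the OIDs it has
;; dereferenced and the OIDs it has 'mutated'.
(defun transactions-conflict-p (tx1 tx2)
  "True if either transaction wrote an OID that the other read or wrote."
  (or (intersection (transaction/write-set tx1) (transaction/read-set tx2))
      (intersection (transaction/write-set tx2) (transaction/read-set tx1))
      (intersection (transaction/write-set tx1) (transaction/write-set tx2))))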

Again, we note that this implementation can easily handle conflicts because it is very simple. Aborting a transaction is easier than committing it: you simply discard the transaction and don't write a commit record for it. Allowing transactions to proceed and only checking for conflicts at commit time is an ‘optimistic’ strategy. In many situations an ‘optimistic’ concurrency strategy can perform better than a ‘pessimistic’ one. With a slight change it is easy to switch to a ‘pessimistic’ model, in which conflicting transactions are detected when a field is mutated: if the mutated field has been read within another transaction, then that other transaction may be working with stale data if it commits before this one. A pessimistic model is slightly simpler to understand because you check the validity of each operation just before performing it. However, an optimistic model performs very well if concurrent changes to the exact same objects do not often occur.
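
For concreteness, here is a sketch of the pessimistic check, again with illustrative accessors; the conflict is caught at the moment of the write rather than at commit time:
;; Illustrative pessimistic variant: before recording a write, see
;; whether any other live transaction has already read this OID.
(defun ensure-write-allowed (transaction oid live-transactions)
  (dolist (other live-transactions)
    (when (and (not (eq other transaction))
               (member oid (transaction/read-set other)))
      ;; Conflict detected: abort this transaction immediately.
      (transaction/do-abort transaction)
      (error "Write conflict on OID ~D." oid))))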


While trying to bring this code back to life I encountered two bugs in MIT/GNU Scheme. The first was a bug in a rare corner case of structure definition. The second was caused by the fact that the floating-point layout changed between the 32-bit and 64-bit releases. I've fixed these, but I have a more interesting problem.

The MIT Scheme object system, SOS, is derived from many of the ideas in Tiny CLOS. It is missing, however, a fully functional MOP. In addition, method dispatch is done only on the types of arguments, which makes it tricky to do ‘EQL specializers’. EQL specializers are necessary to smoothly integrate object instantiation. You can work around this limitation in some ways, but it would be much better if the implementation had this as a built-in feature.
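
For comparison, here is the kind of method a full EQL-specializer implementation permits in ANSI CLOS (test-class here is just a stand-in); this is the hook that is awkward to express in SOS:
;; Standard CLOS: an EQL specializer lets a method fire for one
;; particular object -- here, the class named TEST-CLASS -- which is
;; how make-instance is typically intercepted for a single class.
(defmethod make-instance ((class (eql (find-class 'test-class))) &rest initargs)
  (format *trace-output* "~&Instantiating test-class with ~S~%" initargs)
  (call-next-method))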

Friday, December 17, 2010

Transactions, part I

Faré has persuaded me to post actual working and tested code. It seems that enough has changed since I switched computers and the code that was working is working no longer. (The problem has nothing to do with my code — a shared library referred to by Scheme seems to have moved and Scheme came crashing down.) I'll get things working and show the live code. Until then, we'll have to make do with illustrative code fragments.

Given the index structure I described over the past two days, you can easily see that we have solved the problem of internal and external references. We use OIDs as a sort of ‘pointer’, and we modify the access mechanisms to hide the redirection. An analogous modification to make-instance gives us aggregate objects by initializing the instance slots with the appropriate OIDs.

The next thing to deal with is transactions. These are very easy to implement using this design. We have provided no mechanism to rewrite any data in the backing store; it is opened in append-only mode. When we augment the object-map to associate new OIDs with new objects, we construct new object-map nodes and append them to the backing store, but we do not modify any existing object-map nodes. If we were to ignore the new object-map and re-use the old one, we would not be able to access the new objects. The entire object graph would appear to us as if time had stopped at the point when the old object-map was created. This is the key to implementing transactions.

When we begin a transaction, we create a transaction object and copy the most recent object-map root node to it. As we create objects, we update the transaction object with the augmented object-map. If the transaction fails, we simply discard it. The next transaction is initialized with either the new object-map (if the prior transaction committed) or the old object-map (if the prior transaction aborted). There is no need to ‘undo’ anything because persistent objects cannot be mutated.

When the computer shuts down (either by a crash or a planned outage) and is restarted, we use the information in the backing store to restore the state of the program. We need to find the most recent valid root of the object-map. Since we cannot plan crashes, we might find objects and object-map nodes at the tail of the backing store that were written by a transaction that was in progress when the crash happened. We want to ignore these because they represent intermediate state. The trick is this: when a transaction commits, we append a special ‘commit record’ to the backing store. The commit record contains the location (within the store) where the root node of the object-map is stored. When the process restarts, it scans the backing store to find the latest valid commit record and initializes from there. Any objects that were allocated after that commit record are simply ignored.

If transactions usually commit and the computer rarely crashes, then the most recent commit record will usually be the very last record in the backing store. If the computer crashes a lot, or if transactions are usually aborted, then it is likely that the backing store would accumulate abandoned objects. One could develop a way of reclaiming that storage.
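
One approach, sketched below with hypothetical helpers (open-persistent-store and map-object-map are made-up names), is a compacting copy in the spirit of a garbage collector: copy every object reachable from the latest valid commit record into a fresh backing store, then swap the files.
;; Hypothetical sketch of offline compaction.  Abandoned records are
;; never visited by the traversal, so they do not survive the copy.
(defun compact-store (old-pstore new-path)
  (let ((new-pstore (open-persistent-store new-path)))
    (map-object-map (persistent-store/object-map old-pstore)
                    (lambda (oid object)
                      ;; Re-save each live object under its original OID.
                      (persistent-object/save object new-pstore oid)))
    (persistent-store/commit new-pstore
                             :object-map (persistent-store/object-map new-pstore)
                             :reason "Compacting copy.")
    new-pstore))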

Here is an actual commit record:
((:commit-record-starts)
 (:record-format 0.4)
 (:timestamp 3331389370)
 (:object-map-offset 5971200)
 (:previous-record-offset 5970432)
 (:reason "Update user context.")
 (:commit-record-ends))
The commit record has start and end markers so we can identify incomplete commit records. The object-map is at location 5971200 in the backing store, and the previous commit record is at location 5970432. (For extra safety, the commit record should contain a CRC or hash of the object-map. It should probably contain a CRC or hash of itself in order to make it self-validating. Or you can punt.)
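
Recovery is then a single scan over the backing store. A sketch, where scan-records and commit-record-valid-p are hypothetical helpers (the first maps a function over every record in the store in order; the second checks the start/end markers, and the CRC if you didn't punt):
(defun find-latest-valid-commit-record (backing-store)
  (let ((latest nil))
    (scan-records backing-store
                  (lambda (record)
                    ;; Remember the last complete, well-formed commit record.
                    (when (commit-record-valid-p record)
                      (setq latest record))))
    latest))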

The :reason is just a string. It is not used for anything, but it is extremely important nonetheless. The :reason gives us an audit trail in the event of a catastrophic failure. This is part of the “belt and suspenders” safety mechanism. The backing store may be kept for a long time, perhaps even many years. Although we will protect the data with multiple mechanisms to reduce the probability of simultaneous failures, our last line of defense is to manually inspect the backing store records. The :reason field will make this much easier.

Ok, now for some code to illustrate. The actual code has a number of bells and whistles, and I'll show them later, but it is best to start with a stripped down version.
(defconstant *transaction-dispositions*
  '(:running :committing :aborting :committed :aborted))

;; running = transaction should commit upon return,
;;           but abort if a throw occurs
;;
;; aborting = transaction is still running, but it must abort upon exit
;; committing = transaction is still running, but it must commit
;;              upon exit, even if throwing
;;
;; aborted = transaction is over, it aborted
;; committed = transaction is over, it committed
(deftype transaction-disposition () `(member ,@*transaction-dispositions*))

(eval-when (:load-toplevel :compile-toplevel :execute)
  (defvar *current-transaction*)
  (setf (documentation '*current-transaction* 'variable)
        "The current transaction.  Not bound if we are not in a transaction."))

;;; Transactions
(defclass transaction ()
  ((disposition :accessor transaction/disposition
                :initform :running
                :type transaction-disposition)
   (reason      :accessor transaction/reason
                :initarg :reason
                :initform "Unknown"
                :type string)
   (object-map :accessor transaction/object-map
               :initarg  :object-map)))
The code for running a transaction is conceptually simple, but the bells and whistles obscure the simplicity. I'll present a stripped-down version first and then show the full version in a later post.
(defun call-with-transaction (pstore reason receiver)
  (let ((transaction   nil)
        (normal-return nil)
        (abort-reason  "Non-local exit from call-with-transaction."))
    (unwind-protect
        (progn
          ;; This is the only place that transactions are created.
          (setq transaction
                (make-instance 'transaction
                               :object-map (persistent-store/object-map pstore)
                               :reason reason))
          (multiple-value-prog1
              (let ((*current-transaction* transaction))
                (funcall receiver transaction))
            ;; Reached only if the receiver returns normally.
            (setq normal-return t)))

      ;; Cleanup.  If the transaction is null, we never even got started.
      (unless (null transaction)
        (ecase (transaction/disposition transaction)
          ;; If we return normally and no one decided to commit
          ;; or abort before now, we commit.
          (:running (if normal-return
                        (transaction/do-commit transaction)
                        (progn
                          (setf (transaction/reason transaction) abort-reason)
                          (transaction/do-abort transaction))))
          (:committing (transaction/do-commit transaction))
          (:aborting   (transaction/do-abort  transaction)))))))

;; Aborting is simple.
(defun transaction/do-abort (transaction)
  "Perform the necessary work to abort the transaction."
  (assert (or (eq (transaction/disposition transaction) :aborting)
              (eq (transaction/disposition transaction) :running)))
  (setf (transaction/disposition transaction) :aborted))

;; Committing is simple, too.  Just write a commit record.
(defun transaction/do-commit (transaction)
  "Perform the necessary work to commit the transaction."
  (assert (or (eq (transaction/disposition transaction) :committing)
              (eq (transaction/disposition transaction) :running)))
  (let* ((object-map (transaction/object-map transaction))
         (pstore (object-map/pstore object-map)))
    (persistent-store/commit pstore
                             :object-map object-map
                             :reason (transaction/reason transaction))
    (setf (transaction/disposition transaction) :committed)))
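To give a feel for how this is meant to be called (open-persistent-store is a made-up constructor for the example, and test-class is the persistent class from the earlier posts), a transaction that allocates a single persistent object might look like this:
;; Hypothetical usage.  The receiver runs inside the transaction; if it
;; returns normally the transaction commits, and a throw aborts it.
(let ((pstore (open-persistent-store "/tmp/test.pstore")))
  (call-with-transaction pstore "Create the initial user."
    (lambda (transaction)
      (declare (ignore transaction))
      (make-instance 'test-class :name 'griffy))))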
More posts to follow...

Wednesday, December 15, 2010

More persistence

In my last post, I described how to map OIDs to persistent objects by means of a functional, weight-balanced, binary tree. In this post, I will describe how to implement aggregate objects. I'm going to start with a high-level view and drill down to the details. I'm afraid at this point I'm going to switch to Common Lisp and CLOS. Don't worry, though: if you just pretend that defclass is a synonym for define-structure, you won't be far off.

Here's a simple CLOS class definition:
(defclass test-class ()
  ((name :initarg :name
         :initform 'zippy)))
Objects of class test-class have a single field which is called name. This class definition adds a method to make-instance that works as follows:
  ;; make an instance where the name is the symbol GRIFFY 
  (defvar test-instance-1 (make-instance 'test-class :name 'griffy))

  ;; alternatively, use the default name of ZIPPY
  (defvar test-instance-2 (make-instance 'test-class))
In addition, we can use slot-value to extract the name:
  (slot-value test-instance-1 'name)
  => GRIFFY

  (slot-value test-instance-2 'name)
  => ZIPPY
And that's that. (And you thought CLOS was complicated.)

A persistent class is defined as follows:
(defclass test-class ()
  ((name :initarg :name
         :initform 'zippy))
  (:metaclass persistent-standard-class)
  (:schema-version 0))
The :metaclass specifier tells CLOS that we want instances to be persistent. The :schema-version is a bit of magic that protects against the unlikely but nasty case that, as a class definition evolves over time, a slot name is accidentally re-used. In such a case, the code will need to disambiguate which class definition was intended, and it can use the :schema-version to do so. You can safely ignore the details by following one simple rule: if you ever modify the slots in a persistent class, increment the :schema-version.
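
For example, if the name slot were later renamed, the class would be redefined with the version bumped:
;; After renaming NAME to TITLE, increment the schema version so that
;; legacy instances written under version 0 can be told apart.
(defclass test-class ()
  ((title :initarg :title
          :initform 'zippy))
  (:metaclass persistent-standard-class)
  (:schema-version 1))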

So what is different? On the surface, practically nothing changes. The calls to make-instance and slot-value work just as before. There are four changes that happen ‘under the covers’:
  1. The object returned by make-instance will be a persistent object.
  2. Two additional slots are allocated. The first contains the OID of the instance itself, the second contains the persistent store where the instance was allocated.
  3. The name slot in the object will contain the OID of the persistent name rather than a Lisp object. make-instance is modified to extract the OIDs of its arguments to initialize this slot.
  4. slot-value must be modified to dereference the OID that is actually in the slot.
These modifications are easily made using the CLOS MOP.

We use the meta-object protocol to override slot-value-using-class as follows:
(defmethod clos:slot-value-using-class ((class persistent-standard-class)
                                        (object persistent-standard-object)
                                        slot)
   (persistent-object/find 
     (persistent-standard-object/pstore object)
     (call-next-method)))
The call to call-next-method invokes the underlying mechanism for reading a slot value. But we're not storing the actual slot value in the slot; we are storing the OID of the actual slot value, so call-next-method will return that OID. We look up the returned OID in the persistent store to get the value. (In order to support multiple persistent stores, we need to know which store contains the value. We require that all subfields of an object be allocated in the same persistent store as the containing object.)
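
Writing a slot presumably goes the other way: the new value is interned in the store and its OID is what actually lands in the slot. Here is a sketch (this is not the posted code, and it ignores the transaction bookkeeping that simulated side effects will need):
;; Sketch of the mirror-image method: convert NEW-VALUE to an OID in
;; the object's own store, then let the underlying writer store the OID.
(defmethod (setf clos:slot-value-using-class)
    (new-value
     (class persistent-standard-class)
     (object persistent-standard-object)
     slot)
  (call-next-method
   (slot-value->persistent-node (persistent-standard-object/pstore object)
                                new-value)
   class object slot))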

The modifications to make-instance are more complex. We actually don't modify make-instance directly, but instead we modify clos:shared-initialize (which is invoked by make-instance as part of the instance allocation protocol).
;; Override the primary shared-initialize method
;; in order to deal with persistent slots.
(defmethod clos:shared-initialize ((instance persistent-standard-object) slot-names
                                   &rest initargs
                                   &key persistent-store oid
                                   &allow-other-keys)
    ;; We have to wrap the initargs and initforms
    ;; in persistent-objects and create an initializer
    ;; for this object.
    (let* ((class (class-of instance))

           (init-plist (compute-persistent-slot-initargs class
                                                         (or persistent-store  *default-persistent-store*)
                                                         initargs))

           (oid (persistent-object/save
                     (make-initializer class
                                       (class-schema-version class)
                                       init-plist)
                     (or persistent-store  *default-persistent-store*)
                     oid)))

      (apply #'call-next-method instance slot-names (nconc init-plist initargs))
      (setf (persistent-standard-object/oid instance) oid)
      instance))

(defun compute-persistent-slot-initargs (class persistent-store initargs)
  "Scan over the persistent effective slots in CLASS,
   determine the value to be assigned to each slot, either
   from the initargs or from the initfunction, then
   using the persistent-initarg as a key, construct a
   plist for use in the persistent initializer and in
   the inner call to shared-initialize."
  (let ((result nil))
    (iterate (((slot-initargs slot-initfunction slot-persistent-initarg)
               (map-fn '(values t t symbol)
                       #'effective-slot-initialization-info
                       (scan-class-persistent-effective-slots class))))
      (let ((initial-value (slot-initial-value initargs slot-initargs slot-initfunction
                                               (clos::slot-unbound-value))))
        (unless (eq initial-value (clos::slot-unbound-value))
          (push slot-persistent-initarg result)
          (push (slot-value->persistent-node persistent-store initial-value) result))))
    (nreverse result)))
In compute-persistent-slot-initargs, you can see the call to slot-value->persistent-node that extracts the OID from the initarg. In shared-initialize, the call to persistent-object/save writes an initialization record to the store and then allocates and initializes the transient, in-memory version of the object.

These are the essential changes to the MOP to implement the basic persistence mechanism.

More details to come in the next post.

Tuesday, December 14, 2010

Oh yeah, about that persistent store....

In April of last year I was describing a simple persistent object store. I never finished. Mea culpa. I was thinking about it again and I realized why: the next step in the implementation is a really big one. We're going to solve five unrelated problems in one fell swoop:
  1. Transactions — we will add the ability to roll back partially completed operations.
  2. Aggregate objects — up to now, we only allow primitive objects to be persisted. We will add the mechanism to allow aggregate structures to be stored and retrieved.
  3. Side effects — the backing store is opened in append mode, so mutation of existing objects is not possible. Since we only allow primitive immutable objects to be stored, this is not an issue. Once we have aggregate objects, however, we will want to simulate the ability to side-effect these objects. The transaction system will have to know how to undo these effects if the transaction aborts.
  4. Schema migration — persistent objects live a long time. Over time, it is usually the case that we will want to change what fields are contained in an aggregate object. Removing a field is easy, but adding a field or renaming an existing one can be tricky because legacy objects will be missing it. Furthermore, if the aggregate object model is rich (for example, if it supports inheritance), some changes to the model might require making extensive changes to the existing persistent objects.
  5. Internal and external references — up to now, an object cannot refer to another object. There is also no way to refer to a particular persistent object other than by directly using it. For example, although we could store two different strings in the persistent store, and we can recover both, we cannot identify which one is ‘first’ and which one is ‘second’.
I haven't figured out a way to truly separate these five different features, so this post is going to be a long and complex one.

We need to introduce object identifiers (OIDs). An OID is an integer that refers to a unique persistent object. Every persistent object in a particular persistent store is given a different OID (we do not require that OIDs are universally unique).

At this point, let me illustrate the code.
(define-structure (object-map-entry
                   (type list)
                   (conc-name object-map-entry/)
                   (constructor %make-object-map-entry
                                (object-id address cached-value)))
  (object-id #f read-only #t)
  (address #f read-only #t)

  ;; *** A slot holding the cached value greatly improves
  ;;     performance!
  (cached-value #f read-only #t)
  )

Each object-map-entry associates an object-id with an address. The address is the location within the backing store where the object has been serialized. The cached-value not only speeds up object retrieval, it ensures that every dereference of an OID yields the same EQ? result. Extrinsic identity — the existence of a universal ‘identically equal’ predicate — is a key assumption of Lisp, so it is important to avoid violating this assumption.

We will be maintaining a data structure that maps OIDs to objects. This data structure will itself need to be persistent. Over time, we expect the ratio of new objects to old objects to decrease, so a data structure that supports incremental update will be much more efficient than one that requires wholesale replacement. A weight-balanced binary tree is one way of implementing this map.

A tree has two kinds of nodes: leaf nodes, which hold a single object-map-entry, and branch nodes, which hold sub-trees in addition to an object-map-entry. Object lookup requires descending into the tree to find the object-map-entry with the correct OID.
(define (persistent-tree/descend object-id current-tree best-tree)
  (cond ((null? current-tree) best-tree)
        ((< object-id 
               (object-map-entry/object-id (persistent-tree/object-map-entry current-tree)))
         (persistent-tree/descend object-id 
                                  (persistent-tree/left-child current-tree) best-tree))
        (else
         (persistent-tree/descend object-id
                                  (persistent-tree/right-child current-tree) current-tree))))

(define (persistent-tree/find-entry root object-id)
  (let ((best-tree (persistent-tree/descend object-id root '())))
    (if (null? best-tree)
        #f
        (let ((entry (persistent-tree/object-map-entry best-tree)))
          (if (< (object-map-entry/object-id entry) object-id)
              #f
              entry)))))
Adding an entry to the OID tree is much more complex. Binary trees work best when they are balanced, so when we add a node to the tree we may want to rebalance it. This may require the creation of new internal nodes. In addition, we want our tree to be persistent, so any new nodes must be written to the backing store. However, we want the tree to be efficient: adding a node and rebalancing the tree should not require too much computation or disk traffic.

This particular implementation uses a ‘functional binary tree’. Updates to the tree do not modify existing data structures. Instead, new nodes are allocated, and as much of the existing tree as possible is re-used. (See Stephen Adams, ‘Implementing Sets Efficiently in a Functional Language’, http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.48.1134 for the details of the algorithms.) Here is the code:
(define-structure (persistent-tree
                   (conc-name persistent-tree/)
                   (constructor make-persistent-tree (left-child-address
                                                      right-child-address
                                                      %weight
                                                      %address
                                                      left-child
                                                      right-child
                                                      object-map-entry)))

  ;; Address of left child of the tree.
  ;; This field is persistent.
  (left-child-address #f read-only #t)

  ;; Address of right child of the tree.
  ;; This field is persistent.
  (right-child-address #f read-only #t)

  ;; Weight of this tree.
  ;; This field is persistent.
  (%weight #f read-only #t)

  ;; Where the persistent version is in the durable store.
  ;; This field is transient and reconstructed upon deserialization.
  (%address #f read-only #t)

  ;; Cached left child.
  ;; A transient copy of the deserialized left child.
  (left-child #f read-only #t)

  ;; Cached right child.
  ;; A transient copy of the deserialized right child.
  (right-child #f read-only #t)

  ;; The object map entry stored at the root of this tree.
  (object-map-entry #f read-only #t))

(define (persistent-tree/make-leaf durable-store entry)
  (let* ((weight 1)
         ;; Serialize the persistent information from the
         ;; object-map-entry.  (the object-id and the address).
         (address (primitive-serialize-leaf durable-store (object-map-entry/address entry))))
    (make-persistent-tree 0 0 weight address '() '() entry)))

(define (primitive-serialize-leaf durable-store entry-address)
  (write-leaf-record! 
   durable-store
   (lambda (oport)
     ;; Store the address of the object-map entry.
     (write entry-address oport))))

(define (persistent-tree/make-branch durable-store left-child right-child entry)
  (if (and (null? left-child)
           (null? right-child))
      (persistent-tree/make-leaf durable-store entry)
      (let* ((weight (+ 1
                        (persistent-tree/weight left-child)
                        (persistent-tree/weight right-child)))
             (left-child-address (persistent-tree/address left-child))
             (right-child-address (persistent-tree/address right-child))
             ;; Serialize the addresses of the
             ;; left and right child, and the persistent information 
             ;; from the object-map-entry (the object-id and the address).
             (address 
              (primitive-serialize-branch durable-store
                                          left-child-address
                                          right-child-address
                                          (object-map-entry/address entry))))
        (make-persistent-tree left-child-address
                              right-child-address
                              weight
                              address
                              left-child
                              right-child
                              entry))))

(define (primitive-serialize-branch durable-store
                                    left-child-address
                                    right-child-address
                                    entry-address)
  (write-branch-record!
   durable-store
   (lambda (output-port)
     (write left-child-address output-port)
     (write-char #\space output-port)
     (write right-child-address output-port)
     (write-char #\space output-port)
     (write entry-address output-port))))

(define (persistent-tree/add durable-store root new-entry)
  (if (null? root)
      (persistent-tree/make-leaf durable-store new-entry)
      (let ((root-entry (persistent-tree/object-map-entry root))
            (left-child (persistent-tree/left-child root))
            (right-child (persistent-tree/right-child root)))
        (cond ((< (object-map-entry/object-id new-entry)
                  (object-map-entry/object-id root-entry))
               (persistent-tree/t-join 
                durable-store 
                (persistent-tree/add durable-store left-child new-entry)
                right-child
                root-entry))
              ((< (object-map-entry/object-id root-entry)
                  (object-map-entry/object-id new-entry))
               (persistent-tree/t-join
                durable-store 
                left-child
                (persistent-tree/add durable-store right-child new-entry)
                root-entry))
              (else
               (persistent-tree/make-branch durable-store 
                                            left-child right-child
                                            new-entry))))))

(define (persistent-tree/t-join durable-store left-child right-child entry)
  (let ((l.n (persistent-tree/weight left-child))
        (r.n (persistent-tree/weight right-child)))
    (cond ((< (+ l.n r.n) 2)
           (persistent-tree/make-branch durable-store left-child right-child entry))

          ((> r.n (* 5 l.n))
           (persistent-tree/l-join durable-store left-child right-child entry))

          ((> l.n (* 5 r.n))
           (persistent-tree/r-join durable-store left-child right-child entry))

          (else
           (persistent-tree/make-branch durable-store left-child right-child entry)))))

(define (persistent-tree/l-join durable-store left-child right-child entry)
  (if (< (persistent-tree/weight (persistent-tree/left-child right-child))
         (persistent-tree/weight (persistent-tree/right-child right-child)))
      (persistent-tree/single-l durable-store left-child right-child entry)
      (persistent-tree/double-l durable-store left-child right-child entry)))

(define (persistent-tree/single-l durable-store x r entry)
  (persistent-tree/make-branch 
   durable-store 
   (persistent-tree/make-branch durable-store 
                                x (persistent-tree/left-child r) entry)
   (persistent-tree/right-child r)
   (persistent-tree/object-map-entry r)))

(define (persistent-tree/double-l durable-store x r entry)
  (let ((r.l (persistent-tree/left-child r)))
    (persistent-tree/make-branch 
     durable-store
     (persistent-tree/make-branch durable-store
                                  x
                                  (persistent-tree/left-child  r.l)
                                  entry)
     (persistent-tree/make-branch durable-store
                                  (persistent-tree/right-child r.l)
                                  (persistent-tree/right-child r)
                                  (persistent-tree/object-map-entry r))
     (persistent-tree/object-map-entry r.l))))

(define (persistent-tree/r-join durable-store left-child right-child entry)
  (if (< (persistent-tree/weight (persistent-tree/right-child left-child))
         (persistent-tree/weight (persistent-tree/left-child left-child)))
      (persistent-tree/single-r durable-store left-child right-child entry)
      (persistent-tree/double-r durable-store left-child right-child entry)))

(define (persistent-tree/single-r durable-store l z entry)
  (persistent-tree/make-branch
   durable-store
   (persistent-tree/left-child l)
   (persistent-tree/make-branch durable-store
                                (persistent-tree/right-child l)
                                z
                                entry)
   (persistent-tree/object-map-entry l)))

(define (persistent-tree/double-r durable-store l z entry)
  (let ((l.r (persistent-tree/right-child  l)))
    (persistent-tree/make-branch 
     durable-store
     (persistent-tree/make-branch durable-store
                                  (persistent-tree/left-child  l)
                                  (persistent-tree/left-child  l.r)
                                  (persistent-tree/object-map-entry l))
     (persistent-tree/make-branch durable-store
                                  (persistent-tree/right-child l.r)
                                  z
                                  entry)
     (persistent-tree/object-map-entry l.r))))


In the next post, I will describe how we use this structure to implement transactions, simulate side-effects, and turn schema migration into a non-issue.

Monday, October 18, 2010

No ILC for me

Alas, I'm unable to attend ILC 2010.

Thursday, September 2, 2010

For your amusement...

Some computer Tom Swifties:
  • “I thought I would free up more memory on the second pass,” Tom recollected.
  • “You should always check the pre-conditions,” Tom asserted.
  • “Did you use SSL?” Tom asked cryptically.
  • “If one of those is smaller, put it first,” Tom ordered weakly.
  • “Command continuations discard values,” Tom said expressionlessly.
  • “That isn't null or a cons cell,” Tom said listlessly.
  • “Does this return an answer?” Tom asked haltingly.
  • “This program implicitly depends upon time,” Tom stated.
  • “Maybe it's in Reverse Polish Notation,” Tom put forth.

Friday, August 20, 2010

INTERNATIONAL LISP CONFERENCE 2010 - HIGHLIGHTS and CALL for PAPERS

With the usual apologies to those who receive multiple copies of this...

 INTERNATIONAL LISP CONFERENCE 2010 - HIGHLIGHTS and CALL for PAPERS 

Important Dates:
~~~~~~~~~~~~~~~~

 * Deadline for all submissions (FIRM): September 6, 2010
 * Early registration: September 16, 2010
 * Author notification: September 20, 2010
 * Final paper due (in electronic form): October 5, 2010
 * Conference: October 19-21, 2010


Invited Speakers:
~~~~~~~~~~~~~~~~~

We are proud to announce that, for the 2010 edition, we will have the
following invited talks:

 * Lawrence Hunter
     Building a Mind for Life

 * Jans Aasman
     AllegroGraph and the Linked Open Data Cloud

 * Marc Feeley
     Gambit Scheme: Inside Out

 * Peter Seibel
     Common Lisp Standardization: The Good, the Bad, and the Ugly

 * Don Syme
     F#: Taking Succinct, Efficient, Typed Functional Programming 
     into the Mainstream

 * Lowell Hawkinson
     Lisp for Breakthrough Products

More information about speakers and talks is available at
http://www.international-lisp-conference.org/2010/speakers


Registration:
~~~~~~~~~~~~~

Rates for Early Registration (before or on September 16, 2010)

ALU member
 & ACM member      $355
 & ACM non member  $390
 & Student         $150

ALU Individual Sponsors
 & ACM member      $280
 & ACM non-member  $315
 & Student         n/a

non-member of ALU
 & ACM member      $430
 & ACM non-member  $465
 & Student         $165

Rates for Late Registration (after September 16, 2010 & Onsite)

ALU member
 & ACM member      $440
 & ACM non member  $485
 & Student         $185

ALU Individual Sponsors
 & ACM member      $365
 & ACM non-member  $410
 & Student         n/a

non-member of ALU
 & ACM member      $515
 & ACM non-member  $560
 & Student         $200

Due to colocation, registration must be done using ILC/SPLASH'10
unified registration forms available at http://splashcon.org/

Please note that the registration page (page 3) has the option "SPLASH
(OOPSLA/Onward!)" selected by default.  If you are only planning to
attend ILC, don't forget to deselect that option.


Travel and Accommodation:
~~~~~~~~~~~~~~~~~~~~~~~~~

SouthWest Airlines offers low fares into Reno but requires booking
online at www.southwest.com

John Ascuaga's Nugget offers reduced rates for ILC participants.
Please, visit http://splashcon.org/ to obtain the group code.


Scope:
~~~~~~

Lisp is one of the greatest ideas from computer science and a major
influence for almost all programming languages and for all
sufficiently complex software applications.

The International Lisp Conference is a forum for the discussion of
Lisp and, in particular, the design, implementation and application of
any of the Lisp dialects.  We encourage everyone interested in Lisp to
participate.

We invite high quality submissions in all areas involving Lisp
dialects and any other languages in the Lisp family, including, but
not limited to, ACL2, AutoLisp, Clojure, Common Lisp, ECMAScript,
Dylan, Emacs Lisp, ISLISP, Racket, Scheme, etc.

Topics may include any and all combinations of Lisp and:

 * Language design and implementation
 * Language critique
 * Language integration, inter-operation and deployment
 * Applications (especially commercial)
 * 'Pearls' (of wisdom)
 * Experience reports and case studies
 * Reflection, meta-object protocols, meta-programming
 * Domain-specific languages
 * Programming paradigms and environments
 * Parallel and distributed computing
 * Software evolution
 * Theorem proving
 * Scientific computing
 * Data mining
 * Semantic web

We also encourage submissions about known ideas as long as they are
presented in a new setting and/or in a highly elegant way.

Authors concerned about the appropriateness of a topic may communicate
by electronic mail with the program chair prior to submission.

Accepted papers will be published in the ACM Digital Library (PENDING).

Papers must be written in English and submitted electronically at
http://www.easychair.org/conferences?conf=ilc2010 in PDF or WORD
format.  Final submissions must not exceed 15 pages and need to use
the ACM format, for which templates can be found at:
http://www.acm.org/sigs/pubs/proceed/template.html.

Each paper should explain its contributions in both general and
technical terms, identifying what has been accomplished, explaining
why it is significant, and comparing it with previous work. Authors
should strive to make their papers understandable to a broad audience.
Each paper will be judged according to its significance, novelty,
correctness, clarity, and elegance.

The official language of the conference is English.  Some further
information is available at the conference web site, with more details
added later.  See: http://www.international-lisp-conference.org/

Technical Program:
~~~~~~~~~~~~~~~~~~

Original submissions in all areas related to the conference themes are
invited for the following categories.

 * Papers: Technical papers of up to 15 pages that describe original
   results or explain known ideas in new and elegant ways.

 * Demonstrations: Abstracts of up to 4 pages for demonstrations of
   tools, libraries, and applications.

 * Tutorials: Abstracts of up to 4 pages for in-depth presentations
   about topics of special interest for at least 90 minutes and up to
   180 minutes.

 * Workshops: Abstracts of up to 4 pages for groups of people who
   intend to work on a focused topic for half a day.

 * Panel discussions: Abstracts of up to 4 pages for discussions about
   current themes. Panel discussion proposals must mention panel
   members who are willing to partake in a discussion.

 * Lightning talks: Abstracts of up to one page for talks to last for
   no more than 5 minutes.

Depending on the technical content, each submitted paper will be
classified by the program committee as either a technical paper or as
an experience paper; and authors will be informed about this
classification.  Note that all interesting submissions are considered
valuable contributions to the success of the ILC series of
conferences.  As in past ILC's since 2007, accepted papers in both
categories will be presented at the conference, included in the
proceedings, and submitted to the ACM digital library.


Organizing Committee:
~~~~~~~~~~~~~~~~~~~~~

 * General Chair:
   JonL White - The Ginger IceCream Factory of Palo Alto, ALU

 * Program Chair:
   Antonio Leitao - Instituto Superior Tecnico/INESC-ID

 * Conference Treasurer:
   Duane Rettig - Franz, Inc., ALU Director

 * Publicity Chair:
   Daniel Herring - ALU Director

 * ALU Treasurer:
   Rusty Johnson - TASC, Inc., ALU Director


Program Committee:
~~~~~~~~~~~~~~~~~~

 * Antonio Leitao - Instituto Superior Tecnico/INESC-ID, Portugal
 * Alex Fukunaga - University of Tokyo, Japan
 * Charlotte Herzeel - Vrije Universiteit Brussel, Belgium
 * Christophe Rhodes - Goldsmiths College, University of London, UK
 * Didier Verna - EPITA Research and Development Laboratory, France
 * Duane Rettig - Franz, Inc., USA
 * Giuseppe Attardi - University of Pisa, Italy
 * Jeff Shrager - Symbolic Systems Program, Stanford University, USA
 * Joe Marshall - Google, Inc., USA
 * Julian Padget - University of Bath, UK
 * Keith Corbett - Clozure Associates, USA
 * Kent Pitman - PTC, USA
 * Manuel Serrano - INRIA Sophia Antipolis, France
 * Marc Feeley - University of Montreal, Canada
 * Marie Beurton-Aimar - University of Bordeaux 1, France
 * Mark Stickel - SRI International, USA
 * Matthias Felleisen - Northeastern University, USA
 * Scott McKay - ITA Software, USA


Contacts:
~~~~~~~~~

 * Questions: ilc10-organizing-committee at alu.org

 * Program Chair: ilc2010 at easychair.org

For more information, see http://www.international-lisp-conference.org/

Wednesday, August 18, 2010

P ≠ NP (part 2 correction)

In the previous post I said that one example of an NP problem is “Check if two computer programs always produce the same output.” jkff commented:
The "check if two computer programs always produce the same output" is not NP. It's simply undecidable and cannot be solved in any amount of time.
I blew it and oversimplified the statement. At Wikipedia, they have a list of commonly known NP-complete problems and on that list is “Inequivalence of simple functions.” I seriously mangled the meaning because I was daydreaming about assembly code. What I should have done is put some restrictions on what sort of computer program I meant.

First, we have to restrict ourselves to programs that we can prove will always return an answer within a given amount of time. A computer program that does not have any loops, backward jumps (gotos), subroutine calls, or exceptions should simply run from start to finish. Second, we do not allow the program to save any state between runs; the output must be a pure function of the input. Furthermore, we restrict the input to a finite set (for example, integers expressible in a single 32-bit word). Now suppose we have two such programs. To prove they are not equivalent, we need to find a 32-bit integer input that makes the two programs give different answers. By trial and error we will eventually find such an input or exhaustively show that there is none. However, if someone were to claim that a particular input produced different outputs, it would be easy to check by simply trying it out.
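
The asymmetry is easy to see in code: checking a claimed counterexample takes two function calls, while finding one may take up to 2^32 trials. A sketch, treating the two restricted programs as pure functions of a 32-bit word:
;; Verifying a claimed witness is cheap: run both programs once and compare.
(defun counterexample-p (program-a program-b input)
  (not (eql (funcall program-a input) (funcall program-b input))))

;; Finding a witness by brute force may require trying every input.
(defun find-counterexample (program-a program-b)
  (loop for input from 0 below (expt 2 32)
        when (counterexample-p program-a program-b input)
          return input))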

Tuesday, August 17, 2010

P ≠ NP (part 2)

As I mentioned in my last post, P and NP are commonly encountered complexity classes. Knowing what complexity class a problem is in can give us some idea of how difficult it may be to solve the problem. So what's the difference between P and NP? I don't want to get into the technical definition of P and NP, so I'm going to make broad generalizations that aren't technically rigorous, but will give you a feel for the difference.

Problems in class P are generally considered “tractable” by computer scientists. By this we mean that a computer can probably solve such a problem efficiently and relatively quickly. Recall that the complexity classes relate the ‘number of pieces’ to the difficulty. If you can solve a particular problem in P, then a similar problem with a ‘larger number of pieces’ is likely within your grasp (and if not, it is usually a question of upgrading to a better computer). Problems in class P are the ‘meat and potatoes’ of computer programs.

Before I discuss NP, let me introduce the complexity class EXPTIME. EXPTIME is where you find some really hard problems, like computer chess or Go. It is true that there are computers that can play chess really well, but if we were to modify the rules of chess to add a couple of new pieces and make the board 9x9 instead of 8x8, it would greatly increase the difficulty of the game. Problems in EXPTIME are generally considered “intractable”. By this we mean that a computer will have a hard time solving such a problem efficiently and quickly. And if you are able to solve a particular problem in EXPTIME, then a similar problem with just ‘a few more pieces’ is likely beyond your grasp, no matter how fancy a computer you might buy. Problems in class EXPTIME take a lot of time and money to solve.

So what about NP? Problems in NP are quite curious. They seem to be very difficult to solve, much like EXPTIME problems, but they have the unusual property that it is very easy to check whether a solution is correct. Let me illustrate: if I showed you a chess position and claimed that it was a good position for white, you'd have to do a lot of work to verify whether my claim was true. In fact, it would be about the same amount of work it took me to come up with the position in the first place. On the other hand, if I were to show you a jigsaw puzzle and claim that I had finished it, you could tell at a glance whether my claim was true. Problems in NP seem to be much harder to solve than problems in P, but they are as easy to verify as problems in P. That is a little weird.

Problems in NP are often encountered in computer programs, and many of these kinds of problems, although very difficult to solve, have approximations that are relatively easy. In some cases, when a perfect solution is not needed, one that is ‘good enough’ will be a lot easier to compute. Another weird thing is that a lot of problems in NP don't sound at all like they'd be hard to solve. Here are some examples:
  • Take a group of people and divide them into two subgroups such that both subgroups have exactly the same amount of change in their pockets. (No, you aren't allowed to move the change around.)
  • Find the shortest path that visits all the major landmarks in a city.
  • Check if two computer programs always produce the same output.  Whoops!  I blew this one.  Check the next posting.
There is one final bit of weirdness. It is easy to prove that EXPTIME problems are much harder to solve than P problems, but no one has proven that NP problems are harder than P problems (or that they aren't!). This isn't for lack of trying. Many smart people have worked on a proof for some time. There's a million dollar prize for the first person to prove this one way or the other. Recently, Vinay Deolalikar of HP Labs claimed that he has a proof. Unfortunately, other experts in the field have pointed out flaws in the proof that may invalidate it.

Friday, August 13, 2010

P ≠ NP (part 1)

I'm sure most people who read this blog know a bit about what this means, but I want to try to explain it in a way my mom could understand. I won't go into excruciating detail, and I'll probably gloss over interesting points.

Let's start with an analogy. Suppose you go to the store to buy a present for your nephew. He likes jigsaw puzzles, so you want to get him one. He's going to be twelve years old, so you need to find one with the appropriate level of challenge. A simple one with four pieces might be great for a two-year-old, but your nephew would find it boring. A big one with ten thousand pieces is probably too much for him. You know he just finished one with five hundred pieces, so you pick one with six hundred: not too easy, but not too frustrating either. What you are doing is estimating the complexity of the puzzle.

When you get to the store you find that they are sold out of jigsaw puzzles. They have a variety of other puzzles, though. They have a Rubik's cube, a Sudoku book, some cryptograms, a Tower of Hanoi, etc. They have models you can put together as well. But these things aren't jigsaw puzzles. You can't estimate the complexity the same way. A model with five hundred pieces would be quite a bit more difficult to assemble than a jigsaw puzzle with the same number of pieces.

Perhaps you look at the age ratings for the various puzzles. The eight-piece Tower of Hanoi puzzle is rated for ages ten and up, so maybe a twelve piece? The standard 3x3x3 Rubik's cube is rated for ages eight and up, but would the 4x4x4 be too hard or too easy? What about the 5x5x5?

The kind of rating system you use to describe the difficulty of a puzzle is analogous to the “complexity class” of a problem. The complexity class roughly describes how hard a problem will be to solve, and more importantly, how much harder ‘big’ problems are compared to the ‘small’ ones. Here's what I mean: if I take a jigsaw puzzle, any jigsaw puzzle, and double the number of pieces in it, I'll make it roughly twice as hard (more or less). If I take a cryptogram and double the number of words in it, I'll actually make it a lot easier. If I take the Tower of Hanoi and double the number of discs, I'll make it incredibly harder, maybe even impossible. These puzzles are in different complexity classes.

As it turns out, a lot of puzzles are in the same complexity class as the jigsaw puzzle: the difficulty is reasonably proportional to the number of pieces. Adding a piece or two doesn't change things that much, and it is easy to figure out how long it will take to solve if you've done a few of them. A lot of puzzles are more like the Tower of Hanoi. Adding a single piece will double the amount of time it takes to solve them.

P and NP are complexity classes for problems you might want to solve on a computer. There are all sorts of rating systems, but everyone is familiar with the MPAA movie ratings. There are all sorts of complexity classes, but most computer scientists are familiar with P and NP.

I'll pause here because the analogy is wearing thin. The main point is that “P ≠ NP” is a statement about complexity classes. In particular, it talks about two of the most popular complexity classes and that is part of the reason it is interesting.

Thursday, August 12, 2010

Rambling

Writing a blog is hard. I need practice, so I'm going to ramble a bit.

First, I'm very pleased that Blogger has improved their comment spam detection. Every time I made a post I'd have to come back a day or two later and remove a bunch of pointless comment spam.

Second, I told my daughter that someone claims to have a proof that P ≠ NP. She replied “Well, now we know that N doesn't equal 1.”

I read through the proof and I confess that I don't get it. On the other hand, I think I want to get it, so I'll have to learn some interesting things.

Here's another thing I don't get.
`` E8 is any of several closely related exceptional simple Lie groups and Lie algebras of dimension 248''

``a simple Lie group is a connected non-abelian Lie group G which does not have nontrivial connected normal subgroups.''

``a Lie group is a group which is also a differentiable manifold, with the property that the group operations are compatible with the smooth structure.''

``a non-abelian group is a group (G, * ) in which there are at least two elements a and b of G such that a * b ≠ b * a.''

``a connected space is a topological space which cannot be represented as the union of two or more disjoint nonempty open subsets''

Tuesday, August 3, 2010

My definition

Here's my definition:
A computer program is a description of a process which is formal enough to be carried out by a machine.
Most of the other definitions describe a program as a ‘set of instructions’. Some of the definitions suggest that these instructions should be organized in some way, perhaps as a (linear) list. These instructions ‘make the computer do things’, ‘bring about a certain result’, ‘cause the computer to behave in a predetermined manner’, or ‘alter the contents of memory’. But these definitions have an explicit or implicit assumption: a sequential, imperative mode of thought.

Look at this Prolog code:
append(c(H,T), B, c(H,TB)) :- append(T, B, TB).
append(nil, B, B).
This code describes the relation of appending lists, but it doesn't specify how to accomplish the appending. Is the first clause an ‘instruction’ to build a list, or to take one apart? Are the clauses to be run in any particular order? Is there a deterministic ‘answer’?

Monday, August 2, 2010

What is a ‘computer program’?

This seems like a simple enough question.

Here's what the web says. A computer program is
  1. a set of statements or instructions to be used directly or indirectly in a computer in order to bring about a certain result
  2. what makes the computer do things
  3. a set of instructions written in a programming language
  4. an algorithm written in a language that a computer can understand
  5. simply a long list of code written in another language
  6. a set of instructions for altering the contents of various words in the computer's memory
  7. a carefully thought out, step-by-step, set of instructions prepared by a programmer
  8. an organized list of instructions that, when executed, causes the computer to behave in a predetermined manner
  9. is a procedure (or algorithm) for performing some computation
  10. a bunch of instructions run by a computer, just like a storybook is just a whole bunch of sentences read by the reader

Does anyone have any better definitions?

Tuesday, July 13, 2010

Crazy person thinks ``Lisp sucks''

In this post, a person going by the initials CC declares:
There's a good reason almost no software has been written in lisp. Lisp sucks. Not just lisp, but the whole way the logic is arranged as nested things. The simplest problem becomes a mind-bending puzzle.
Instead of easy to understand:
if (a equals b) { 
  c = not d; 
} else { 
  c = d + b; 
}
You have this nested inner to outer logic where if is a function.
c = if(equals(a,b), not(d), plus(d,b)) 
The first is definitely more readible [sic] and it reads like we think of the logic. No one thinks of if as a function that takes three arguments, and forcing you to work that way just makes you crazy.
I suppose it is futile to argue with an admittedly crazy person, but perhaps being forced to work with conditionals has made me crazy as well. A closer translation of the original code would be this:
(if (equals a b)
    (setq c (not d))
    (setq c (+ d b)))
If these sorts of differences are a ``mind-bending puzzle'', perhaps the author should consider a career in the housekeeping or food-service industries.

Thursday, June 24, 2010

Dr Scheme name change

I put together a page that compares the various names of Dr. Scheme by how often people search for them in Google.

Friday, June 11, 2010

Buoyancy

Steve suggested: For this to work like you want, Java style, you'd have to add a Class member to the Box class, to represent the type of Box at runtime.

Like this:
public final class Box<E> {

  private final Class<E> type;
  private final E value;

  public Box (Class<E> type, E value) {
    this.type = checkNotNull(type, "type");
    this.value = value;
  }

  public E getValue() {
    return value;
  }

  public boolean isType (Class<?> specific) {
    return specific == this.type;
  }

  @SuppressWarnings("unchecked")
  public <X> Box<X> asType (Class<X> specific) {
    if (isType(specific)) {
      return (Box) this;
    } else {
      throw new ClassCastException();
    }
  }
}
You could then write a static generic filter taking an Iterable of generic Boxes and a Class object to filter by, and returning an Iterable of the correct type fairly simply.

Sort of like this (assuming appropriate definitions of map and filter):

public <X> Iterable<Box<X>> boxesOfType (final Class<X> type,
                                         Iterable<Box<?>> boxes) {
  return
      map (
          filter (
              boxes,
              new Predicate<Box<?>> () {
                @Override
                public boolean apply (Box<?> box) {
                  return box.isType(type);
                }
              }),
          new Function<Box<?>, Box<X>> () {
            @Override
            public Box<X> apply (Box<?> box) {
              return box.asType(type);
            }
          });
}
Dunno if that floats your boat, tho.

It definitely floats my boat (thanks, Steve) in that I can now, in a typesafe manner, take a bunch of heterogeneous boxes and select and downcast them to a subset of homogeneous boxes.

I'll discuss the leaks in the boat in the next post.

Addendum

mquander offered a C# solution to the same problem:
internal interface IBox { }
 
public class Box<T> : IBox
{
    public T Value;
}
 
public static class BoxManipulators
{
    public static int MinimumIntegerInBox(IEnumerable<IBox> boxes)
    {
        return boxes.OfType<Box<int>>().Min(x => x.Value);
    }
}

Thursday, June 10, 2010

An Illustrative Problem (Part II) correction

Blogger really munged Mike's code, so here is the relevant part correctly formatted:
public static int minimumIntegerInBox(Collection<Box<?>> boxes) {
  Iterable<Box<?>> intBoxes = filter(boxes,
      new Function<Box<?>, Boolean>() {
        @Override
        public Boolean of(Box<?> domainItem) {
          return domainItem.get() instanceof Integer;
        }
      });
  Iterable<Integer> ints = map(
      intBoxes,
      new Function<Box<?>, Integer>() {
        @Override
        public Integer of(Box<?> domainItem) {
          return Integer.class.cast(domainItem.get());
        }
      });
  return minimumElement(ints);
}

public static int minimumElement(Iterable<Integer> items) {
  return best(
      new Function<Pair<Integer, Integer>, Boolean>() {
        @Override
        public Boolean of(Pair<Integer, Integer> domainItem) {
          return domainItem.second() < domainItem.first();
        }
      },
      items);
}
This is similar to what I started with, but I didn't like this solution for two reasons.
  1. The test for whether the Box<?> is a Box<Integer> relies on extracting the contents of the Box and testing to see if it is an integer. In other words, rather than relying on the type of the container, we explicitly look at the contents of the container. This is an important distinction, but it might be hard to see it in this context. Just because the box happens to contain an integer at this point doesn't mean that it is a Box<Integer>. It could be a Box<Number> or a Box<Object> that currently has an integer in it, but could just as well have some non-integer in it. Conversely, it could be a Box<Integer> that happens to have a null in it. For the purposes of finding the minimum element, a null would be an issue, but if the task were to “fill all the Box<Integer> with zero,” then this method of testing the box type would miss the Box<Integer> with null.

  2. The type of intBoxes is wrong. It really ought to be Iterable<Box<Integer>>.

Anyone care to offer a fix?

Wednesday, June 9, 2010

An illustrative problem (part II)

Mike Coffin took the bait. (Thanks, Mike!) Unfortunately, pasting code into the blog comments is a very poorly supported feature. Blogs about Python must be interesting. I'll recopy some of Mike's code here.

Mike said: While I prefer this:
int minimumIntegerInBox (Collection<Box> boxes) {
    int result = Integer.MAX_VALUE;
    for (Box box : boxes) {
        if (box.get() instanceof Integer) {
            result = Math.min (result, Integer.class.cast (box.get()));
        }
    }
    return result;
}
that would be cheating.

It isn't cheating, but it isn't answering the question I posed, either. It's more or less the obvious thing to do in Java or Lisp. But conceptually it is pretty primitive. I don't care about the machinery involved in traversing a list (the looping construct). I want to take a binary operation like Math.min and make it n-ary.
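
In Scheme terms, what I'm after is the ordinary fold trick for extending a binary operation to n operands. A minimal sketch:
;; fold-left extends the binary `min' to any number of arguments;
;; the traversal machinery lives in fold-left, not in the caller.
(define (minimum first . rest)
  (fold-left min first rest))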

Mike gave a very complete solution, but I'm only going to reformat the relevant part:
public static int minimumIntegerInBox(Collection<Box> boxes) {
    Iterable<Box> intBoxes = 
        filter (boxes,
                new Function<Box, Boolean>() {
                    @Override
                    public Boolean of(Box domainItem) {
                        return domainItem.get() instanceof Integer;
                    }
                });

    Iterable ints = 
        map (intBoxes,
             new Function<Box, Integer>() {
                 @Override
                 public Integer of(Box domainItem) {
                     return Integer.class.cast(domainItem.get());
                 }
             });

    return minimumElement(ints);
}

public static int minimumElement(Iterable items) {
    return best(
        new Function<Pair, Boolean>() {
            @Override
            public Boolean of(Pair domainItem) {
                return domainItem.second() < domainItem.first();
            }
        },
        items);
}
That's a lot closer to what I had in mind, but these are untyped Boxes. I want to use parameterized Boxes.

Tuesday, June 8, 2010

An illustrative problem.

I'm working on a fairly simple problem and I think it illustrates something interesting. Here is the problem statement:

A `box' is an object that has a name and can hold another object of a particular type (that is, there are boxes for integers, boxes for strings, boxes for floats, etc.) You are given an unordered collection of various boxes and must find the smallest integer among the subset of boxes that contain integers.

To make it a tiny bit more challenging, we won't just write an ad-hoc loop. We'll use map, filter, and fold-left. Since fold-left is a bit scary, I'll write that part:
(define (best better? list)
  (fold-left (lambda (best-so-far candidate)
               (if (or (null? best-so-far)
                       (better? candidate best-so-far))
                   candidate
                 best-so-far))
             '()
             list))

(define (minimum-element list)
  (best < list))
So assuming that *box-list* is a variable containing our list of boxes, Exercise 1 is to write a program in Scheme or Lisp using map, filter, and minimum-element that finds the smallest integer (fixnum) in the boxes.
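
One possible shape of an answer, sketched here with an assumed box-contents accessor and fixnum? predicate:
;; A sketch, assuming `box-contents' extracts what a box holds and
;; `fixnum?' tests for a fixnum.
(define (minimum-integer-in-boxes boxes)
  (minimum-element
   (map box-contents
        (filter (lambda (box) (fixnum? (box-contents box)))
                boxes))))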

Too easy? Let's make it a bit trickier: Code up the identical solution in Java.

Assume that Box is a generic (that is, parameterized) type, so Box<Integer> would contain an Integer and Box<String> contains a String, etc. The variable BoxList would be declared as Collection<Box<?>>, so we want a method with this signature: int minimumIntegerInBox (Collection<Box<?>> boxes)

I don't think it can be done without getting at least one unchecked conversion warning, but a cleverly placed @SuppressWarnings("unchecked") will permit you to downcast a Box<?> to a specific type, and then everything else should type check.

Monday, June 7, 2010

International Lisp Conference 2010 - Call for Papers

***********************************************************************
*                                                                     *
*                  International Lisp Conference 2010                 *
*                         October 19-21, 2010                         *
*                    John Ascuaga's Nugget (Casino)                   *
*              Reno/Sparks, Nevada, USA (near Lake Tahoe)             *
*                                                                     *
*           Collocated with SPLASH 2010 (OOPSLA & DLS & more)         *
*               see also http://splashcon.org as well as              *
*          http://www.dynamic-languages-symposium.org/dls-10/         *
*                                                                     *
*              In association with ACM SIGPLAN (PENDING)              *
*                                                                     *
*********************************************************************** 


The Association of Lisp Users is pleased to announce that the 2010
International Lisp Conference will be held in Reno, Nevada, in
collocation with SPLASH 2010.  The scope includes all areas related to
the Lisp family of programming languages.

Accepted papers will be published in the ACM Digital Library (PENDING).

Extended Abstracts and Papers must be written in English and submitted
electronically at http://www.easychair.org/conferences?conf=ilc2010 in
PDF or WORD format. If an Extended Abstract is submitted, it must be 
between 2 and 4 pages, with full paper to follow before final deadline.

Final submissions must not exceed 15 pages and need to use the ACM
format; templates can be found at:
    http://www.acm.org/sigs/pubs/proceed/template.html.


Important Dates:
****************

* Deadline for Abstract Submission      August      1, 2010
* Deadline for Paper Submission         September   6, 2010
* Author notification                   September  20, 2010
* Final paper due (in electronic form)  October     5, 2010
* Conference                            October 19-21, 2010


Scope:
******

Lisp is one of the greatest ideas from computer science and a major
influence for almost all programming languages and for all
sufficiently complex software applications.

The International Lisp Conference is a forum for the discussion of
Lisp and, in particular, the design, implementation and application of
any of the Lisp dialects.  We encourage everyone interested in Lisp to
participate.

We invite high quality submissions in all areas involving Lisp
dialects and any other languages in the Lisp family, including, but
not limited to, ACL2, AutoLisp, Clojure, Common Lisp, ECMAScript,
Dylan, Emacs Lisp, ISLISP, Racket, Scheme, etc.

Topics may include any and all combinations of Lisp and:

* Language design and implementation
* Language critique
* Language integration, inter-operation and deployment
* Applications (especially commercial)
* 'Pearls' (of wisdom)
* Experience reports and case studies
* Reflection, meta-object protocols, meta-programming
* Domain-specific languages
* Programming paradigms and environments
* Parallel and distributed computing
* Software evolution
* Theorem proving
* Scientific computing
* Data mining
* Semantic web

We also encourage submissions about known ideas as long as they are
presented in a new setting and/or in a highly elegant way.

Authors concerned about the appropriateness of a topic may communicate
by electronic mail with the program chair prior to submission.

Each paper should explain its contributions in both general and
technical terms, identifying what has been accomplished, explaining
why it is significant, and comparing it with previous work. Authors
should strive to make their papers understandable to a broad audience.
Each paper will be judged according to its significance, novelty,
correctness, clarity, and elegance.

The official language of the conference is English.  Some further
information is available at the conference web site, with more details
added later.  See: http://www.international-lisp-conference.org

Technical Program:
******************

Original submissions in all areas related to the conference themes are
invited for the following categories.

* Papers: Technical papers of up to 15 pages that describe original
   results or explain known ideas in new and elegant ways, or extended
   abstracts of 4 pages soon followed by the corresponding full paper.

* Demonstrations: Abstracts of up to 4 pages for demonstrations of
   tools, libraries, and applications.

* Tutorials: Abstracts of up to 4 pages for in-depth presentations
   about topics of special interest for at least 90 minutes and up to
   180 minutes. 

* Workshops: Abstracts of up to 4 pages for groups of people who
   intend to work on a focused topic for half a day.

* Panel discussions: Abstracts of up to 4 pages for discussions about
   current themes. Panel discussion proposals must mention panel
   members who are willing to partake in a discussion.

* Lightning talks: Abstracts of up to one page for talks to last for
   no more than 5 minutes.

Depending on the technical content, each submitted paper will be
classified by the program committee as either a technical paper or as
an experience paper; and authors will be informed about this
classification.  Note that all interesting submissions are considered
valuable contributions to the success of the ILC series of
conferences.  As in past ILC's since 2007, accepted papers in both
categories will be presented at the conference, included in the
proceedings, and submitted to the ACM digital library.


Organizing Committee:
*********************

* General Chair:
   JonL White          The Ginger IceCream Factory of Palo Alto, ALU

* Program Chair:
   Antonio Leitao      Instituto Superior Tecnico/INESC-ID

* Conference Treasurer:
   Duane Rettig        Franz, Inc., ALU Director

* Publicity Chair:
   Daniel Herring      ALU Director

* ALU Treasurer:
   Rusty Johnson       TASC, Inc., ALU Director


Program Committee:
******************

* Antonio Leitao      Instituto Superior Tecnico/INESC-ID, Portugal
* Alex Fukunaga       University of Tokyo, Japan
* Charlotte Herzeel   Vrije Universiteit Brussel, Belgium
* Christophe Rhodes   Goldsmiths College, University of London, UK
* Didier Verna        EPITA Research and Development Laboratory, France
* Duane Rettig        Franz, Inc., USA
* Giuseppe Attardi    University of Pisa, Italy
* Jeff Shrager        Symbolic Systems Program, Stanford University, USA
* Joe Marshall        Google, Inc., USA
* Julian Padget       University of Bath, UK
* Keith Corbet        Clozure Associates, USA
* Kent Pitman         PTC, USA
* Manuel Serrano      INRIA Sophia Antipolis, France
* Marc Feeley         University of  Montreal, Canada
* Marie Beurton-Aimar University of Bordeaux 1, France
* Mark Stickel        SRI International, USA
* Matthias Felleisen  Northeastern University, USA
* Scott McKay         ITA Software, USA


Contacts:
*********

* Questions:         ilc10-organizing-committee at alu.org

* Program Chair:     ilc2010 at easychair.org

For more information, see http://www.international-lisp-conference.org

Friday, June 4, 2010

It was quiet..... too quiet.....

Things seemed a bit quiet here until I looked at some older posts and discovered comments I hadn't seen before. It seems that gmail had gotten into its head that forwarded posts were spam. Wonderful. My apologies for not responding to people.

Choose and perish

I chose ok...

(I had to retouch the buttons because the camera overexposed them.)

Wednesday, May 26, 2010

I have two collections and I need to see if one of them is a representation of the other. Since each kind of object has an Id, the two sets of Ids should match, but since collections are unordered, you can't just do a pairwise match.

The solution is to use the containsAll method to see if each collection contains all the elements of the other one. In Scheme this is trivial:
(let ((foo-ids (map foo-id (get-foos)))
      (bar-ids (map bar-id (get-bars))))
  (and (contains-all foo-ids bar-ids)
       (contains-all bar-ids foo-ids)))
in Java, not so easy:
Collection<Id> fooIds =
    Collections2.transform(getFoos(),
                           new Function<Foo, Id> () {
                             @Override
                             public Id apply (Foo foo) {
                               return foo.getId();
                             }
                           });
Collection<Id> barIds =
    Collections2.transform(getBars(),
                           new Function<Bar, Id> () {
                             @Override
                             public Id apply (Bar bar) {
                               return bar.getId();
                             }
                           });
return fooIds.containsAll(barIds) &&
       barIds.containsAll(fooIds);
I know that this is not Java's forte, but this isn't some obscure stratified monadic combinator, it's just map, for heaven's sake! If future maintainers of this code can't handle map, maybe they should consider a career in the fast food industry.
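
For the record, the contains-all helper assumed above is itself a one-liner. A sketch, using SRFI-1's every:
;; True when every element of `items' appears in `collection'.
(define (contains-all collection items)
  (every (lambda (item) (member item collection)) items))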

Tuesday, May 25, 2010

Pathetic

Collection expectedIds =
    Collections2.transform(fooCollection,
                           new Function () {
                             @Override
                             public Id apply (Foo foo) {
                               return foo.getId();
                             }
                          });

Looking forward to ILC 2010

Daniel Herring sends this announcement:

The ALU is pleased to announce that the International Lisp Conference 2010 (ILC-2010) will be held mid October in Reno, Nevada.

ILC-2010 will be colocated with OOPSLA/SPLASH.

Colocated conferences allow people to explore other sessions; but the ILC itself will be the same independent, lisp-intensive event you've grown to love.

More announcements, including the ILC conference website and call for papers, will appear in the coming weeks.

Sincerely,
p.p. Daniel Herring
ALU Board of Directors



I'm planning on being there. Anybody want to write a paper?

Tuesday, May 11, 2010

Doing it wrong

I'm hacking some Java code and I'm going to do a query and return some objects. Since I have no need to mutate the collection, I decide to use an immutable list. To make a long story short, I did this:
import com.foo.bar.ImmutableList;

final ImmutableList.Builder<Object> intermediateCollection = ImmutableList.builder();
    ...
    if (fromObjectClass.isInstance(object))
       intermediateCollection.add(object);
    ...
    query (intermediateCollection.build())
This is bad enough, but we'll let it slide.

It turns out that the query method wants to mutate the collection. (This is obnoxious. When I borrow something I try to leave it in the same condition as when I got it. I don't go mutating it and I expect my libraries to extend the same courtesy.) I guess I don't care all that much — I can simply make the collection mutable.

So why can't I just delete the `Immutable' prefix?

// Now it is named `Lists' with an `s'
import com.foo.bar.Lists;

// Now I have to specify the implementation technique.
// Why isn't it a List.builder?
  final List<Object> intermediateCollection = Lists.newArrayList();

// Why isn't there a build() that is a no-op?
   query (intermediateCollection)
This is simply wrong.

Thursday, May 6, 2010

Not too many takers on that batch of questions. Oh well.

Anyone care to take a stab at how these notions of equality relate to procedural abstraction?

Wednesday, May 5, 2010

Pondering

What does it mean to say that two programs are ‘the same’?

What does it mean to say that two programs are ‘equivalent’?

What is the difference between intensional and extensional equivalence?

What is ‘observational equivalence’?

Tuesday, May 4, 2010

C++ vs. Lisp

Yes, a provocative title.

In this blog post, Mr. Ebhakt decides to compare C++ and Lisp. (Go ahead and read it, I'll wait...)

Wow! It seems the original article was written by Brandon Corfman. My apologies to Mr. Corfman. Shame on you, Ebhakt.

I had to laugh. First, he is disappointed that it took him eight hours to find a solution (Norvig did it in two) and that his initial attempt took 220 lines of code (Norvig did it in 45). One source of conciseness is that Lisp function calls return useful values while C++ function calls typically cause side effects. Thus the Lisp expression
(format t "~a:~{ ~a~}~%" num (reverse words))
could make use of the return value from reverse whereas the C++ version
words.reverse(); 
cout << num << ":"; 
for (list<string>::const_iterator i = words.begin(); i != words.end(); ++i) 
    cout << *i; 
cout << "\n";
could not.

Mr. Ebhakt also noted that ‘most of the complexity of the STL arises from its explicit use of iterators.’ Henry Baker pointed out in 1992 that iterators are “Signs of Weakness in Object-Oriented Languages”. Armed with this knowledge, Mr. Corfman “decided to rewrite all of the STL’s algorithms to accept containers instead of iterators” and “also make the algorithms (where possible) return copies of containers instead of iterators”. Furthermore, he decided “to build another layer of utility functions on top of the STL to perform common tasks such as tokenizing strings, looking up values in containers, reading from files” etc. The end result? “The new line count is 79 lines.” I won't bother quoting Philip Greenspun at this point.

Mr. Corfman sums it all up: ‘C++ isn’t all that bad.’ Of course it doesn't have garbage collection, macros, and closures, but it has ‘extensive library support’ (which he just spent hours throwing away and re-implementing) ‘and of course, its premier status’ which, with an additional two dollars, will buy you a large coffee.

Mr. Corfman came so close to the mark and just missed it. Lisp isn't magic; it's at a sweet spot in the design space. Lisp doesn't do anything the other languages cannot, but it does what the other ones do not. It works that much harder for the programmer so the programmer can do other things (like program). Sure, a Ford Model A isn't all that bad. It can get you where you're going and it is unsurpassed in popularity, but if you spend most of your waking hours behind the wheel, wouldn't you rather drive a Duesenberg?
After finding out that this was written by Brandon Corfman, I decided to see if his opinion had changed at all. On his Road to Lisp page, he says “Lisp still has the definite edge on the power curve. I think the clue for me is that I still find myself saying, ‘Why didn't they do this like Lisp?’”

Monday, May 3, 2010

Quasi-cranks

Every now and then I run into ‘quasi-cranks’ in the computer industry. These people don't fit the mold of the standard crank. For instance, they often appear reasonably normal (although in this field, there is more than the usual amount of latitude for ‘normal’), they generally don't rant unless you specifically mention certain trigger words, and they seem to have successful computer careers. But when you start talking about technical details, you realize that these people are profoundly confused.

I often help people debug their code. Most of the time I can follow what they're thinking and how the code is structured. Every now and then, though, I'll come across someone that has some insane hairball of code. They'll point out the location where the error is raised and explain what the intended behavior is. (I don't know why they insist on telling me this. If the code behaved the way it was intended, there wouldn't be an error.) I listen patiently and then ask them what the code actually did, and then try to understand where the reasoning went wrong. It's at this point I get the first inklings of weirdness. I'll find out that what they want to do is fairly simple, but that the error is bizarre. For example, they'll say they want to look something up in a table, but the error message is something like Error: Variable reference to syntactic keyword.

Now this can go one of three ways. The best is when I ask them if they're doing something weird and they sheepishly admit “Yeah, I thought I could model this with a first-class environment... I guess I should use a hash table.” These people have a future in computers. They're exploring.

Second best is when they say “I'm using a first-class environment to hold my lookup table,” and when I suggest some more conventional mechanism they assert “No, I need to be able to call eval,” and it turns out they have some harebrained idea that kinda sounds plausible, but won't work in practice. These people can make a living in computers, but they need a structured environment.

Then there is the third type of person. When you look at the hairball of code you see a bunch of things that make no sense at all: continuations that are captured, but not bound to a variable, spurious data structure copying, strange and unexpected variable names, checks for errors that cannot logically occur, etc. If you ask them what they are doing, they'll say something like “I have to eval the current continuation on the stack,” or “That forces the oldspace to swap.” If you try to encourage them to try a standard approach they explain “Look, this code pushes a binding on the continuation and I need to find a way to remove it. The error message is because the garbage collector is evaluating the wrong binding. If I could remove the binding here, then the collector will see the correct binding here.”

The first couple of times I ran into people like this I tried to nudge them in the right direction. It doesn't work. People like this are profoundly confused. I'm not sure what they are doing. They think they are programming, but the logic is unrelated to any model of programming I am familiar with. I'll usually feign ignorance and suggest some debugging tool or something. (Sometimes they'll come back with a ‘solution’ to their bug: “Oh yeah, it was easy, I just pushed a second null on the continuation.”)

You can tell these quasi-cranks from the way they use computer jargon and terminology. They “misunderstand or fail to use standard notation and terminology,” and “ignore fine distinctions which are essential to correctly understand mainstream belief.” (as per the Wikipedia article). But it astounds me that some of these people seem to have careers as programmers despite their confusion.

Friday, April 30, 2010

Importance of terminology

An upcoming study of first-year computer science students has an interesting observation about terminology. It seems that their understanding of the computer science vocabulary is much worse than one would expect. You cannot blame the teachers because I'm sure they were unaware of the problem. You cannot blame the students for being `dumb' because nearly all of them could identify a `function body'. Yet fewer than one in three could tell you what is meant by a `defined name' or a `predicate'. Really! The results surprised me, but in thinking about it, I have a couple of personal anecdotes that highlight the problem.

Every year or so there used to be a heated debate on one or more usenet groups (remember usenet?) between the static and dynamic type factions. (I won't argue either side.) One year, however, I encountered someone who challenged me about my viewpoint. He said that he knew many people who held a similar viewpoint to mine, and he had a lot of respect for them. They weren't idiots. But he could not understand their rationale. His viewpoint made perfect logical sense to him, so how could another intelligent person disagree so violently? He didn't want to argue; he wanted to understand how anyone could find the alternative viewpoint to be as logical. (He assumed it must be so, or there wouldn't be so many respectable adherents to it.)

We discussed it back and forth for a while in a non-heated way and it dawned on me that the fundamental problem is simply one of terminology. `Static typers' and `dynamic typers' have completely different understandings of the word `type'. They cannot agree on the simplest things because they each think the other knows what he means by `type'.

For the sake of clarity, I'll lay out the definitions:
  • Type1 — an equivalence class for data. Two objects are of the ‘same type’ if operations on one could be (or could have been) performed on the other with analogous results. For example, ‘integer’ is a type because objects like ‘3’ and ‘42’ can both be added, subtracted, multiplied, and divided.
  • Type2 — a purely syntactic label that can be assigned to a code fragment. There are rules for assigning such labels to primitive fragments and rules for composing types as the code fragments are composed. Certain compositions are disallowed. A well-typed program has no disallowed compositions, and a type-checking program can verify this. By clever construction of the rules, certain useful runtime properties can be proven.
I think you can see that these two definitions have almost nothing to do with each other.

Once I understood this, I stopped using the word ‘type’ and started talking about ‘tags’ — manifest metadata attached to all objects in memory that describe what each object is. ‘Static typers’ understand the utility and benefit of such things; what they don't understand is why anyone would want to forgo compile-time ‘type checks’. By that they don't mean the annoying task of decorating every variable and procedure in the system with a ‘type specifier’, but rather having the compiler notice that your code said to vector-ref the result of a floating point divide, which is most likely not what you intended. Notwithstanding the contrarians who complain that maybe they did want to vector-ref a float, every Lisp hacker I know of appreciates getting early notification of these kinds of errors (provided that false positives are very rare and it remains unnecessary to kowtow to the compiler).
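
A contrived fragment to make the point concrete: the quotient below is always a flonum, never a vector, so a compiler that tracks that information can complain before the program ever runs.
;; Contrived example: (/ x 3.0) always yields a flonum, so a compiler
;; that propagates this fact can warn that the vector-ref must fail.
(define (suspicious x)
  (vector-ref (/ x 3.0) 0))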

Next anecdote later...

Thursday, April 29, 2010

Persistent store: a new record type

The object map allows us to abstract away the physical addresses where our objects are stored, but only if we can remember where we put the object map itself! We need another record type, and things will get much more interesting.

A commit-record is a special record that we will write to the durable storage. The commit-record will contain (among other things) the physical address of the root of the object map. We'll require a little more functionality from the durable storage layer. When we first use a previously existing persistent store, we need to ask the durable storage layer to find the most recent valid and complete commit record. It can do this by starting at the newest record and reading backwards through earlier records until it finds one.

Since computers are unreliable, we have to take into account the possibility that the computer could crash in the middle of writing a commit record. When looking for the commit record on restart, we ignore any partially written commit records. When we're writing a commit record to the durable storage, we don't return control until we're sure that the record is durably saved and complete. If the computer crashes at any time before that last byte is written, we will recover using the previous valid commit record, and if it crashes at any time after that last byte is written we'll use this one.
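
A sketch of that backwards scan; the helper names here are assumptions, not the store's actual interface:
;; Hypothetical helpers: `newest-record-address' locates the end of
;; the log, `read-record-ending-at' reads the record that ends there,
;; `valid-commit-record?' checks the record type and its checksum, and
;; `previous-record-address' returns #f when no earlier record exists.
(define (find-most-recent-commit-record store)
  (let loop ((address (newest-record-address store)))
    (if (not address)
        (error "No valid commit record found:" store)
        (let ((record (read-record-ending-at store address)))
          (if (valid-commit-record? record)
              record
              (loop (previous-record-address store address)))))))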

There is an extra burden on the durable store: it must be able to find the most recent commit record. But we can take a different burden off the store. Instead of insisting that every record be durably written at the time we send it to the store, we can allow most of the records to be buffered until it is more convenient to write them. We only insist that the buffers be flushed prior to writing the commit record. This makes a huge difference in the performance of the store.
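
The resulting commit protocol is tiny. A sketch, again with assumed helper names:
;; `flush-buffers!' forces all buffered records to durable storage;
;; `write-record-durably!' does not return until the record is
;; entirely on the disk.  Everything the commit record refers to is
;; durable before the commit record itself is written.
(define (commit! store object-map-root-address)
  (flush-buffers! store)
  (write-record-durably! store
                         (make-commit-record object-map-root-address)))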

There is a design decision to be made at this point. When we save durable objects, we can either send the object and the object-map records to the durable store immediately, or we can defer sending them until we want to write the commit record. The eager strategy is more responsive when we are doing a lot of writes because we can start writing records while we're computing. The lazy strategy will reduce the total amount of traffic (every time we modify the object-map we need to write new object-map records; if newer entries supersede records that haven't yet been written, we can elide those writes), but when we eventually do write the commit record, there will be a flurry of activity as we force the pending writes. This will cause a noticeable pause at commit time.

There is another interesting design decision. In many cases, it is ok to lose a few seconds of work if the computer crashes. You wouldn't want to lose anything if you were logging financial transactions, but if you were checkpointing a text file, you might consider it ok if two or three of the most recently typed letters didn't get saved. When the computer restarted, you'd re-open the text file and find that what you were typing wasn't completely there, but was mostly there. You can re-enter the last couple of keystrokes. If this is acceptable — and it is for some applications and not for others — then it is not necessary for commit records to be immediately written to durable storage so long as they are written in a reasonably timely manner (within a second or two). The freedom to defer writing commit records like this will also improve performance quite a bit.
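
A sketch of that relaxed protocol, assuming a hypothetical schedule-flush! that guarantees the buffers reach the disk within the given interval:
;; The commit record goes into the buffer like any other record; a
;; hypothetical `schedule-flush!' promises durability within the given
;; number of milliseconds.  A crash inside that window loses at most
;; the last second or two of work.
(define (commit-lazily! store object-map-root-address)
  (write-record! store (make-commit-record object-map-root-address))
  (schedule-flush! store 2000))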


Exercise: Add commit records to your durable store and the ability to recover the most recent commit record when opening an existing store.

Tuesday, April 27, 2010

Open letter to the SEC


To Whom it may concern:

In the document entitled

    SECURITIES AND EXCHANGE COMMISSION
    17 CFR Parts 200, 229, 230, 232, 239, 240, 243 and 249
    Release Nos. 33-9117; 34-61858; File No. S7-08-10
    RIN 3235-AK37 
    ASSET-BACKED SECURITIES
a number of changes to Item 601 of Regulation S-K, Rules 101, 201, 202 and 305 of Regulation S-T, a new Rule 314 of Regulation S-T and changes to Form 8-K are proposed. These changes are proposed to accommodate the filing of the ‘waterfall’ computer program. The document requests comments on this proposal.



Is it appropriate for us to require most ABS issuers to file the waterfall computer program?

As mentioned in the proposal, ABS issuers are currently required to submit an informal, ‘English language’ description of the waterfall. A computer program would be far more formal and less likely to be ambiguous or misinterpreted.

Is it appropriate to require issuers to submit the waterfall computer program in a single programming language, such as Python, to give investors the benefit of a standardized process? If so, is Python the best choice or are there other open source programming language alternatives (such as PERL) that would be better suited for these purposes?

The proposal goes to great lengths to justify the choice of Python as the language in which the waterfall program should be expressed. Unfortunately, every one of the justifications is either incorrect or irrelevant.
  • Python is an open source interpreted programming language. — Python is more accurately described as a family of programming languages with several implementations. The Python Software Foundation supports one particular implementation, but both Microsoft and Nokia offer alternative implementations. The Python Software Foundation provides an interpreter, but there are versions of Python that are fully compiled, byte-code compiled, or JIT compiled.
  • Open source means that the source code is available to all users.— This is true, but irrelevant. INTERCAL is an open source language, yet it is completely unsuitable for the purposes of writing a waterfall program. On the other hand, the Microsoft implementation of ECMAScript is proprietary, yet ECMAScript is certainly a language to consider for waterfall programs.
  • An interpreted language is a programming language that requires an interpreter in the target computer for program execution. — Interpretation is a technique for implementing a language. Few languages require an interpreter, and as noted above, compilers for Python exist.
  • Since Python is an interpreted language that does not need to be compiled prior to running it, executable code would not need to be published on EDGAR. We define executable code in Rule 11 of Regulation S-T [17 CFR 239.11] as instructions to a computer to carry out operations that use features beyond the viewer's, reader's, or Internet browser's native ability to interpret and display HTML, PDF, and static graphic files. Such code may be in binary (machine language) or in script form. Executable code includes disruptive code.— A Python program is most certainly ‘executable’ by the definition supplied.
The proposal appears to be suggesting that safety is an important desideratum. This includes the safety of the EDGAR system as well as the safety of the investors that would presumably use the waterfall program. The proposal concludes that a language that is executed by an interpreter would be inherently safer than a compiled language. This is false. There has been a considerable amount of research into computer safety, and it is clear that safety is very hard to achieve even if it is the highest priority in the design. Provable safety is not, and never has been, a goal of Python.

To conclude, I believe that a formal description of the waterfall algorithms written as a computer program may be quite beneficial to investors. The proposal, however, fails to enumerate the desiderata for the language such a program would be written in. The proposal makes a few erroneous assertions and then puts forward Python as a suggested language. While Python is a popular language, it was never intended to be used as a critical component of such an important part of the U.S. economy.

I suggest that the Securities and Exchange Commission, if it intends to pursue the path of requiring a formal waterfall program, follow a formal process for finding or developing a suitable language. First, the requirements for such a language need to be specified. Second, these requirements should be compared against existing and proposed languages. Finally, one or more of the languages should be adopted and implemented.

Sincerely,
Joe Marshall