Friday, October 18, 2013

Unexciting code

Source code control isn't supposed to be interesting or exciting for the user. Blogging about source code control isn't going to be interesting or surprising for the reader, either. I'll spare you the walkthrough. You're old enough to figure it out without me holding your hand. It'll be more interesting if I blog about the problems and bugs we encountered.

Conman was the configuration manager built on the core ChangeSafe code. The master catalog in conman holds the highest level objects being managed.
(defclass master-catalog ()
  ;; subsystems and products (PC's) in this repository
  ((products   :initform nil
               :version-technique :composite-set
               :accessor master-catalog/products)

   (subsystems :initform nil
               :version-technique :composite-set
               :accessor master-catalog/subsystems)

   ;; All satellite repositories known to this master.  We don't allow
   ;; satellite repository names to change, though the projects they
   ;; contain may be renamed.  **WARNING** BE CAREFUL IF YOU OPERATE
   ;; in non-latest metaversion views of this list or you might try to
   ;; create a satellite which already exists.  Only update this list
   ;; using the latest metaversion.
   (satellite-repositories :initform nil :version-technique :composite-set)

   ;; Projects (a.k.a. ChangeSafe "classes") contained in satellite
   ;; repositories.  The descriptors contain the key mapping from a
   ;; project name to a satellite-repository-name.  We could almost
   ;; make this just a (project-name . satellite-name) mapping, but we
   ;; need to version project-name, and we also want to cache in the
   ;; master some information in the satellites so that we don't
   ;; always have to examine the satellites for often accessed
   ;; information.
   (classes :accessor master-catalog/classes
            :initform nil
            :version-technique :composite-set)

   ;; Cset-relationship-tuples is conceptually a list of sublists,
   ;; where each sublist is a tuple.  For every master cid which
   ;; results in the creation of satellite cids, a tuple is added
   ;; which enumerates the master cid and the satellite cids which it
   ;; caused to be created.  e.g. '((master.cid.1 satellite-1.cid.1
   ;; satellite-2.cid.1)) Because we want portable references, blah
   ;; blah blah, we actually reference DIDS of CHANGE-SET objects
   ;; rather than the cids.  We may instead wish to store CID-OBJECT
   ;; references.  TBD.

   ;; Right now, this information is maintained only for change
   ;; transactions which arise from WITH-CM-MASTER-TXN and
   ;; WITH-CM-SATELLITE-TXN.  This is ok, since those are the
   ;; interesting txns which manipulate satellite databases.

   ;; Note that because of the high volume of csets we expect to
   ;; track, we actually represent this information as a vector of
   ;; vectors to achieve space compaction.
   (cset-relationship-tuples :initform (make-instance 'persistent-vector
                                                           :initial-element nil
                                                           :size 1)
                             :version-technique :nonversioned)

   (cset-rel-tuples-index :initform (make-instance 'persistent-vector
                                                        :initial-element -1
                                                        :size 1)
                          :version-technique :nonversioned)

   ;; BOTH these slots are updated ONLY by vm-txn-note-change-set,
   ;; except for schema upgrading.

   ;; The cset-rel-tuples-index slot is a conceptual hash table into the
   ;; cset-relationship-tuples slot. This is used by
   ;; master-catalog-lookup-cset-relationship
   ;; to avoid an extremely costly linear search of cset-relationship-tuples.
   ;; This is very important for cset_add, cset_remove, and csets_from_file.

   ;; The idea is that the did-string of the master-cid's did is hashed.
   ;; Reducing that hash modulo the number of entries in cset-rel-tuples-index,
   ;; finds a "home" index of cset-rel-tuples-index. Using the sb32 value
   ;; in that element, we either have a -1 (so the entry is not in the
   ;; hash table) or we get an index into cset_relationship_tuples.
   ;; If there is no hash collision, that element of cset_relationship_tuples
   ;; will contain the desired master-cid did we are looking for. If it
   ;; isn't the one we want, we have had a hash collision, and we resolve it
   ;; by linear probing in the next-door (circularly) element of
   ;; cset-rel-tuples-index.
   ;; The number of elements of cset-rel-tuples-index is always a prime number,
   ;; and is also maintained to be more than twice as large as the number of
   ;; entries in cset-relationship-tuples. That is important, to prevent
   ;; clustering and slow searching. So when it grows, cset-rel-tuples-index
   ;; grows by a reasonable factor (about 2) so that it always contains
   ;; at least half "holes", that is, -1.  Further, we want to avoid frequent
   ;; growth, because growing requires computing every entry in the hash table
   ;; again. That makes for a big transaction, as every element of the
   ;; cid-relationship-tuple vector has to be mapped in, and rehashed with
   ;; the new size of cset-rel-tuples-index.
   ;; Space considerations: In Jack's db, there are roughly 40,000 elements
   ;; currently in the cset-relationship-tuples.  Suppose we had 100,000
   ;; elements. In Jack's db, it appears that the tuples are about 2 elements
   ;; each, average. Suppose it were 9. Then the tuples would take 4*(1+9)=40
   ;; bytes each, so 40*100,000 = 4Mb total (plus another 400,000 for the
   ;; cset-relationship-tuples vector itself).  This is large, but not likely
   ;; to be a cause of breakage anytime soon.

   ;; The cset-name-hashtable maps cset names to csets.  While the HP
   ;; model of ChangeSafe doesn't allow changing the name of a cset,
   ;; we allow this in general.  So this hash table is keyed by cset
   ;; name, and valued by all csets which EVER bore that name in their
   ;; versioned name component.  The hash value is therefore a list.
   ;; In the case of system augmented names (by
   ;; change_create/master_change), there shouldn't be any collisions.
   ;; We also use this slot to hash unaugmented user names to csets,
   ;; and those are far more likely to have collisions (one key ->
   ;; multiple csets).  In the case of un-augmented names, this is
   ;; expected. In the case of augmented names, this is an error.
   (cset-name-hashtable :version-technique nil
                        :initform (make-instance 'persistent-hash-table :size 1023)
                        :reader master-catalog/cset-name-hashtable)
  (:documentation "Catalog/hierarchy-root of versioned information maintained in the master repository.")
  (:metaclass versioned-standard-class)
  (:schema-version 0))
Pretty straightforward, no? No? Let's list the products in the master catalog:
(defun cmctl/list-products (conman-request)
  "cheesy hack to list the products"
  (let ((master-repository-name (conman-request/repository-dbpath conman-request))
        (reason "list products")
        (userid (conman-request/user-name conman-request)))
     :master-metaversion :latest-metaversion
     :version :latest-version
     :reason reason
     :transaction-type :read-only
     :receiver (lambda (master-repository master-transaction master-catalog)
                 (declare (ignore master-repository master-transaction))
                 (collect 'list
                          (map-fn 't (lambda (product)
                                       (list (distributed-object-identifier product)
                                             (named-object/name product)
                                             (described-object/description product)))
                                  (master-catalog/scan-products master-catalog)))))))
That isn't very edifying.

Start from the end:
(master-catalog/scan-products master-catalog)
defined as
(defun master-catalog/scan-products (master-catalog)
  (declare (optimizable-series-function))
  (versioned-object/scan-composite-versioned-slot master-catalog 'products))
The optimizable-series-function declaration indicates that we are using Richard Waters's series package. This allows us to write functions that can be assembled into an efficient pipeline for iterating over a collection of objects. This code:
(collect 'list
   (map-fn 't (lambda (product)
                (list (distributed-object-identifier product)
                      (named-object/name product)
                      (described-object/description product)))
           (master-catalog/scan-products master-catalog)))
takes each product in turn, creates a three element list of the identifier, the project name, and the product description, and finally collects the three-tuples in a list to be returned to the caller. Here is what it looks like macroexpanded:
    (SETQ #:OUT-914 'PRODUCTS)
    (COMMON-LISP:LET (#:OUT-913 #:OUT-912)
      (SETQ #:OUT-913 (SLOT-VALUE-UNVERSIONED #:OUT-917 #:OUT-914))
      (SETQ #:OUT-912
                   (DEBUG-MESSAGE 4 "Using override cid-set to scan slot ~s"
      (COMMON-LISP:LET (#:OUT-911 #:OUT-910 #:OUT-909)
        (MULTIPLE-VALUE-SETQ (#:OUT-911 #:OUT-910 #:OUT-909)
          (CVI-GET-ION-VECTOR-AND-INDEX #:OUT-913 #:OUT-912))
        (IF #:OUT-911
             (IF (COMMON-LISP:LET ((#:G717-902 #:OUT-910))
                   (IF #:G717-902
                       (THE T (CVI/DEFAULT-ALLOWED #:OUT-913))))
                  (SLOT-UNBOUND (CLASS-OF #:OUT-917) #:OUT-917 #:OUT-914)))))
        (DEBUG-MESSAGE 5 "Active ion vector for retrieval:  ~s" #:OUT-911)
        (COMMON-LISP:LET (#:OUT-908)
          (SETQ #:OUT-908
                  (IF #:OUT-911
                      (THE T #())))
          (COMMON-LISP:LET (#:ELEMENTS-907
                            (#:LIMIT-905 (ARRAY-TOTAL-SIZE #:OUT-908))
                            (#:INDEX-904 -1)
                            (#:INDEX-903 (- -1 1))
                            (#:LASTCONS-897 (LIST NIL))
                     (TYPE SERIES::-VECTOR-INDEX+ #:INDEX-904)
                     (TYPE INTEGER #:INDEX-903)
                     (TYPE CONS #:LASTCONS-897)
                     (TYPE LIST #:LST-898))
            (SETQ #:LST-898 #:LASTCONS-897)
              (INCF #:INDEX-904)
               (IF (= #:INDEX-904 #:LIMIT-905)
                   (GO SERIES::END))
               (SETQ #:ELEMENTS-907
                       (ROW-MAJOR-AREF #:OUT-908
                                       (THE SERIES::VECTOR-INDEX
              (INCF #:INDEX-903)
              (IF (MINUSP #:INDEX-903)
                  (GO #:LL-918))
              (SETQ #:ITEMS-915
                      ((LAMBDA (ION-SOUGHT)
                          (SVREF #:OUT-909 ION-SOUGHT) ION-SOUGHT))
              (SETQ #:ITEMS-900
                      ((LAMBDA (PRODUCT)
                               (NAMED-OBJECT/NAME PRODUCT)
                               (DESCRIBED-OBJECT/DESCRIPTION PRODUCT)))
              (SETQ #:LASTCONS-897
                      (SETF (CDR #:LASTCONS-897) (CONS #:ITEMS-900 NIL)))
              (GO #:LL-918)
            (CDR #:LST-898)))))))
To be continued...

Sunday, October 13, 2013

Next step

We build a simple project/branch/version hierarchical abstraction, and we implement it on top of the core layer.

  • a project is a collection of branches that represent alternative ways the state evolves. Every project has a main branch.
  • A branch is a collection of versions that represent the evolution of the branch state over time.
  • A version is a collection of change-sets with some trappings.
  • A change-set is our unit of change.
When we do a read/write transaction, we'll add a new change set to the repository. Read-only transactions will specify a version for reading the core objects. Or two versions for generating diffs. Or maybe even three versions for a three-way merge.

Under this development model, developers will sync their workspace with the most chronologically recent version of the main branch. Their new change sets will be visible only to them, unless (until, we hope) they are "promoted" into the development branch.

We wanted to encourage frequent incremental check-ins. We wanted hook this up to Emacs autosave. Frequent check-ins of much smaller diffs breaks even.

Once you get to the point where your code passes all the tests, you promote it into the branch so everyone can use it. Every now and then the admins will "fork a relase", do some "cherrypicks" and make a release branch in the project.

Once you get to the point where your code passes all the tests, you promote it into the branch so everyone can use it. Every now and then the admins will "fork a relase", do some "cherrypicks" and make a release branch in the project.

The core code does all the heavy lifting, this is window dressing. The style: minimalist. Skin the change model.

This is turning into the code equivalent of a power-point lecture. You can read it yourself, I'm not going to walk you through it.

So that's it? Is that all there is? Many source code or change management systems do something more or less similar to this with files and directory trees instead of CLOS objects. Cute hack, but why all the fuss? Hasn't this been done?

If it seems obvious to you how we'd implement some of the usual source code control operations, good. We don't have to train you.

I wont insult your intelligence explaining in detail how to do a rollback by removing a change-set from a version. Use SETF, of course.

I hope nobody notices the 300 lb chicken sitting next to that shady-looking egg in the corner.

And something for philosopher/implementors to worry about: If I demote the change-set that instantiates an object, where does the object go? Is it possible create a reference to an uninstantiated object? What happens if you try to follow it?