Friday, October 18, 2013

Unexciting code

Source code control isn't supposed to be interesting or exciting for the user. Blogging about source code control isn't going to be interesting or surprising for the reader, either. I'll spare you the walkthrough. You're old enough to figure it out without me holding your hand. It'll be more interesting if I blog about the problems and bugs we encountered.

Conman was the configuration manager built on the core ChangeSafe code. The master catalog in conman holds the highest level objects being managed.
(defclass master-catalog ()
  ;; subsystems and products (PC's) in this repository
  ((products   :initform nil
               :version-technique :composite-set
               :accessor master-catalog/products)

   (subsystems :initform nil
               :version-technique :composite-set
               :accessor master-catalog/subsystems)

   ;; All satellite repositories known to this master.  We don't allow
   ;; satellite repository names to change, though the projects they
   ;; contain may be renamed.  **WARNING** BE CAREFUL IF YOU OPERATE
   ;; in non-latest metaversion views of this list or you might try to
   ;; create a satellite which already exists.  Only update this list
   ;; using the latest metaversion.
   (satellite-repositories :initform nil :version-technique :composite-set)

   ;; Projects (a.k.a. ChangeSafe "classes") contained in satellite
   ;; repositories.  The descriptors contain the key mapping from a
   ;; project name to a satellite-repository-name.  We could almost
   ;; make this just a (project-name . satellite-name) mapping, but we
   ;; need to version project-name, and we also want to cache in the
   ;; master some information in the satellites so that we don't
   ;; always have to examine the satellites for often accessed
   ;; information.
   (classes :accessor master-catalog/classes
            :initform nil
            :version-technique :composite-set)

   ;; Cset-relationship-tuples is conceptually a list of sublists,
   ;; where each sublist is a tuple.  For every master cid which
   ;; results in the creation of satellite cids, a tuple is added
   ;; which enumerates the master cid and the satellite cids which it
   ;; caused to be created.  e.g. '((master.cid.1 satellite-1.cid.1
   ;; satellite-2.cid.1)) Because we want portable references, blah
   ;; blah blah, we actually reference DIDS of CHANGE-SET objects
   ;; rather than the cids.  We may instead wish to store CID-OBJECT
   ;; references.  TBD.

   ;; Right now, this information is maintained only for change
   ;; transactions which arise from WITH-CM-MASTER-TXN and
   ;; WITH-CM-SATELLITE-TXN.  This is ok, since those are the
   ;; interesting txns which manipulate satellite databases.

   ;; Note that because of the high volume of csets we expect to
   ;; track, we actually represent this information as a vector of
   ;; vectors to achieve space compaction.
   (cset-relationship-tuples :initform (make-instance 'persistent-vector
                                                           :initial-element nil
                                                           :size 1)
                             :version-technique :nonversioned)

   (cset-rel-tuples-index :initform (make-instance 'persistent-vector
                                                        :initial-element -1
                                                        :size 1)
                          :version-technique :nonversioned)

   ;; BOTH these slots are updated ONLY by vm-txn-note-change-set,
   ;; except for schema upgrading.

   ;; The cset-rel-tuples-index slot is a conceptual hash table into the
   ;; cset-relationship-tuples slot. This is used by
   ;; master-catalog-lookup-cset-relationship
   ;; to avoid an extremely costly linear search of cset-relationship-tuples.
   ;; This is very important for cset_add, cset_remove, and csets_from_file.

   ;; The idea is that the did-string of the master-cid's did is hashed.
   ;; Reducing that hash modulo the number of entries in cset-rel-tuples-index,
   ;; finds a "home" index of cset-rel-tuples-index. Using the sb32 value
   ;; in that element, we either have a -1 (so the entry is not in the
   ;; hash table) or we get an index into cset_relationship_tuples.
   ;; If there is no hash collision, that element of cset_relationship_tuples
   ;; will contain the desired master-cid did we are looking for. If it
   ;; isn't the one we want, we have had a hash collision, and we resolve it
   ;; by linear probing in the next-door (circularly) element of
   ;; cset-rel-tuples-index.
   ;; The number of elements of cset-rel-tuples-index is always a prime number,
   ;; and is also maintained to be more than twice as large as the number of
   ;; entries in cset-relationship-tuples. That is important, to prevent
   ;; clustering and slow searching. So when it grows, cset-rel-tuples-index
   ;; grows by a reasonable factor (about 2) so that it always contains
   ;; at least half "holes", that is, -1.  Further, we want to avoid frequent
   ;; growth, because growing requires computing every entry in the hash table
   ;; again. That makes for a big transaction, as every element of the
   ;; cid-relationship-tuple vector has to be mapped in, and rehashed with
   ;; the new size of cset-rel-tuples-index.
   ;; Space considerations: In Jack's db, there are roughly 40,000 elements
   ;; currently in the cset-relationship-tuples.  Suppose we had 100,000
   ;; elements. In Jack's db, it appears that the tuples are about 2 elements
   ;; each, average. Suppose it were 9. Then the tuples would take 4*(1+9)=40
   ;; bytes each, so 40*100,000 = 4Mb total (plus another 400,000 for the
   ;; cset-relationship-tuples vector itself).  This is large, but not likely
   ;; to be a cause of breakage anytime soon.

   ;; The cset-name-hashtable maps cset names to csets.  While the HP
   ;; model of ChangeSafe doesn't allow changing the name of a cset,
   ;; we allow this in general.  So this hash table is keyed by cset
   ;; name, and valued by all csets which EVER bore that name in their
   ;; versioned name component.  The hash value is therefore a list.
   ;; In the case of system augmented names (by
   ;; change_create/master_change), there shouldn't be any collisions.
   ;; We also use this slot to hash unaugmented user names to csets,
   ;; and those are far more likely to have collisions (one key ->
   ;; multiple csets).  In the case of un-augmented names, this is
   ;; expected. In the case of augmented names, this is an error.
   (cset-name-hashtable :version-technique nil
                        :initform (make-instance 'persistent-hash-table :size 1023)
                        :reader master-catalog/cset-name-hashtable)
  (:documentation "Catalog/hierarchy-root of versioned information maintained in the master repository.")
  (:metaclass versioned-standard-class)
  (:schema-version 0))
Pretty straightforward, no? No? Let's list the products in the master catalog:
(defun cmctl/list-products (conman-request)
  "cheesy hack to list the products"
  (let ((master-repository-name (conman-request/repository-dbpath conman-request))
        (reason "list products")
        (userid (conman-request/user-name conman-request)))
     :master-metaversion :latest-metaversion
     :version :latest-version
     :reason reason
     :transaction-type :read-only
     :receiver (lambda (master-repository master-transaction master-catalog)
                 (declare (ignore master-repository master-transaction))
                 (collect 'list
                          (map-fn 't (lambda (product)
                                       (list (distributed-object-identifier product)
                                             (named-object/name product)
                                             (described-object/description product)))
                                  (master-catalog/scan-products master-catalog)))))))
That isn't very edifying.

Start from the end:
(master-catalog/scan-products master-catalog)
defined as
(defun master-catalog/scan-products (master-catalog)
  (declare (optimizable-series-function))
  (versioned-object/scan-composite-versioned-slot master-catalog 'products))
The optimizable-series-function declaration indicates that we are using Richard Waters's series package. This allows us to write functions that can be assembled into an efficient pipeline for iterating over a collection of objects. This code:
(collect 'list
   (map-fn 't (lambda (product)
                (list (distributed-object-identifier product)
                      (named-object/name product)
                      (described-object/description product)))
           (master-catalog/scan-products master-catalog)))
takes each product in turn, creates a three element list of the identifier, the project name, and the product description, and finally collects the three-tuples in a list to be returned to the caller. Here is what it looks like macroexpanded:
    (SETQ #:OUT-914 'PRODUCTS)
    (COMMON-LISP:LET (#:OUT-913 #:OUT-912)
      (SETQ #:OUT-913 (SLOT-VALUE-UNVERSIONED #:OUT-917 #:OUT-914))
      (SETQ #:OUT-912
                   (DEBUG-MESSAGE 4 "Using override cid-set to scan slot ~s"
      (COMMON-LISP:LET (#:OUT-911 #:OUT-910 #:OUT-909)
        (MULTIPLE-VALUE-SETQ (#:OUT-911 #:OUT-910 #:OUT-909)
          (CVI-GET-ION-VECTOR-AND-INDEX #:OUT-913 #:OUT-912))
        (IF #:OUT-911
             (IF (COMMON-LISP:LET ((#:G717-902 #:OUT-910))
                   (IF #:G717-902
                       (THE T (CVI/DEFAULT-ALLOWED #:OUT-913))))
                  (SLOT-UNBOUND (CLASS-OF #:OUT-917) #:OUT-917 #:OUT-914)))))
        (DEBUG-MESSAGE 5 "Active ion vector for retrieval:  ~s" #:OUT-911)
        (COMMON-LISP:LET (#:OUT-908)
          (SETQ #:OUT-908
                  (IF #:OUT-911
                      (THE T #())))
          (COMMON-LISP:LET (#:ELEMENTS-907
                            (#:LIMIT-905 (ARRAY-TOTAL-SIZE #:OUT-908))
                            (#:INDEX-904 -1)
                            (#:INDEX-903 (- -1 1))
                            (#:LASTCONS-897 (LIST NIL))
                     (TYPE SERIES::-VECTOR-INDEX+ #:INDEX-904)
                     (TYPE INTEGER #:INDEX-903)
                     (TYPE CONS #:LASTCONS-897)
                     (TYPE LIST #:LST-898))
            (SETQ #:LST-898 #:LASTCONS-897)
              (INCF #:INDEX-904)
               (IF (= #:INDEX-904 #:LIMIT-905)
                   (GO SERIES::END))
               (SETQ #:ELEMENTS-907
                       (ROW-MAJOR-AREF #:OUT-908
                                       (THE SERIES::VECTOR-INDEX
              (INCF #:INDEX-903)
              (IF (MINUSP #:INDEX-903)
                  (GO #:LL-918))
              (SETQ #:ITEMS-915
                      ((LAMBDA (ION-SOUGHT)
                          (SVREF #:OUT-909 ION-SOUGHT) ION-SOUGHT))
              (SETQ #:ITEMS-900
                      ((LAMBDA (PRODUCT)
                               (NAMED-OBJECT/NAME PRODUCT)
                               (DESCRIBED-OBJECT/DESCRIPTION PRODUCT)))
              (SETQ #:LASTCONS-897
                      (SETF (CDR #:LASTCONS-897) (CONS #:ITEMS-900 NIL)))
              (GO #:LL-918)
            (CDR #:LST-898)))))))
To be continued...