Thursday, July 7, 2022

Series tips and tricks

I'm an aficionado of the Common Lisp series package. Here are a couple of tricks and tips.

The series package works by walking your code at macro expansion time. It is able to perform better optimizations if it can walk the Common Lisp forms DEFUN, FUNCALL, LET, LET*, and MULTIPLE-VALUE-BIND. This is easily done in a defpackage

(defpackage "MY-PACKAGE"
  (:shadowing-import-from "SERIES"
     "DEFUN" "FUNCALL" "LET" "LET*" "MULTIPLE-VALUE-BIND")
  (:use "COMMON-LISP" "SERIES"))

series will still mostly work if you don't shadowing-import these symbols, but it will miss some important optimizations. I mention this tip because I've seen people have problems by omitting this import.

series was designed to handle basic linear iteration, and it's quite good at that, but you'll really hit a wall if you try anything fancier. But I had two good cases where I wanted to push the limits. One is tree traversal. I wanted the series of nodes in a tree. This is easily done recursively, but series is going to generate an iterative loop.

We can eliminate recursion by making a state machine with a stack. We'll make a state machine that can walk the tree and create a series out of all the states it occupies as it walks. series provides us with a primitive just for this purpose:

  (scan-fn type initial-state-thunk
                state-transition-function
                final-state-p)

To traverse a tree, the initial state is just our stack. The state transition function pops the stack and pushes the children of the popped node (if any). We're done when the stack is empty.

(defun tree-walker-states (root-node node-children)
  (declare (optimizable-series-function))
  (scan-fn 'list (lambda () (list root-node))
                 (lambda (stack)
                   (append (funcall node-children (car stack)) (cdr stack)))
                 #'null))

The node being visited at each step is the one at the top of the stack, so scan-tree is simply this:

(defun scan-tree (root-node node-children)    
  (declare (optimizable-series-function))
  (map-fn 't #'car (tree-walker-states root-node node-children)))

Let's try it.

(defun pathname-children (pathname)
  (and (uiop:directory-pathname-p pathname)
       (directory (make-pathname :name :wild
                                 :type :wild
                                 :defaults pathname))))

(defun scan-directory (pathname)
  (declare (optimizable-series-function))
  (scan-tree pathname #'pathname-children))

> (collect-nth 240 (scan-directory "~/.m2/"))
#P"/home/jrm/.m2/repository/com/fasterxml/jackson/module/jackson-module-parameter-names/2.9.7/jackson-module-parameter-names-2.9.7.pom"

You can't tell from the output, but series pipelined this form. It didn't create an in-memory data structure with hundreds of thousands of pathnames just to select the 240th. It only walked the tree as far as it needed to find the 240th element.

The second case where I want to push the limits is in processing REST API calls. Some API calls return lists of items and it's nice to return them as a series. But API calls that return lists usually want to paginate the results, so we need a way to iterate over the pages while iterating over the items within the pages.

A trick similar to the previous one works. We keep a queue of items to yield and at each state transition we dequeue one item. At the end of the queue is a special item that indicates that we need to fetch the next page.

Here's how we put this together. We use scan-fn to drive the loop, we start by pushing the marker to fetch the first page, and we're done when the queue is empty:

(scan-fn 't (lambda () (list (cons :page 1)))
                 ...
                 #'null)

The state transition function examines the head of the queue. If it isn't a special marker, we just continue with the tail of the queue:

(lambda (queue)
  (let ((head (car queue)))
    (if (and (consp head)
             (eq (car head) :page))
        ...
        (cdr queue))))

If it is a :page marker, we fetch a page of elements:

(let* ((page-number (cdr head))
       (paged-url   (format nil "~a~[~;~:;?page=~:*~d~]" base-url page-number)))
   (multiple-value-bind (stream code reply-headers)
       (dex:get paged-url
                :headers request-headers
                :want-stream t)
     ...))

And we construct the new queue. The elements are decoded from the stream, but if we have more pages, we append the next page marker to the list of elements:

(append (json:decode-json stream)
        (let ((link (gethash "link" reply-headers)))
          (when (and link (search "rel=\"next\"" link))
            (list (cons :page (+ page-number 1))))))

So our scan-fn call will create the series of state-machine states. We map-fn car over the states to get a series of page elements and the occasional next-page page marker. Finally, we use choose-if to discard the page markers.

(defun scan-paged-api (base-url request-headers)
  (declare (optimizable-series-function))
  (choose-if
   (lambda (item)
     (not (and (consp item)
               (eq (car item) :page))))
   (map-fn t #'car
     (scan-fn 'list
       (lambda () (list (cons :page 1)))
       (lambda (stack)
         (let ((top (car stack)))
           (if (and (consp top)
                    (eq (car top) :page))
               (let* ((page-number (cdr top))
                      (paged-url (format nil "~a~[~;~:;?page=~:*~d~]" base-url page-number)))
                 (multiple-value-bind (stream code reply-headers)
                     (dex:get paged-url
                              :headers request-headers
                              :want-stream t)
                   (assert (= code 200))
                   (append (json:decode-json stream)
                           (let ((link (gethash "link" reply-headers)))
                             (when (and link (search "rel=\"next\"" link))
                               (list (cons :page (+ page-number 1))))))))
               (cdr stack))))
      #'null))))

That's pretty complex. What's the payoff? Later code becomes much, much simpler. Here's an example. I have a variation on the above code that is specialized for GitHub. I can, for example, write a loop that iterates through the repos in an org, filters them by name, and collects the matching ones into a list like this:

(collect 'list
  (choose-if (lambda (repo)
               (cl-ppcre:scan regex (get-name repo)))
             (scan-org-repos org)))

Now that's pretty simple and hard to get wrong.

2 comments:

  1. Is there a typo on this line in the first code example?

    (:shawoding-import-from "SERIES"

    Should it be "shadowing", not "shawoding"?

    ReplyDelete