I'm an aficionado of the Common Lisp series
package. Here are a couple of tricks and tips.
The series
package works by walking your code at macro
expansion time. It is able to perform better optimizations if it
can walk the Common Lisp
forms DEFUN
, FUNCALL
, LET
, LET*
,
and MULTIPLE-VALUE-BIND
. This is easily
done in a defpackage
(defpackage "MY-PACKAGE" (:shadowing-import-from "SERIES" "DEFUN" "FUNCALL" "LET" "LET*" "MULTIPLE-VALUE-BIND") (:use "COMMON-LISP" "SERIES"))
series
will still mostly work if you don't
shadowing-import these symbols, but it will miss some important
optimizations. I mention this tip because I've seen people have
problems by omitting this import.
series
was designed to handle basic linear iteration,
and it's quite good at that, but you'll really hit a wall if you try
anything fancier. But I had two good cases where I wanted to push
the limits. One is tree traversal. I wanted the series of nodes in
a tree. This is easily done recursively, but series
is
going to generate an iterative loop.
We can eliminate recursion by making a state machine with a
stack. We'll make a state machine that can walk the tree and create
a series out of all the states it occupies as it
walks. series
provides us with a primitive
just for this purpose:
(scan-fn type initial-state-thunk state-transition-function final-state-p)
To traverse a tree, the initial state is just our stack. The state transition function pops the stack and pushes the children of the popped node (if any). We're done when the stack is empty.
(defun tree-walker-states (root-node node-children) (declare (optimizable-series-function)) (scan-fn 'list (lambda () (list root-node)) (lambda (stack) (append (funcall node-children (car stack)) (cdr stack))) #'null))
The node being visited at each step is the one at the top of the
stack, so scan-tree
is simply this:
(defun scan-tree (root-node node-children) (declare (optimizable-series-function)) (map-fn 't #'car (tree-walker-states root-node node-children)))
Let's try it.
(defun pathname-children (pathname) (and (uiop:directory-pathname-p pathname) (directory (make-pathname :name :wild :type :wild :defaults pathname)))) (defun scan-directory (pathname) (declare (optimizable-series-function)) (scan-tree pathname #'pathname-children)) > (collect-nth 240 (scan-directory "~/.m2/")) #P"/home/jrm/.m2/repository/com/fasterxml/jackson/module/jackson-module-parameter-names/2.9.7/jackson-module-parameter-names-2.9.7.pom"
You can't tell from the output, but series
pipelined this form. It didn't create an in-memory data structure
with hundreds of thousands of pathnames just to select the 240th.
It only walked the tree as far as it needed to find the 240th
element.
The second case where I want to push the limits is in processing REST API calls. Some API calls return lists of items and it's nice to return them as a series. But API calls that return lists usually want to paginate the results, so we need a way to iterate over the pages while iterating over the items within the pages.
A trick similar to the previous one works. We keep a queue of items to yield and at each state transition we dequeue one item. At the end of the queue is a special item that indicates that we need to fetch the next page.
Here's how we put this together. We use scan-fn
to
drive the loop, we start by pushing the marker to fetch the first
page, and we're done when the queue is empty:
(scan-fn 't (lambda () (list (cons :page 1))) ... #'null)
The state transition function examines the head of the queue. If it isn't a special marker, we just continue with the tail of the queue:
(lambda (queue) (let ((head (car queue))) (if (and (consp head) (eq (car head) :page)) ... (cdr queue))))
If it is a :page
marker, we fetch a page of elements:
(let* ((page-number (cdr head)) (paged-url (format nil "~a~[~;~:;?page=~:*~d~]" base-url page-number))) (multiple-value-bind (stream code reply-headers) (dex:get paged-url :headers request-headers :want-stream t) ...))
And we construct the new queue. The elements are decoded from the stream, but if we have more pages, we append the next page marker to the list of elements:
(append (json:decode-json stream) (let ((link (gethash "link" reply-headers))) (when (and link (search "rel=\"next\"" link)) (list (cons :page (+ page-number 1))))))
So our scan-fn
call will create the series of
state-machine states. We map-fn
car
over
the states to get a series of page elements and the occasional
next-page page marker. Finally, we use choose-if
to
discard the page markers.
(defun scan-paged-api (base-url request-headers) (declare (optimizable-series-function)) (choose-if (lambda (item) (not (and (consp item) (eq (car item) :page)))) (map-fn t #'car (scan-fn 'list (lambda () (list (cons :page 1))) (lambda (stack) (let ((top (car stack))) (if (and (consp top) (eq (car top) :page)) (let* ((page-number (cdr top)) (paged-url (format nil "~a~[~;~:;?page=~:*~d~]" base-url page-number))) (multiple-value-bind (stream code reply-headers) (dex:get paged-url :headers request-headers :want-stream t) (assert (= code 200)) (append (json:decode-json stream) (let ((link (gethash "link" reply-headers))) (when (and link (search "rel=\"next\"" link)) (list (cons :page (+ page-number 1)))))))) (cdr stack)))) #'null))))
That's pretty complex. What's the payoff? Later code becomes much, much simpler. Here's an example. I have a variation on the above code that is specialized for GitHub. I can, for example, write a loop that iterates through the repos in an org, filters them by name, and collects the matching ones into a list like this:
(collect 'list (choose-if (lambda (repo) (cl-ppcre:scan regex (get-name repo))) (scan-org-repos org)))
Now that's pretty simple and hard to get wrong.
2 comments:
Is there a typo on this line in the first code example?
(:shawoding-import-from "SERIES"
Should it be "shadowing", not "shawoding"?
Err, yeah. Fixed. Thanks!
Post a Comment