Saturday, September 6, 2025

Paginated collections

If you like the series abstraction, then you will want collections to have functions that return series of their elements. The code that uses series will be able to treat the collection functionality while the generated code will be iterative. If you are dealing with web resources, it is likely that your collection will be paginated. You don't want to fetch all the pages at the beginning and then process the elements one at a time, you want to interleave the page fetches with the element processing, so that you process several elements between each page fetch.

This would be trivial to implement as a flatmap — you'd iterate over the pages, then for each page you'd iterate over the collection elements on that page — but the series abstraction isn't built to do a nested iteration like that. Instead, you need to build a series out of a state machine.

The state will be a queue of pending elements along with the token for the next page. At each step, the state machine will pop the queue. If the queue is empty, we fetch the next page and refill the queue. The inner series generator is the series of states of this state machine. It makes transitions like this:

  (<item1> <item2> … <next-page-token>)
  (<item2> … <next-page-token>)
  …
  (<next-page-token>)
  (<itemN+1> <itemN+2> … <next-page-token+1>)
  (<itemN+2> … <next-page-token+1>)
  …

This is implemented via the general series scan-fn that takes an initial state, a state transition function, and returns a series of states:

(scan-fn
         (lambda () (get-elements (fetch-first-page)))
         (lambda (queue)
           (if (next-page-token? (car queue))
               (get-elements (fetch-next-page (car queue)))
               (cdr queue)))
         (lambda (queue) (null queue)))

Over this series of states, we map a function that returns the head of the queue:

(map-fn 't #'car <series-of-states>)
  item1
  item2
  …
  next-page-token
  itemN+1
  itemN+2
  …
  next-page-token+1
  …

Finally, we filter out the page tokens:

(choose-if (lambda (x) (not (next-page-token? x)))
          <series-of-heads>)
  item1
  item2
  …
  itemN+1
  itemN+2
  …

It is a bit convoluted, but it gives you the elements of the series one at a time, fetching the pages as needed when the current page is emptied. We're not fetching the entire collection at the start, although we are buffering a page of elements at any time.

It won't bother Lisp programmers, but it may bother people who are anal about static types that the intermediate series is heterogeneous — it contains the elements of the collection as well as the next page tokens. If it really bothers you, you can rework the state machine to look ahead for the next page marker so that there is always at least one element in the queue. There is ultimately little difference: in one case, we filter out the page markers, in the other case, we look ahead and avoid putting them in. It is a matter of taste whether the complication of looking ahead is worth avoiding the type filter.

No comments: