Abstract Heresies: July 2022

Saturday, July 30, 2022

Let's Play Wordle

Wordle is popular these days. Let's teach the computer how to play.

As usual, I'm using the series library. Also, if you are coding along, there's a function or two I omitted that you'll have to write yourself. Note that I use a named-let, too.

To play Wordle, you try to deduce the secret word by making guesses. After each guess, you are told which letters you got exactly right and which are right, but in the wrong position. Each guess narrows down the number of possible answers until there is one left. It's a simple matter of making good guesses.

Here I define a word as a simple string of five characters and a predicate for testing that the word is all lowercase, alphabetic, ascii characters.

(deftype word () `(simple-string 5))

(defun valid-word? (thing)
  (and (typep thing 'word)
       (str:downcase? thing)
       (every #'alpha-char-p thing)
       (every #'str:ascii-char-p thing)))

I don't use a satisfies clause in the word type. satisfies can cause issues with optimization and performance because it is can be hard to control where the compiler inserts type checks. I just manually call valid-word? when necessary.

To read a word file, we read each line in the file, trim it, and select only the valid words. This works on the canonical word files, but you can read words from the system dictionary or other places if you want.

(defun read-word-file (pathname)
  (collect 'bag
    (choose-if #'valid-word?
      (map-fn 'string #'str:trim
        (scan-file pathname #'read-line)))))

(defparameter +word-file+ "~/wordle/words.txt")
(defparameter +answer-file+ "~/wordle/answers.txt")

(defparameter +all-words+ (read-word-file +word-file+))
(defparameter +all-answers+ (read-word-file +all-answers+))

We need to score a guess. When you make a guess, the squares under the letters turn green if the letter is correct, yellow if the letter is incorrect, but appears in the answer, and gray if the letter does not appear in the answer. We'll just return a list of the colors (as keywords). For example, (score-guess "react" "relay") => (:green :green :yellow :gray :gray)

score-guess first needs a list of the letters in the answer that don't match the guess:

(let ((sg (scan 'string guess))
      (sa (scan 'string answer)))
  (collect 'bag
    (choose (map-fn 'boolean #'char/= sg sa) sa)))

then we walk the guess. If the guess character equals the answer character, we cons a :green on to the score. If the guess character is a member of the unmatched answer characters, we cons a :yellow on to the score and delete that character from the unmatched characters. Otherwise, we cons a :gray on to the score.

(defun score-guess (guess answer)
  (declare (type word guess answer))
  (let walk ((index 0)
             (score '())
             (unmatched-chars (let ((sg (scan 'string guess))
                                    (sa (scan 'string answer)))
                                (collect 'bag
                                  (choose (map-fn 'boolean #'char/= sg sa) sa)))))
    (if (>= index 5)
        (nreverse score)
        (let ((guess-char (schar guess index)))
          (cond ((char= guess-char (schar answer index))
                 (walk (1+ index) (cons :green score) unmatched-chars))
                ((member guess-char unmatched-chars)
                 (walk (1+ index) (cons :yellow score) (delete guess-char unmatched-chars)))
                (t
                 (walk (1+ index) (cons :gray score) unmatched-chars)))))))

Once we've made a guess and have a score, we'll want to narrow down the possible words. We just go over the word list and keep the words that have a matching score.

(defun prune-words (guess score words)
  (declare (optimizable-series-function) (off-line-port words))
  (choose-if
   (lambda (word)
     (equal (score-guess guess word) score))
   words))

We'll need a strategy for picking a word to guess. Here's an easy, naive one to start with: if there is only one possible word left, guess that one, otherwise guess a completely random word and narrow down the possibility list.

(defun strategy/random-word (possibilities)
  (if (= (length possibilities) 1)
      (car possibilities)
      (random-word +all-words+)))

So let's imagine the top level. The play function will play a single round of Wordle. We'll be keeping track of the possible words as we go. We choose a guess based on our strategy, then score the guess. If we got the right answer, we're done, but otherwise we narrow down the list of possibilites to those that have the same score and play the next round.

(defun play (strategy &optional (round 1)
                        (possibilities +all-answers+)
                        (secret-word (random-word +all-answers+)))
  (let* ((guess (funcall strategy possibilities))
         (score (score-guess guess secret-word)))
    (format t "~&~d guessing ~s ~s ..." round guess score)
    (if (equal score '(:green :green :green :green :green))
        (progn (format t "correct.") round)
        (let ((new-possibilities
                (collect 'bag (prune-words guess score (scan 'list possibilities)))))
          (format t "narrowed to ~d possibilities." (length new-possibilities))
          (play strategy (+ round 1) new-possibilities secret-word)))))

WORDLE> (play #'strategy/random-word)

1 guessing "culty" (:GRAY :GRAY :GRAY :GRAY :GRAY) ...narrowed to 519 possibilities.
2 guessing "hings" (:GRAY :GRAY :GRAY :GRAY :GRAY) ...narrowed to 101 possibilities.
3 guessing "india" (:GRAY :GRAY :YELLOW :GRAY :GRAY) ...narrowed to 9 possibilities.
4 guessing "lauds" (:GRAY :GRAY :GRAY :YELLOW :GRAY) ...narrowed to 8 possibilities.
5 guessing "stedd" (:GRAY :GRAY :GRAY :GRAY :GREEN) ...narrowed to 2 possibilities.
6 guessing "khets" (:GRAY :GRAY :GRAY :GRAY :GRAY) ...narrowed to 2 possibilities.
7 guessing "bared" (:GREEN :GRAY :YELLOW :GRAY :GREEN) ...narrowed to 1 possibilities.
8 guessing "brood" (:GREEN :GREEN :GREEN :GREEN :GREEN) ...correct.
8

It plays Wordle. Not very well, but it plays. This strategy seems to average a bit more than seven guesses a game. A better strategy should reduce this average.

When you guess a word, you divide the space of possible answers into a set of equivalence classes by score. I picture these as a set of bins, each labeled with a different score, like (:green :gray :gray :yellow :gray). Making a guess divides the list of possible words among the bins. A bad guess will only use a few bins and have uneven bins. A good guess will use a larger set of bins and divide things more evenly.

We'll need a function to collect the counts of an item in a series

(defun collect-counts (test items)
  (declare (optimizable-series-function))
  (collect-fn t
              (lambda () (make-hash-table :test test))
              (lambda (table item)
                (incf (gethash item table 0))
                table)
              items))

So now we go through a series of words, score the guess against each one, and count how many times we get each score.

(defun partition-words (guess words)
  (declare (optimizable-series-function))
  (collect-counts 'equal
                  (map-fn 'list #'score-guess
                          (series guess)
                          words)))

This returns a hash table that maps scores to the number of words matching that score. We need to measure how good a job this table does at narrowing down the word list.

We'll need a couple of helpers:

(defun weighted-sum (weights elements)
  (declare (optimizable-series-function))
  (collect-sum (map-fn 'real #'* weights elements)))

(defun scan-hash-values (hash-table)
  (declare (optimizable-series-function))
  (multiple-value-bind (keys values) (scan-hash hash-table)
    (declare (ignore keys))
    values))

Now we have to decide how to evaluate how well a partition (set of bins) narrows down possible word list. Suppose our word list originally had 128 words. That's 2⁷ items, so it would take seven binary splits to single out a word. Now suppose after narrowing, we find we're down to 16 words. That's 2⁴ items, so the narrowing is equivalent to three binary splits. The value of an entire set of bins is the weighted average of the narrowing of each bin.

(defun evaluate-partition1 (partition)
  (let* ((original-size (collect-sum (scan-hash-values partition)))
         (original-bits (log2 original-size)))

    (flet ((info-gain (bin-size)
             (- original-bits (log2 bin-size)))

           (weight (bin-size)
             (/ (coerce bin-size 'double-float)
                (coerce original-size 'double-float))))

      (let ((bin-sizes (scan-hash-values partition)))
        (weighted-sum
         (map-fn 'real #'weight bin-sizes)
         (map-fn 'real #'info-gain bin-sizes))))))

(defun evaluate-guess (guess possibilities)
  (evaluate-partition (partition-words guess (scan 'list possibilities))))

(defun best-guess (guesses possibilities)
  (best #'> guesses :key (lambda (guess) (evaluate-guess guess possibilities))))

WORDLE> (play #'strategy/best-word)

1 guessing "soare" (:GRAY :GREEN :GRAY :GRAY :GRAY) ...narrowed to 87 possibilities.
2 guessing "culty" (:GRAY :GRAY :YELLOW :GRAY :GRAY) ...narrowed to 1 possibilities.
3 guessing "login" (:GREEN :GREEN :GREEN :GREEN :GREEN) ...correct.
3

With this strategy, we seem to average about 3.5 guess per game. This is much better than the tad over 7 we had before.

Sunday, July 24, 2022

Named Lambda and Named Let

Suppose you want to map the "add three" function over a list. You don't need to define a name for the function, you can just use a lambda expression: (lambda (n) (+ n 3)). This creates an anonymous "add three" function.

Now suppose you want to map the "factorial" function over a list. You start with (lambda (x) (if (zerop x) 1 ... , but how can you recursively call an anonymous function? We need the function to have a local name within its own body. One option is to use the Y operator:

* (map 'list (Y (lambda (fact)
                  (lambda (x)
                    (if (zerop x)
                        1
                        (* x (funcall fact (1- x)))))))
       '(3 5 7))
(6 120 5040)

but another popular option is to provide a new special form

* (map 'list (named-lambda fact (x)
               (if (zerop x)
                   1
                   (* x (fact (1- x)))))
     '(3 5 7))
(6 120 5040)

The name fact is bound to the lambda expression only within the body of the lambda expression. You don't need to defun a factorial procedure, you can just use a named-lambda.

A little puzzle for you: write named-lambda. My answer below.

Just as a let is syntactic sugar for a lambda application, a named-let is syntactic sugar for a named-lambda application. The name is bound to the lambda expression that performs the variable binding, so you can use that name to make a recursive call. In effect, you can re-invoke the named-let expression with a fresh set of values.

Scheme hackers will be familiar with named-let, but it isn't usual in Common Lisp. It's an easy transformation:

(named-let recur ((x '(a list))
                  (y 22))    
   (body)) =>

(funcall (named-lambda recur (x y) (body)) '(a list) 22)

named-let is the bee's knees for ad hoc iteration. The iteration variables are bound to their initial values by the named-let, and the body can initiate the next iteration by tail-calling the named-let with the updated values for the iteration variables. Since there is no constraint on where or when you iterate, or how you update the iteration variables, this allows very general iteration.

I have seen a tendency for Scheme hackers to overdo it with named lets. You don't need a named-let when map will do. It's usually better to express an iteration through a higher order function than to write yet another ad hoc loop in your code. But named-let is a useful and powerful tool when the simpler methods aren't cutting it.

Here's what I came up with for named-lambda:

(defmacro named-lambda (name (&rest arglist) &body body)
  `(labels ((,name ,arglist ,@body))
     (function ,name)))

Saturday, July 16, 2022

Let's talk to GitHub

Let's teach Common Lisp to talk to GitHub.

We'll need an API token. I like to put these sorts of things in config files. This makes it easier to configure scripts that are deployed to containers. You simply make the config files available through a mount point when starting the container. That way, you can avoid baking credentials into the script.

(defun config-directory ()
  (merge-pathnames
    (make-pathname :directory '(:relative ".config" "github"))
    (user-homedir-pathname)))

(defun config-file (&rest keyargs)
  (merge-pathnames (apply #'make-pathname keyargs) (config-directory)))

(defun load-token (pathname)
  (with-open-file (stream pathname :direction :input)
    (str:trim (read-line stream))))

(defun github-api-token ()
  (load-token (config-file :name "api-token")))

We'll make a lot of use miscellaneous, ad hoc CLOS objects. It is so common for these things to have names that it is worth its own mixin.

(defgeneric get-name (object))

(defclass named-object-mixin ()
  ((name :initarg :name
         :initform (require-initarg :name)
         :reader get-name
         :type string)))

And we'll define a default print-object method. Classes that use this mixin and don't provide their own print-object method will get this one by default.

(defmethod print-object ((obj named-object-mixin) stream)
  (print-unreadable-object (obj stream :identity t :type t)
    (format stream "~a" (slot-value obj 'name))))

We'll make an object to represent GitHub and put the API token in there.

(defclass github (named-object-mixin)
  ((api-token :initarg :api-token
              :initform (require-initarg :api-token)
              :reader get-api-token)))

(defparameter +github+ nil)

(defun github ()  
  (unless (and (boundp '+github+)
               (symbol-value '+github+))
    (setf (symbol-value '+github+)
          (make-instance 'github 
                         :name "GitHub"
                         :api-token (github-api-token))))
  (symbol-value '+github+))

To authenticate to GitHub, we need to pass the API token in the HTTP request headers.

(defun authorization-header (github)
  (cons "Authorization" (format nil "token ~a" (get-api-token github))))

So let's make a request:

* (dex:get "https://api.github.com/user"
           :headers (list (authorization-header (github))
                          '("Accept" . "application/vnd.github.v3+json")))
"{"login":"joseph-marshall69","id":60371090,"node_id":"MDQ6VXNlcjYwMzcxMDkw","avatar_url":"https://avatars.githubusercontent.com/u/60371090?v=4","gravatar_id":"","url":"https://api.github.com/users/jos...[sly-elided string of length 1535]"
200 (8 bits, #xC8, #o310, #b11001000)
#<HASH-TABLE :TEST EQUAL :COUNT 26 {1002A9B2A3}>
#<QURI.URI.HTTP:URI-HTTPS https://api.github.com/user>
#<CL+SSL::SSL-STREAM for #<FD-STREAM for "socket 172.26.126.123:33674, peer: 192.30.255.116:443" {1002A96333}>>

Success! But we got back the string representation of a JSON object. We'll instead request a stream as a return value and pass it to a JSON parser:

* (json:decode-json
    (dex:get "https://api.github.com/user"
             :headers (list (authorization-header (github))
                            '("Accept" . "application/vnd.github.v3+json"))
             :want-stream t))

((:LOGIN . "jrm-code-project") (:ID . 51824598)
 (:NODE--ID . "MDQ6VXNlcjUxODI0NTk4")
 (:AVATAR--URL . "https://avatars.githubusercontent.com/u/51824598?v=4")
 (:GRAVATAR--ID . "") (:URL . "https://api.github.com/users/jrm-code-project")
 (:HTML--URL . "https://github.com/jrm-code-project")
 (:FOLLOWERS--URL . "https://api.github.com/users/jrm-code-project/followers")
 (:FOLLOWING--URL
  . "https://api.github.com/users/jrm-code-project/following{/other_user}")
 (:GISTS--URL
  . "https://api.github.com/users/jrm-code-project/gists{/gist_id}")
 (:STARRED--URL
  . "https://api.github.com/users/jrm-code-project/starred{/owner}{/repo}")
 (:SUBSCRIPTIONS--URL
  . "https://api.github.com/users/jrm-code-project/subscriptions")
 (:ORGANIZATIONS--URL . "https://api.github.com/users/jrm-code-project/orgs")
 (:REPOS--URL . "https://api.github.com/users/jrm-code-project/repos")
 (:EVENTS--URL
  . "https://api.github.com/users/jrm-code-project/events{/privacy}")
 (:RECEIVED--EVENTS--URL
  . "https://api.github.com/users/jrm-code-project/received_events")
 (:TYPE . "User") (:SITE--ADMIN) (:NAME . "Joe Marshall") (:COMPANY)
 (:BLOG . "https://sites.google.com/site/evalapply/")
 (:LOCATION . "Seattle, WA") (:EMAIL) (:HIREABLE) (:BIO) (:TWITTER--USERNAME)
 (:PUBLIC--REPOS . 9) (:PUBLIC--GISTS . 0) (:FOLLOWERS . 21) (:FOLLOWING . 0)
 (:CREATED--AT . "2019-06-14T12:33:06Z")
 (:UPDATED--AT . "2022-03-15T15:18:03Z") (:PRIVATE--GISTS . 0)
 (:TOTAL--PRIVATE--REPOS . 0) (:OWNED--PRIVATE--REPOS . 0)
 (:DISK--USAGE . 44815) (:COLLABORATORS . 0) (:TWO--FACTOR--AUTHENTICATION)
 (:PLAN (:NAME . "free") (:SPACE . 976562499) (:COLLABORATORS . 0)
  (:PRIVATE--REPOS . 10000)))

JSON objects are mapped to alists. The key is a little funny because of how the JSON parser encodes JSON keys with underscores.

An alist is sort of a poor man's object. The problem with an alist is that there is no type associated with it. We know the slots in our poor man's object, but we don't know the class. Without the class information, we don't have a predicate or a way to dispatch to methods. We should create a real CLOS object from this JSON.

(defclass user (named-object-mixin)
  ((login :initarg :login)
   (id :initarg :id)
   (node-id :initarg :node--id)))

(defun json->user-instance (json)
  (apply #'make-instance 'user
         :allow-other-keys t
         (alist->plist json)))

Should we want to bring more fields into Lisp, we need simply add slots with the right initargs to the user class.

Now we can write

(defun get-self (github)
  (json->user-instance
    (json:decode-json
      (dex:get "https://api.github.com/user"
               :headers (list (authorization-header github)
                              '("Accept" . "application/vnd.github.v3+json"))
               :want-stream t))))

* (get-self (github))
#<USER Joe Marshall {1002BE7013}>

* (inspect *)

The object is a STANDARD-OBJECT of type USER.
0. NAME: "Joe Marshall"
1. LOGIN: "jrm-code-project"
2. ID: 51824598
3. NODE-ID: "MDQ6VXNlcjUxODI0NTk4"

GitHub is moving to a GraphQL API. That's easy to handle.


(defun graphql-query (github query &rest variables)
  (let ((content (json:encode-json-to-string
                  `((query . ,query)
                    (variables . ,(plist->alist variables))))))
    (let* ((json (json:decode-json
                  (dex:post "https://api.github.com/graphql"
                            :headers (list (authorization-header github)
                                           '("Accept" . "application/vnd.github.v3+json")
                                           '("Content-Type" . "application/json"))
                            :content content
                            :want-stream t)))
           (errors (cdr (assoc :errors json))))
      (when errors
        (let ((first-error (car errors)))
          (error (cdr (assoc :message first-error)))))
      (cdr (assoc :data json)))))

(defparameter +get-user-by-login-query+
  "query ($login: String!) {
     user (login: $login) {
       databaseId
       login
       name
       }
    }")
    
* (graphql-query (github) +get-user-by-login-query+ :login "jrm-code-project")
((:USER (:DATABASE-ID . 51824598) (:LOGIN . "jrm-code-project")
  (:NAME . "Joe Marshall")))

And you can use the above technique to turn this JSON into a CLOS instance.

At this point we're cooking. We can call GitHub from Common Lisp and get CLOS objects in return. Of course we need more calls other than get-user, but it's more of the same. With this layer as our basis, it is straightforward to script GitHub.

Thursday, July 7, 2022

Series tips and tricks

I'm an aficionado of the Common Lisp series package. Here are a couple of tricks and tips.

The series package works by walking your code at macro expansion time. It is able to perform better optimizations if it can walk the Common Lisp forms DEFUN, FUNCALL, LET, LET*, and MULTIPLE-VALUE-BIND. This is easily done in a defpackage

(defpackage "MY-PACKAGE"
  (:shadowing-import-from "SERIES"
     "DEFUN" "FUNCALL" "LET" "LET*" "MULTIPLE-VALUE-BIND")
  (:use "COMMON-LISP" "SERIES"))

series will still mostly work if you don't shadowing-import these symbols, but it will miss some important optimizations. I mention this tip because I've seen people have problems by omitting this import.

series was designed to handle basic linear iteration, and it's quite good at that, but you'll really hit a wall if you try anything fancier. But I had two good cases where I wanted to push the limits. One is tree traversal. I wanted the series of nodes in a tree. This is easily done recursively, but series is going to generate an iterative loop.

We can eliminate recursion by making a state machine with a stack. We'll make a state machine that can walk the tree and create a series out of all the states it occupies as it walks. series provides us with a primitive just for this purpose:

  (scan-fn type initial-state-thunk
                state-transition-function
                final-state-p)

To traverse a tree, the initial state is just our stack. The state transition function pops the stack and pushes the children of the popped node (if any). We're done when the stack is empty.

(defun tree-walker-states (root-node node-children)
  (declare (optimizable-series-function))
  (scan-fn 'list (lambda () (list root-node))
                 (lambda (stack)
                   (append (funcall node-children (car stack)) (cdr stack)))
                 #'null))

The node being visited at each step is the one at the top of the stack, so scan-tree is simply this:

(defun scan-tree (root-node node-children)    
  (declare (optimizable-series-function))
  (map-fn 't #'car (tree-walker-states root-node node-children)))

Let's try it.

(defun pathname-children (pathname)
  (and (uiop:directory-pathname-p pathname)
       (directory (make-pathname :name :wild
                                 :type :wild
                                 :defaults pathname))))

(defun scan-directory (pathname)
  (declare (optimizable-series-function))
  (scan-tree pathname #'pathname-children))

> (collect-nth 240 (scan-directory "~/.m2/"))
#P"/home/jrm/.m2/repository/com/fasterxml/jackson/module/jackson-module-parameter-names/2.9.7/jackson-module-parameter-names-2.9.7.pom"

You can't tell from the output, but series pipelined this form. It didn't create an in-memory data structure with hundreds of thousands of pathnames just to select the 240th. It only walked the tree as far as it needed to find the 240th element.

The second case where I want to push the limits is in processing REST API calls. Some API calls return lists of items and it's nice to return them as a series. But API calls that return lists usually want to paginate the results, so we need a way to iterate over the pages while iterating over the items within the pages.

A trick similar to the previous one works. We keep a queue of items to yield and at each state transition we dequeue one item. At the end of the queue is a special item that indicates that we need to fetch the next page.

Here's how we put this together. We use scan-fn to drive the loop, we start by pushing the marker to fetch the first page, and we're done when the queue is empty:

(scan-fn 't (lambda () (list (cons :page 1)))
                 ...
                 #'null)

The state transition function examines the head of the queue. If it isn't a special marker, we just continue with the tail of the queue:

(lambda (queue)
  (let ((head (car queue)))
    (if (and (consp head)
             (eq (car head) :page))
        ...
        (cdr queue))))

If it is a :page marker, we fetch a page of elements:

(let* ((page-number (cdr head))
       (paged-url   (format nil "~a~[~;~:;?page=~:*~d~]" base-url page-number)))
   (multiple-value-bind (stream code reply-headers)
       (dex:get paged-url
                :headers request-headers
                :want-stream t)
     ...))

And we construct the new queue. The elements are decoded from the stream, but if we have more pages, we append the next page marker to the list of elements:

(append (json:decode-json stream)
        (let ((link (gethash "link" reply-headers)))
          (when (and link (search "rel=\"next\"" link))
            (list (cons :page (+ page-number 1))))))

So our scan-fn call will create the series of state-machine states. We map-fn car over the states to get a series of page elements and the occasional next-page page marker. Finally, we use choose-if to discard the page markers.

(defun scan-paged-api (base-url request-headers)
  (declare (optimizable-series-function))
  (choose-if
   (lambda (item)
     (not (and (consp item)
               (eq (car item) :page))))
   (map-fn t #'car
     (scan-fn 'list
       (lambda () (list (cons :page 1)))
       (lambda (stack)
         (let ((top (car stack)))
           (if (and (consp top)
                    (eq (car top) :page))
               (let* ((page-number (cdr top))
                      (paged-url (format nil "~a~[~;~:;?page=~:*~d~]" base-url page-number)))
                 (multiple-value-bind (stream code reply-headers)
                     (dex:get paged-url
                              :headers request-headers
                              :want-stream t)
                   (assert (= code 200))
                   (append (json:decode-json stream)
                           (let ((link (gethash "link" reply-headers)))
                             (when (and link (search "rel=\"next\"" link))
                               (list (cons :page (+ page-number 1))))))))
               (cdr stack))))
      #'null))))

That's pretty complex. What's the payoff? Later code becomes much, much simpler. Here's an example. I have a variation on the above code that is specialized for GitHub. I can, for example, write a loop that iterates through the repos in an org, filters them by name, and collects the matching ones into a list like this:

(collect 'list
  (choose-if (lambda (repo)
               (cl-ppcre:scan regex (get-name repo)))
             (scan-org-repos org)))

Now that's pretty simple and hard to get wrong.