## Monday, September 22, 2014

### A couple more homographic function tricks

A generalized continued fraction is an expression of the form:
You can partly apply a homographic function to a generalized continued fraction if you have a stream of the ai and bi
```(define (partly-apply-general hf nums dens)
(let ((a (first  hf))
(b (second hf))
(c (third  hf))
(d (fourth hf)))
(if (empty-stream? nums)
(values (list a a
c c)
nums
dens)
(values (list (+ (* a den) (* b num)) a
(+ (* c den) (* d num)) c)
(tail nums)
(tail dens))))))

(define (print-hf-general hf nums dens)
(call-with-values (lambda () (partly-evaluate hf))
(lambda (term hf*)
(if (not term)
(call-with-values (lambda () (partly-apply-general hf nums dens))
print-hf-general)
(begin
(display term)
;; take reciprocal and multiply by 10
(let ((a (first hf*))
(b (second hf*))
(c (third hf*))
(d (fourth hf*)))
(print-hf-general (list (* c 10) (* d 10)
a        b)
nums dens)))))))
```
For example, we can compute pi from this generalized continued fraction:
```(print-hf-general (list 0 4 1 0)
;; [1 1 4 9 16 25 ...]
(cons-stream 1
(let squares ((i 1))
(cons-stream (* i i) (squares (1+ i)))))
;; [1 3 5 7 9 11 ...]
(let odds ((j 1))
(cons-stream j (odds (+ j 2)))))

314159265358979323846264338327950^G
; Quit!
```
A bi-homographic function is a function of the form:
```(define (bi-homographic a b c d e f g h)
(lambda (x y)
(/ (+ (* a x y) (* b x) (* c y) d)
(+ (* e x y) (* f x) (* g y) h))))
```
Like a homographic function, you can partly evaluate a bi-homographic function and generate a continued fraction. You can also partly apply a bi-homographic function to a pair of continued fractions. When you do this, you have a choice of which continued fraction to be the object of the partial application. There's about twice as much nasty math involved, but the gist is that a bi-homographic function takes two continued fractions as arguments and produces one continued fraction as a result.

It turns out that addition, subtraction, multiplication and division are bi-homographic functions, so one can incrementally compute sums and products of continued fractions.

## Thursday, September 18, 2014

### A useful, if somewhat pointless, trick with homographic functions

In my previous posts I showed that if you are applying a homographic function to a continued fraction, you can partly evaluate the answer before you completely apply the function. Instead of representing homographic functions as lambda expressions, I'll represent them as a list of the coefficients `a`, `b`, `c`, and `d` in `(lambda (t) (/ (+ (* a t) b) (+ (* c t) d)))`. I'll represent a simple continued fraction as a stream of the integer terms in the denominators.
Here is how you partly apply a homographic function to a continued fraction:
```(define (partly-apply hf cf)
(let ((a (first  hf))
(b (second hf))
(c (third  hf))
(d (fourth hf)))
(if (empty-stream? cf)
(values (list a a
c c)
cf)
(values (list (+ (* a term) b) a
(+ (* c term) d) c)
(tail cf))))))```
Partly evaluating a homographic function involves looking at the limits of the function as `t` starts at 1 and goes to infinity:
```(define (partly-evaluate hf)
(let ((a (first hf))
(b (second hf))
(c (third hf))
(d (fourth hf)))

(if (and (same-sign? c (+ c d))
(let ((limit1 (quotient      a       c))
(limit2 (quotient (+ a b) (+ c d))))
(= limit1 limit2)))
(let ((term (quotient a c)))
(let ((new-c (- a (* c term)))
(new-d (- b (* d term))))
(values term (list c d new-c new-d))))
(values #f #f))))
```
We can combine these two steps and make something useful. For example, we can print the value of applying a homographic function to a continued fraction incrementally, printing the most significant digits before computing further digits.
```(define (print-hf-cf hf cf)
(call-with-values (lambda () (partly-evaluate hf))
(lambda (term hf*)
(if (not term)
(call-with-values (lambda () (partly-apply hf cf))
print-hf-cf)
(begin
(display term)
;; take reciprocal and multiply by 10
(let ((a (first hf*))
(b (second hf*))
(c (third hf*))
(d (fourth hf*)))
(print-hf-cf (list (* c 10) (* d 10)
a        b)
cf)))))))```
But how often are you going to apply a homographic function to a continued fraction? Fortunately, the identity function is homographic (coefficients are 1 0 0 1), so applying it to a continued fraction doesn't change the value. The square root of 2 is a simple continued fraction with coefficients [1 2 2 2 ...] where the 2s repeat forever. We apply the identity homographic function to it and print the results:
```(printcf (list 1 0 0 1) sqrt-two)
14142135623730950488016887242096980785696718^G
; Quit!```
As you can see, we start printing the square root of two and we don't stop printing digits until the user interrupts.

A fancier version could truncate the output and print a decimal point after the first iteration.

## Friday, September 5, 2014

### Another stupid homographic function trick

In my last post I showed that if you take a homographic function and apply it to a fraction, you can partly apply the function to the integer part of the fraction and get a new homographic function. The new function can be applied to the non-integer part of the fraction to generate an answer equivalent to the original function applied to the original fraction.

It turns out that you can go in the other direction as well. You can partly evaluate a homographic function. For example, consider this homographic function:
```((lambda (t)
(/ (+ (* 70 t) 29)
(+ (* 12 t)  5))) n)```
Which we intend to apply to some positive number `n`. Even if all we know is that `n` is positive, we can deduce that the value of the homographic function is between 29/5 (when `t` is 0) and 70/12 (as `t` goes to infinity). The integer part of those values are the same, so we can factor that out:
```(+ 5 (/ 1 ((lambda (t)
(/ (+ (* 12 t) 5)
(+ (* 10 t) 4))) n)))```
The partial answer has an integer value of 5 and a fractional part that contains a new homographic function applied to our original `n`. We can do it again:
```(+ 5 (/ 1
(+ 1 (/ 1
((lambda (t)
(/ (+ (* 10 t) 4)
(+ (* 2 t) 1))) n)))))```
The fractional part of the answer can itself be factored into another integer and a new homographic function applied to our original `n`.

## Wednesday, September 3, 2014

### Stupid homographic function trick

A function of the form
is called a homographic function.  Here is one in Scheme:
```(lambda (t)
(/ (+ (* 3 t) 4)
(+ (* 5 t) 2)))```
And here is what it's graph looks like:
If you multiply all the coefficients (`a`, `b`, `c`, and `d`) by the same number, it doesn't change the function. For instance, this homographic function:
```(lambda (t)
(/ (+ (* 21 t) 28)
(+ (* 35 t) 14)))```
is the same as the one above. If one of your coefficients isn't an integer, don't worry, you can multiply everything by the denominator and get an equivalent homographic function. On the other hand, you can divide all your coefficients by their greatest common divisor and get an equivalent homographic function with smaller coefficients. We'll keep our homographic functions in smallest integer form.

A rational number can be written as the sum of an integer and a fraction less than one. For example, 23/5 = 4 + 3/5.

Let's apply a homographic function to a rational number:
```((lambda (t)
(/ (+ (* a t) b)
(+ (* c t) d))) (+ x y/z))

;; substitute
(/ (+ (* a (+ x y/z)) b)
(+ (* c (+ x y/z)) d))

;; distribute the multiplication
(/ (+ (* a x) (* a y/z) b)
(+ (* c x) (* c y/z) d))

;; multiply top and bottom by z/y
(/ (* z/y (+ (* a x) (* a y/z) b))
(* z/y (+ (* c x) (* c y/z) d)))

;; distribute the multiplication
(/ (+ (* a x z/y) (* a y/z z/y) (* b z/y))
(+ (* c x z/y) (* c y/z z/y) (* d z/y)))

;; simplify
(/ (+ (* a x z/y) a (* b z/y))
(+ (* c x z/y) c (* d z/y)))

;; rearrange terms
(/ (+ (* a x z/y) (* b z/y) a)
(+ (* c x z/y) (* d z/y) c))

;; factor out z/y
(/ (+ (* (+ (* a x) b) z/y) a)
(+ (* (+ (* c x) d) z/y) c))

```
Now we do something tricky. We abstract out the `z/y` term:
```((lambda (t)
(/ (+ (* (+ (* a x) b) t) a)
(+ (* (+ (* c x) d) t) c))) (/ z y))
```
If introduce a `let` form, we can see something interesting:
```((lambda (t)
(let ((a1 (+ (* a x) b))
(b1 a)
(c1 (+ (* c x) d))
(d1 c))
(/ (+ (* a1 t) b1)
(+ (* c1 t) d1)))) (/ z y))
```
We find a new homographic function being applied to a new rational number. The new homographic function has coefficients related to the original one, and the new rational number is the reciprocal of the fractional part of the original rational number. So if we have a homographic function `hf` applied to a fraction of the form `x + y/z`, we can easily find a new homographic function `hf'` that when applied to `z/y` will produce the same answer as the original. Applying a homographic function to a fraction has the effect of "eating" the integer part of the fraction, which generates a new homographic function that is applied to the reciprocal of the fractional part.

## Wednesday, August 27, 2014

### A use of Newton's method

I've seen more than one book claim that computing with real numbers inevitably involves round-off errors because real numbers can have an infinite number of digits after the decimal point and no finite representation can hold them. This is false. Instead of representing a real number as a nearby rational number with an error introduced by rounding, we'll represent a real number as computer program that generates the digits. The number of digits generated is potentially infinite, but the program that generates them is definitely finite.

Here is Gosper's algorithm for computing the square root of a rational number.
```(define (gosper-sqrt a b c d)
;;   Solve for
;; ax + b
;; ------  = x
;; cx + d
(define (newtons-method f f-prime guess)
(let ((dy (f guess)))
(if (< (abs dy) 1)
guess
(let ((dy/dx (f-prime guess)))
(newtons-method f f-prime (- guess (/ dy dy/dx)))))))

(define (f x)
(+ (* c x x)
(* (- d a) x)
(- b)))

(define (f-prime x)
(+ (* 2 c x)
(- d a)))

(let ((value (floor (newtons-method f f-prime b))))
(cons-stream value
(gosper-sqrt (+ (* c value) d)
c
(+ (* (- a (* value c)) value)
(- b (* value d)))
(- a (* value c))))))

1 ]=> (cf:render (gosper-sqrt 0 17 10 0))
1.303840481040529742916594311485836883305618755782013091790079369...

;; base 10, 100 digits
1 ]=> (cf:render (gosper-sqrt 0 17 10 0) 10 100)
1.303840481040529742916594311485836883305618755782013091790079369896765385576397896545183528886788497...
```

## Tuesday, August 26, 2014

### Solutions in search of problems

Suppose you have a function like `(define foo (lambda (x) (- (* x x x) 30)))` and you want to find `x` such that `(foo x)` = `0`. There are a few ways to go about this. If you can find two different `x` such that `(foo x)` is positive for one and negative for the other, then `(foo x)` must be zero somewhere in between. A simple binary search will find it.
```(define (bisection-method f left right)
(let* ((midpoint (average left right))
(fmid     (f midpoint)))
(if (< (abs fmid) 1e-8)
midpoint
(let ((fl (f left))
(fr (f right)))
(cond ((same-sign? fl fr) (error "Left and right not on opposite sides."))
((same-sign? fmid fr) (bisection-method f left midpoint))
((same-sign? fl fmid) (bisection-method f midpoint right))
(else (error "shouldn't happen")))))))

(define (average l r) (/ (+ l r) 2))

(define (same-sign? l r)
(or (and (positive? l)
(positive? r))
(and (negative? l)
(negative? r))))

1 ]=> (cos 2)

;Value: -.4161468365471424

1 ]=> (cos 1)

;Value: .5403023058681398

1 ]=> (bisection-method cos 1.0 2.0)
1. 2.
1.5 2.
1.5 1.75
1.5 1.625
1.5625 1.625
1.5625 1.59375
1.5625 1.578125
1.5703125 1.578125
1.5703125 1.57421875
1.5703125 1.572265625
1.5703125 1.5712890625
1.5703125 1.57080078125
1.570556640625 1.57080078125
1.5706787109375 1.57080078125
1.57073974609375 1.57080078125
1.570770263671875 1.57080078125
1.5707855224609375 1.57080078125
1.5707931518554687 1.57080078125
1.5707931518554687 1.5707969665527344
1.5707950592041016 1.5707969665527344
1.570796012878418 1.5707969665527344
1.570796012878418 1.5707964897155762
1.570796251296997 1.5707964897155762
1.570796251296997 1.5707963705062866
1.5707963109016418 1.5707963705062866
1.5707963109016418 1.5707963407039642
;Value: 1.570796325802803```
Rather than selecting the midpoint between the two prior guesses, you can pretend that your function is linear between the guesses and interpolate where the zero should be. This can converge quicker.
```(define (secant-method f x1 x2)
(display x1) (display " ") (display x2) (newline)
(let ((f1 (f x1))
(f2 (f x2)))
(if (< (abs f1) 1e-8)
x1
(let ((x0 (/ (- (* x2 f1) (* x1 f2))
(- f1 f2))))
(secant-method f x0 x1)))))

1 ]=> (secant-method cos 0.0 4.0)
0. 4.
2.418900874126076 0.
1.38220688493168 2.418900874126076
1.5895160570280047 1.38220688493168
1.5706960159120333 1.5895160570280047
1.5707963326223677 1.5706960159120333
;Value: 1.5707963326223677
```
If you know the derivative of `f`, then you can use Newton's method to find the solution.
```(define (newtons-method f f-prime guess)
(display guess) (display " ") (newline)
(let ((dy (f guess)))
(if (< (abs dy) 1e-8)
guess
(let ((dy/dx (f-prime guess)))
(newtons-method f f-prime (- guess (/ dy dy/dx)))))))

1 ]=> (newtons-method cos (lambda (x) (- (sin x))) 2.0)
2.
1.5423424456397141
1.5708040082580965
1.5707963267948966
;Value: 1.5707963267948966```
Here's another example. We'll find the cube root of 30 by solving `(lambda (x) (- (* x x x) 30))`.
```(define (cube-minus-thirty x) (- (* x x x) 30))

1 ]=> (bisection-method cube-minus-thirty 0.0 4.0)
0. 4.
2. 4.
3. 4.
3. 3.5
3. 3.25
3. 3.125
3.0625 3.125
3.09375 3.125
3.09375 3.109375
3.1015625 3.109375
3.10546875 3.109375
3.10546875 3.107421875
3.1064453125 3.107421875
3.10693359375 3.107421875
3.107177734375 3.107421875
3.107177734375 3.1072998046875
3.107177734375 3.10723876953125
3.107208251953125 3.10723876953125
3.1072235107421875 3.10723876953125
3.1072311401367187 3.10723876953125
3.1072311401367187 3.1072349548339844
3.1072311401367187 3.1072330474853516
3.107232093811035 3.1072330474853516
3.107232093811035 3.1072325706481934
3.1072323322296143 3.1072325706481934
3.107232451438904 3.1072325706481934
3.107232451438904 3.1072325110435486
3.107232481241226 3.1072325110435486
3.1072324961423874 3.1072325110435486
3.107232503592968 3.1072325110435486
3.107232503592968 3.1072325073182583
3.107232505455613 3.1072325073182583
3.107232505455613 3.1072325063869357
;Value: 3.1072325059212744

1 ]=> (secant-method cube-minus-thirty 0.0 4.0)
0. 4.
1.875 0.
8.533333333333333 1.875
2.1285182547245376 8.533333333333333
2.341649751209406 2.1285182547245376
3.4857887202177547 2.341649751209406
3.0068542655016235 3.4857887202177547
3.0957153766467633 3.0068542655016235
3.1076136741672546 3.0957153766467633
3.1072310897513415 3.1076136741672546
3.1072325057801455 3.1072310897513415
;Value: 3.1072325057801455

1 ]=> (define (cube-minus-thirty-prime x) (* 3 x x))

1 ]=> (newtons-method cube-minus-thirty cube-minus-thirty-prime 4.0)
4.
3.2916666666666665
3.1173734622300557
3.10726545916981
3.1072325063033337
3.107232505953859
;Value: 3.107232505953859```

## Friday, August 22, 2014

### Small puzzle solution

Before I give my solution, I'd like to describe the leftmost digit algorithm in a bit more detail.
```(define (leftmost-digit base n)
(if (< n base)
n
(let ((leftmost-pair (leftmost-digit (* base base) n)))
(if (< leftmost-pair base)
leftmost-pair
(quotient leftmost-pair base)))))```
The idea is this: if we have a one digit number, we just return it, otherwise we recursively call `leftmost-digit` with the square of the base. Squaring the base will mean gobbling up pairs of digits in the recursive call, so we'll get back either a one or two digit answer from the recursion. If it is one digit, we return it, otherwise it's two digits and we divide by the base to get the left one.

For example, if our number is 12345678 and the base is 10, we'll make a recursive call with base 100. The recursive call will deal with number as if it were written `12 34 56 78` in base 100 and return the answer 12. Then we'll divide that by 10 to get the 1.

Since we're squaring the base, we're doubling the number of digits we're dealing with on each recursive call. This leads to the solution in O(log log n) time. If we instrument `quotient`, you can see:
```(leftmost-digit 10 816305093398751331727331379663195459013258742431006753294691576)
816305093398751331727331379663195459013258742431006753294691576 / 100000000000000000000000000000000
8163050933987513317273313796631 / 10000000000000000
816305093398751 / 100000000
8163050 / 10000
816 / 100```
A sixty-three digit number trimmed down to one digit with only five divisions.

So a simple solution to the puzzle is:
```(define (leftmost-digit+ base n)
(if (< n base)
(values n 0)
(call-with-values (lambda () (leftmost-digit+ (* base base) n))
(lambda (leftmost-pair count)
(if (< leftmost-pair base)
(values leftmost-pair (* count 2))
(values (quotient leftmost-pair base) (+ (* count 2) 1)))))))
```
The second value is the count of how many digits we discard. If the number is less than the base, we return it and we discarded nothing. Otherwise, we make the recursive call with the base squared and get back two values, the leftmost pair and the number of pairs it discarded. If the leftmost pair is a single digit, we return it, otherwise we divide by the base. The number of digits discarded is simply twice the number discarded by the recursive call, plus one more if we had to divide.

But I don't see an easy way to separate finding the digit from finding the position. At first it seemed straightforward to just count the digits being discarded, but you can't decide whether to increment the count at each stage without determining if the leftmost part of the recursive call contains one or two digits.

## Thursday, August 21, 2014

### Just a small puzzle

You can get the most significant digit (the leftmost) of a number pretty quickly this way
```(define (leftmost-digit base n)
(if (< n base)
n
(let ((leftmost-pair (leftmost-digit (* base base) n)))
(if (< leftmost-pair base)
leftmost-pair
(quotient leftmost-pair base)))))```
The puzzle is to adapt this code to return the position of the leftmost digit.

`(leftmost-digit+ 10 46729885)  would return two values, 4 and 7`

## Friday, August 8, 2014

### Mini regex golf 3: set cover

I'm computing the set cover by incrementally adding items to be covered. Naturally, the order in which you add items changes the way the program progresses. I added code that picks an item to be added each iteration rather than just pulling the `car` off the front of a list.
```(define (cover8 value->keys-table better-solution)

(let ((value (car v-k-entry))
(keys  (cdr v-k-entry)))

(write-string "Adding value ") (write value) (newline)
(write-string "   with keys ") (write keys) (newline)
(write-string "   to ") (write (length solution-set))
(write-string " partial solutions.") (newline)

(let ((new-solutions
(map make-new-solution (cartesian-product solution-set keys))))

(let ((trimmed-solutions
(trim-partial-solutions value->keys-table new-solutions)))

(write-string "Returning ") (write (length trimmed-solutions))
(write-string " of ") (write (length new-solutions))
(write-string " new partial solutions.") (newline)

trimmed-solutions))))

(define (cover v-k-entries)
(cond ((pair? v-k-entries)
(pick-v-k-entry value->keys-table v-k-entries
(lambda (selected remaining)
((null? v-k-entries)
(list '()))
(else (improper-list-error 'cover v-k-entries))))

(let ((minimized (minimize-vktable value->keys-table better-solution)))
(least-elements (cover minimized) better-solution)))

(define (score v-k-entry)
(let* ((matched-all
(count-matching-items value->keys-table
(lambda (other)
(there-exists? (cdr v-k-entry)
(lambda (key) (member key (cdr other)))))))
(matched-remaining
(count-matching-items v-k-entries
(lambda (other)
(there-exists? (cdr v-k-entry)
(lambda (key) (member key (cdr other)))))))
(matched-forward (- matched-all matched-remaining)))
(cons matched-remaining matched-forward)))

(let ((scored (map (lambda (v-k-entry) (cons (score v-k-entry) v-k-entry))
v-k-entries)))

(let ((picked
(cdar
(least-elements scored
(lambda (left right)
(let* ((len-l (length (cdr left)))
(len-r (length (cdr right)))
(lmr (caar left))
(lmf (cdar left))
(rmr (caar right))
(rmf (cdar right)))
(or (> len-l len-r)
(and (= len-l len-r)
(or (> lmf rmf)
(and (= lmf rmf)
(< lmr rmr)))))
))))))

(display "Picking ") (write picked) (newline)

(define (trim-partial-solutions value->keys-table partial-solutions)
(let ((equivalent-solutions
(map (lambda (entry) (cons (cdr entry) (car entry)))
(collect-equivalent-partial-solutions value->keys-table
partial-solutions))))
(write-string "  Deleting ")
(write (- (length partial-solutions) (length equivalent-solutions)))
(write-string " equivalent partial solutions.")
(newline)

(remove-dominated-solutions value->keys-table
(map lowest-scoring-equivalent-partial-solution
equivalent-solutions))))
```
Finally, it turns out that computing dominating partial solutions is expensive, so I changed the set operations to use a bitmap representation:
```(define (remove-dominated-solutions value->keys-table partial-solutions)
(let ((before-length (length partial-solutions))
(all-values (get-values value->keys-table)))
(let ((table
;; put the long ones in first
(sort
(map (lambda (partial-solution)
(cons partial-solution
(lset->bset all-values
(map car (partial-solution-matches value->keys-table
partial-solution)))))
partial-solutions)
(lambda (left right)
(> (length (bset->lset all-values (cdr left)))
(length (bset->lset all-values (cdr right))))))))

(dominates-solution? solution))
'()
table))))
(write-string "  Removing ") (write (- before-length after-length))
(write-string " dominated solutions.")
(newline)

(define (dominates-solution? solution)
(let* ((partial-solution (car solution))
(partial-solution-score (score partial-solution))
(solution-matches-raw (cdr solution)))
(lambda (other-solution)
(let* ((other-partial-solution (car other-solution))
(other-matches-raw (cdr other-solution)))
(and
(bset-superset? other-matches-raw solution-matches-raw)
(<= (score other-partial-solution) partial-solution-score))))))

(define (get-values v-k-table)
'()
v-k-table))

(define (bset-element->bit universe element)
(cond ((null? element) 0)
(else (expt 2 (list-index (lambda (item) (eq? item element)) universe)))))

(bset-union bset (bset-element->bit universe element)))

(define (lset->bset universe lset)
0
lset))

(define (bset->lset universe bset)
(cond ((zero? bset) '())
((even? bset) (bset->lset (cdr universe) (/ bset 2)))
(else (cons (car universe) (bset->lset (cdr universe) (/ (- bset 1) 2))))))

(define (bset-union left right) (bitwise-ior left right))

(define (bset-superset? bigger smaller)
;; Is every element of smaller in bigger?
(zero? (bitwise-andc2 smaller bigger)))
```
This code can now find the shortest regular expression consisting of letters and dots (and ^\$) that matches one set of strings but not another.

Depending on the strings, this can take quite a bit of time to run. Dotted expressions cause a combinatorical explosion in matching regexps (or substrings), but what makes it worse is that the dotted expressions tend to span different sets of strings. If two different dotted expressions, each with different matching sets of strings, appear in a single string, then the number of partial solutions will be multiplied by two as we try each different dotted expression.

It is characteristic of NP problems that it is easy to determine if you have a good solution, but quite hard to find it among a huge number of other, poor solutions. This problem exhibits this characteristic, but there is a bit more structure in the problem that we are exploiting. The word lists are drawn from the English language. This makes some bigrams, trigrams, etc. far, far, more likely to appear than others.

Short words are much easier to process than longer ones because they simply contain fewer things to match. On the other hand, longer words tend to be dominated by shorter ones anyway.

To be continued...

## Thursday, August 7, 2014

### Mini regex golf 2: adding regular expressions

It wasn't too hard to add regular expressions to the substring version. What took a while was just tinkering with the code, breaking it, fixing it again, noticing an optimization, tinkering, etc. etc. In any case it works and here is some of it.
```(define (make-extended-ngram-table winners losers)
(let* ((initial-ngrams (generate-ngrams winners losers)))
(write-string "Initial ngrams: ") (write (length initial-ngrams))
(newline)
(map (lambda (winner)
(cons winner
(keep-matching-items initial-ngrams
(lambda (ngram) (re-string-search-forward ngram winner)))))
winners)))

(define (generate-ngrams winners losers)
(write-string "Generating ngrams...")(newline)
(let ((losing-ngram? (string-list-matcher losers)))
(lset-union equal? answer (extended-ngrams losing-ngram? winner)))
'()
winners)))

(define (string-list-matcher string-list)
(lambda (test-ngram)
(there-exists? string-list
(lambda (string)
(re-string-search-forward test-ngram string)))))

(define *dotification-limit* 4)
(define *generate-ends-of-words* #t)
(define *generate-dotted* #t)

(define (ngrams-of-length n string)
(do ((start    0 (1+ start))
(end      n (1+ end))

(answer '() (let ((item (car tail)))
(if (losing-ngram? dotted)
(dotify item)))))
((not (pair? tail))
(if (null? tail)
(improper-list-error 'generate-dotted tail)))))

(define (dotify word)
(cond ((string=? word "") (list ""))
((> (string-length word) *dotification-limit*) (list word))
(else
(string-append replacement dotified)))
(replacements (substring word 0 1))))
'()
(dotify (substring word 1 (string-length word)))))))

(define (replacements string)
(if (or (string=? string "^")
(string=? string "\$"))
(list string)
(list string ".")))

(define (extended-ngrams losing-ngram? string)
(let ((string (if *generate-ends-of-words*
(string-append "^" string "\$")
string)))
(do ((n 1    (+ n 1))
(delete-matching-items (ngrams-of-length n string)
losing-ngram?))))
((> n (string-length string))
(if *generate-dotted*
Adding the dotification greatly increases the number of ways to match words:
```1 ]=> (extended-ngrams (string-list-matcher losers) "lincoln")

;Value 15: ("li" "ln" "ln\$" "oln" ".ln" "col" "lin" "li." "^li" "o.n\$" "oln\$" ".ln\$" "col." "c.ln" "..ln" "coln" ".oln" "co.n" "n.ol" "..ol" "ncol" ".col" "nc.l" "i.co" "inco" "i..o" "in.o" "lin." "li.." "l.nc" "linc" "l..c" "li.c" "^li." "^lin" "coln\$" "ncoln" "incol" "linco" "^linc" "ncoln\$" "incoln" "lincol" "^linco" "incoln\$" "lincoln" "^lincol" "lincoln\$" "^lincoln" "^lincoln\$")```
The table that maps words to their extended ngrams is quite large, but it can be reduced in size without affecting the solution to the set cover problem. If two regexps match exactly the same set of winning strings, then one can be substituted for the other in any solution, so we can discard all but the shortest of these. If a regexp matches a proper superset of another regexp, and the other regexp is at least the same length or longer, then the first regexp dominates the second one, so we can discard the second one.
```(define (minimize-keys value->keys-table better-solution)
(let* ((all-keys (get-keys value->keys-table))
(equivalents (collect-equivalent-partial-solutions value->keys-table
(map list all-keys)))
(reduced (map (lambda (equivalent)
(cons (car equivalent)
(car (least-elements (cdr equivalent)
better-solution))))
equivalents))
(dominants (collect-dominant-partial-solutions reduced better-solution))
'()
dominants)))

(define (rebuild-entry entry)
(cons (car entry) (keep-matching-items (cdr entry)
(lambda (item) (member item good-keys)))))

(write-string "Deleting ") (write (- (length all-keys) (length good-keys)))
(write-string " of ") (write (length all-keys)) (write-string " keys.  ")
(write (length good-keys)) (write-string " keys remain.")(newline)
(map rebuild-entry value->keys-table)))

(define (partial-solution-matches value->keys-table partial-solution)
(keep-matching-items
value->keys-table
(lambda (entry)
(there-exists? partial-solution (lambda (key) (member key (cdr entry)))))))

(define (collect-equivalent-partial-solutions value->keys-table partial-solutions)

(for-each (lambda (partial-solution)
(map car (partial-solution-matches
value->keys-table
partial-solution))
(list)
(lambda (other)
partial-solutions)

(define (collect-dominant-partial-solutions equivalents better-solution)
(define (dominates? left right)
(and (superset? (car left) (car right))
(not (better-solution (cdr right) (cdr left)))))

(let ((sorted (sort equivalents
(lambda (l r) (> (length (car l)) (length (car r)))))))
(if (there-exists? answer (lambda (a) (dominates? a candidate)))
'()
sorted)))
```
We can minimize the value->key-table in another way. If two values in the table are matched by the exact same set of keys, then we can delete one without changing the solution. If a value is matched by a small set of keys, and if another values is matched by a superset of these keys, then we can delete the larger one because if the smaller one matches, the larger one must match as well.
```(define (minimize-values v-k-table)
(let ((size-before (length v-k-table)))

(define (dominated-value? entry)
(let ((entry-value (car entry))
(entry-keylist (cdr entry)))
(there-exists? v-k-table
(lambda (other-entry)
(and (not (eq? entry other-entry))
(let ((other-value (car other-entry))
(other-keylist (cdr other-entry)))
(let ((result (and (superset? entry-keylist other-keylist)
(not (superset? other-keylist entry-keylist)))))
(if result
(begin (display "Removing ")
(write entry-value)
(display " dominated by ")
(write other-value)
(display ".")
(newline)
))
result)))))))

(let ((entry-value (car entry))
(entry-keylist (cdr entry)))
(lambda (other-entry)
(let ((other-value (car other-entry))
(other-keylist (cdr other-entry)))
(let ((result (equal? entry-keylist other-keylist)))
(if result
(begin (display "Removing ")
(write entry-value)
(display " equivalent to ")
(write other-value)
(display ".")
(newline)
))
result))))))

(dominated-value? entry))

(write-string "Removed ") (write (- size-before (length answer)))
(write-string " dominated and equivalent values.")
(newline)
Each time we remove values or keys, we might make more keys and values equivalent or dominated, so we iterate until we can no longer remove anything.
```(define (minimize-vktable value->keys-table better-solution)
(let* ((before-size (fold-left + 0 (map length value->keys-table)))
(new-table
(minimize-values
(minimize-keys value->keys-table better-solution)))
(after-size (fold-left + 0 (map length new-table))))
(if (= before-size after-size)
value->keys-table
(minimize-vktable new-table better-solution))))```
The minimized table for the presidents looks like this:
```(("washington" "sh" "g..n" "n..o" ".h.n" "a..i")
("monroe" "r.e\$" "oe")
("van-buren" "u..n" "r.n" ".b" "bu" "-")
("harrison" "r..s" "r.i" "i..n" "is." "i.o" "a..i")
("polk" "po")
("taylor" "ay." "ta")
("pierce" "ie." "rc" "r.e\$")
("buchanan" "bu" "a.a" ".h.n")
("lincoln" "i..o" "li")
("grant" "an.\$" "a.t" "ra" "r.n" "g..n")
("hayes" "h..e" "ye" "ay.")
("garfield" "el.\$" "i.l" "ga" "ie." "r.i" ".f" "a..i")
("cleveland" "v.l" "an.\$")
("mckinley" "n.e" "nl" "i.l" "m..i")
("roosevelt" ".se" "oo" "v.l" "el.\$" "r..s")
("taft" "a.t" "ta" ".f")
("wilson" "ls" "i..o")
("harding" "r.i" "di" "a..i")
("coolidge" "oo" "li")
("hoover" "ho" "oo")
("truman" "u..n" "ma")
("eisenhower" "ho" ".se" "h..e" "i..n" "is.")
("kennedy" "nn" "n.e")
("johnson" "j")
("nixon" "^n" "i..n" "i.o" "n..o")
("carter" "rt" "a.t")
("reagan" "ga" "a.a")
("bush" "bu" "sh")
("obama" ".b" "ma" "a.a" "am"))```
As you can see, we have reduced the original 2091 matching regexps to fifty.

Changes to the set-cover code coming soon....

## Friday, August 1, 2014

### Mini regex golf

I was intrigued by Peter Norvig's articles about regex golf.

To make things easier to think about, I decided to start with the simpler problem of looking for substrings. Here's code to extract the ngrams of a string:
```(define (ngrams-of-length n string)
(do ((start    0 (1+ start))
(end      n (1+ end))

(define (ngrams string)
(do ((n 1 (+ n 1))
A solution is simply a list of ngrams. (Although not every list of ngrams is a solution!)
```(define (solution? solution winners losers)
(let ((matches-solution? (ngram-list-matcher solution)))
(and (for-all? winners matches-solution?)
(not (there-exists? losers matches-solution?)))))

(define (ngram-list-matcher ngram-list)
(lambda (test-string)
(there-exists? ngram-list
(lambda (ngram)
(string-search-forward ngram test-string)))))
```
We also want to know if an ngram appears in a given list of strings.
```(define (string-list-matcher string-list)
(lambda (test-ngram)
(there-exists? string-list
(lambda (string)
(string-search-forward test-ngram string)))))

(let ((matches-loser? (string-list-matcher losers)))
(for-each
(lambda (winner) (write-string winner) (write-string ": ")
(write (reverse (delete-matching-items (ngrams winner) matches-loser?)))
(newline))
winners)))

washington: ("sh" "hi" "gt" "to" "was" "ash" "shi" "hin" "ngt" "gto" ...)
jefferson: ("j" "je" "ef" "ff" "fe" "rs" "jef" "eff" "ffe" "fer" ...)
monroe: ("oe" "onr" "nro" "roe" "monr" "onro" "nroe" "monro" "onroe" "monroe")
jackson: ("j" "ja" "ac" "ks" "jac" "ack" "cks" "kso" "jack" "acks" ...)
van-buren: ("-" "va" "n-" "-b" "bu" "van" "an-" "n-b" "-bu" "bur" ...)
harrison: ("har" "arr" "rri" "ris" "iso" "harr" "arri" "rris" "riso" "ison" ...)
polk: ("po" "pol" "olk" "polk")
taylor: ("ta" "yl" "lo" "tay" "ayl" "ylo" "lor" "tayl" "aylo" "ylor" ...)
pierce: ("rc" "ce" "pie" "ier" "erc" "rce" "pier" "ierc" "erce" "pierc" ...)
buchanan: ("bu" "uc" "ch" "na" "buc" "uch" "cha" "ana" "nan" "buch" ...)
lincoln: ("li" "ln" "lin" "col" "oln" "linc" "inco" "ncol" "coln" "linco" ...)
grant: ("ra" "gra" "ran" "ant" "gran" "rant" "grant")
hayes: ("ye" "hay" "aye" "yes" "haye" "ayes" "hayes")
garfield: ("ga" "rf" "fi" "gar" "arf" "rfi" "fie" "iel" "eld" "garf" ...)
cleveland: ("lev" "vel" "ela" "clev" "leve" "evel" "vela" "elan" "cleve" "level" ...)
mckinley: ("nl" "mck" "inl" "nle" "mcki" "kinl" "inle" "nley" "mckin" "ckinl" ...)
roosevelt: ("oo" "os" "lt" "roo" "oos" "ose" "sev" "vel" "elt" "roos" ...)
taft: ("ta" "af" "ft" "taf" "aft" "taft")
wilson: ("ls" "ils" "lso" "wils" "ilso" "lson" "wilso" "ilson" "wilson")
harding: ("di" "har" "ard" "rdi" "din" "hard" "ardi" "rdin" "ding" "hardi" ...)
coolidge: ("oo" "li" "coo" "ool" "oli" "lid" "cool" "ooli" "olid" "lidg" ...)
hoover: ("ho" "oo" "hoo" "oov" "hoov" "oove" "hoove" "oover" "hoover")
truman: ("tr" "ru" "ma" "tru" "rum" "uma" "man" "trum" "ruma" "uman" ...)
eisenhower: ("ei" "nh" "ho" "ow" "eis" "ise" "sen" "enh" "nho" "how" ...)
kennedy: ("nn" "ed" "dy" "ken" "enn" "nne" "ned" "edy" "kenn" "enne" ...)
johnson: ("j" "jo" "oh" "hn" "joh" "ohn" "hns" "john" "ohns" "hnso" ...)
nixon: ("ni" "ix" "xo" "nix" "ixo" "xon" "nixo" "ixon" "nixon")
carter: ("rt" "car" "art" "rte" "cart" "arte" "rter" "carte" "arter" "carter")
reagan: ("ea" "ag" "ga" "rea" "eag" "aga" "gan" "reag" "eaga" "agan" ...)
bush: ("bu" "us" "sh" "bus" "ush" "bush")
clinton: ("li" "to" "cli" "lin" "int" "nto" "ton" "clin" "lint" "into" ...)
obama: ("ob" "ba" "am" "ma" "oba" "bam" "ama" "obam" "bama" "obama")
```
We can discard ngrams like "shi" because the shorter ngram "sh" will also match.
```(define (dominant-ngrams string losing-ngram?)
(do ((n 1 (+ n 1))
(delete-matching-items
(ngrams-of-length n string)
(lambda (item)
(lambda (ngram)
(string-search-forward ngram item)))
(losing-ngram? item))))

(let ((matches-loser? (string-list-matcher losers)))
(for-each
(lambda (winner) (write-string winner) (write-string ": ")
(write (dominant-ngrams winner matches-loser?))
(newline))
winners)))

washington: ("was" "to" "gt" "hi" "sh")
jefferson: ("rs" "fe" "ff" "ef" "j")
monroe: ("nro" "onr" "oe")
jackson: ("ks" "ac" "j")
van-buren: ("ren" "ure" "bu" "va" "-")
harrison: ("iso" "ris" "rri" "arr" "har")
polk: ("olk" "po")
taylor: ("lo" "yl" "ta")
pierce: ("ier" "pie" "ce" "rc")
buchanan: ("na" "ch" "uc" "bu")
lincoln: ("inco" "col" "ln" "li")
grant: ("ant" "ra")
hayes: ("hay" "ye")
garfield: ("eld" "iel" "fi" "rf" "ga")
cleveland: ("ela" "vel" "lev")
mckinley: ("mck" "nl")
roosevelt: ("vel" "sev" "lt" "os" "oo")
taft: ("ft" "af" "ta")
wilson: ("ls")
harding: ("ard" "har" "di")
coolidge: ("li" "oo")
hoover: ("oo" "ho")
truman: ("ma" "ru" "tr")
eisenhower: ("wer" "sen" "ise" "ow" "ho" "nh" "ei")
kennedy: ("ken" "dy" "ed" "nn")
johnson: ("hn" "oh" "j")
nixon: ("xo" "ix" "ni")
carter: ("car" "rt")
reagan: ("ga" "ag" "ea")
bush: ("sh" "us" "bu")
clinton: ("int" "to" "li")
obama: ("ma" "am" "ba" "ob")
```
It's time to tackle the set cover problem. We want a set of ngrams that match all the strings. Obviously, if we pick an ngram from each of the strings we want to cover, we'll have a solution. For instance,
```(let ((matches-loser? (string-list-matcher losers)))
(solution? (delete-duplicates
(map
(lambda (winner) (car (dominant-ngrams winner matches-loser?)))
winners))
winners losers))
;Value: #t
```
We can cycle through all the possible solutions and then select the best one.
```(define (mini-golf0 winners losers)
(lowest-scoring
(cover0 (make-dominant-ngram-table
winners
(delete-losing-superstrings winners losers)))))

(define (delete-losing-superstrings winners losers)
(delete-matching-items
losers
(lambda (loser)
(there-exists? winners
(lambda (winner)
(string-search-forward winner loser))))))

(define (make-dominant-ngram-table winners losers)
(let ((losing-ngram? (string-list-matcher losers)))
(map (lambda (winner)
(cons winner (dominant-ngrams winner losing-ngram?)))
winners)))

(define (cover0 v-k-table)
(let ((empty-solution-set (list '())))

(let ((value (car v-k-entry))
(keys  (cdr v-k-entry)))

(write-string "Adding value ") (write value) (newline)
(write-string "   with keys ") (write keys) (newline)
(write-string "   to ") (write (length solution-set))
(write-string " partial solutions.") (newline)

(let ((new-solutions
(map make-new-solution (cartesian-product solution-set keys))))

(write-string "Returning ") (write (length new-solutions))
(write-string " new partial solutions.") (newline)

new-solutions)))

(define (lowest-scoring list)
(least-elements list (lambda (l r) (< (score l) (score r)))))

(define (cartesian-product left-list right-list)
right-list))
'()
left-list))

(define (make-new-solution cp-term)
(let ((solution (car cp-term))
(key (cdr cp-term)))

(define (improper-list-error procedure thing)
(error (string-append "Improper list found by " procedure ": ") thing))

(define (least-elements list <)
((< item (car answer)) (cons item '()))

(cond ((pair? list) (fold-left accumulate-least
(cons (car list) '())
(cdr list)))
((null? list) (error "List must have at least one element." list))
(else (improper-list-error 'LEAST-ELEMENTS list))))

(define (score solution)
(do ((tail solution (cdr tail))
(score -1      (+ score (string-length (car tail)) 1)))
((not (pair? tail))
(if (null? tail)
score
(improper-list-error 'score solution)))))
```
This works for small sets:
```1 ]=> (mini-golf0 boys girls)
with keys ("ob" "c" "j")
to 1 partial solutions.
Returning 3 new partial solutions.
with keys ("as")
to 3 partial solutions.
Returning 3 new partial solutions.
with keys ("an" "ha")
to 3 partial solutions.
Returning 6 new partial solutions.
with keys ("ah" "oa" "no")
to 6 partial solutions.
Returning 18 new partial solutions.
with keys ("lia" "lli" "ill" "am" "w")
to 18 partial solutions.
Returning 90 new partial solutions.
with keys ("lia" "am")
to 90 partial solutions.
Returning 180 new partial solutions.
with keys ("en" "de" "yd" "ay" "j")
to 180 partial solutions.
Returning 900 new partial solutions.
with keys ("ae" "ha" "c")
to 900 partial solutions.
Returning 2700 new partial solutions.
with keys ("de" "nd" "an" "le" "al" "r" "x")
to 2700 partial solutions.
Returning 18900 new partial solutions.
with keys ("en" "de" "id")
to 18900 partial solutions.
Returning 56700 new partial solutions.
;Value 41: (("de" "am" "ah" "ha" "as" "j")
("de" "am" "ah" "ha" "as" "j")
("de" "am" "oa" "ha" "as" "j")
("de" "am" "oa" "ha" "as" "j")
("de" "am" "no" "ha" "as" "j")
("de" "am" "no" "ha" "as" "j")
("de" "am" "ah" "ha" "as" "c")
("de" "am" "ah" "ha" "as" "c")
("de" "am" "oa" "ha" "as" "c")
("de" "am" "oa" "ha" "as" "c")
("de" "am" "no" "ha" "as" "c")
("de" "am" "no" "ha" "as" "c")
("de" "am" "ah" "an" "as" "c")
("de" "am" "ah" "an" "as" "c")
("en" "am" "ah" "an" "as" "c")
("de" "am" "oa" "an" "as" "c")
("de" "am" "oa" "an" "as" "c")
("en" "am" "oa" "an" "as" "c")
("de" "am" "no" "an" "as" "c")
("de" "am" "no" "an" "as" "c")
("en" "am" "no" "an" "as" "c"))
```
But you can see that we won't be able to go much beyond this because there are just too many combinations. We can cut down on the intermediate partial solutions by noticing that many of them are redundant. We don't need to keep partial solutions that cannot possibly lead to a shortest final solution. The various partial solutions each (potentially) match different sets of words. We only need keep the shortest solution for each different set of matched words. Furthermore, if a solution's matches are a superset of another's matches, and the other is the same length or longer, then the solution is dominated by the other and will always be at least the length of the longer.
```(define (mini-golf1 winners losers)
(cover1
(make-dominant-ngram-table winners (delete-losing-superstrings winners losers))
lowest-scoring))

(define (cover1 v-k-table lowest-scoring)
(let ((empty-solution-set (list '())))

(let ((value (car v-k-entry))
(keys  (cdr v-k-entry)))

(write-string "Adding value ") (write value) (newline)
(write-string "   with keys ") (write keys) (newline)
(write-string "   to ") (write (length solution-set))
(write-string " partial solutions.") (newline)

(let ((new-solutions
(map make-new-solution (cartesian-product solution-set keys))))

(let ((trimmed-solutions (trim-partial-solutions new-solutions)))

(write-string "Returning ") (write (length trimmed-solutions))
(write-string " of ") (write (length new-solutions))
(write-string " new partial solutions.") (newline)

trimmed-solutions))))

(define (trim-partial-solutions partial-solutions)
(let ((equivalent-solutions (collect-equivalent-partial-solutions partial-solutions)))
(write-string "  Deleting ")
(write (- (length partial-solutions) (length equivalent-solutions)))
(write-string " equivalent partial solutions.")
(newline)

(remove-dominated-solutions
(map lowest-scoring-equivalent-partial-solution equivalent-solutions))))

(define (lowest-scoring-equivalent-partial-solution entry)
(first (lowest-scoring (car entry))))

(define (collect-equivalent-partial-solutions alist)
;; Add each entry in turn.
(fold-left (lambda (equivalents partial-solution)
partial-solution
(partial-solution-matches partial-solution)
equivalents))
'() alist))

(define (partial-solution-matches partial-solution)
(keep-matching-items v-k-table
(lambda (entry)
(there-exists? partial-solution
(lambda (key) (member key (cdr entry)))))))

(define (remove-dominated-solutions partial-solutions)
(let ((before-length (length partial-solutions)))
'()
(map (lambda (partial-solution)
(cons partial-solution (partial-solution-matches partial-solution)))
partial-solutions)))))
(write-string "  Deleting ") (write (- before-length after-length))
(write-string " dominated solutions.")
(newline)

(lowest-scoring

(define (dominates-solution? solution)
(let ((partial-solution (car solution))
(solution-matches (cdr solution)))
(lambda (other-solution)
(let ((other-partial-solution (car other-solution))
(other-matches (cdr other-solution)))
(and (not (equal? solution-matches other-matches))
(superset? other-matches solution-matches)
(<= (score other-partial-solution) (score partial-solution)))))))

(cond ((pair? alist)
(let ((entry (car alist))
(tail (cdr alist)))
(let ((entry-solutions (car entry))
(entry-value (cdr entry)))
(if (equal? value entry-value)
(if (member solution entry-solutions)
alist
(cons (cons (cons solution entry-solutions) value)
tail))
(cons entry (add-equivalent-partial-solution solution value tail))))))
((null? alist) (list (cons (list solution) value)))
(else (improper-list-error 'collect-equivalents alist))))
```
```1 ]=> (mini-golf1 winners losers)
with keys ("was" "to" "gt" "hi" "sh")
to 1 partial solutions.
Deleting 2 equivalent partial solutions.
Removing 1 dominated solutions.
Returning 2 of 5 new partial solutions.
to 2 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 2 dominated solutions.
Returning 4 of 6 new partial solutions.
with keys ("rs" "fe" "ff" "ef" "j")
to 4 partial solutions.
Deleting 12 equivalent partial solutions.
Removing 4 dominated solutions.
Returning 4 of 20 new partial solutions.
with keys ("iso" "di" "ad" "ma")
to 4 partial solutions.
Deleting 2 equivalent partial solutions.
Removing 2 dominated solutions.
Returning 12 of 16 new partial solutions.
with keys ("nro" "onr" "oe")
to 12 partial solutions.
Deleting 24 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 12 of 36 new partial solutions.
with keys ("ks" "ac" "j")
to 12 partial solutions.
Deleting 24 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 12 of 36 new partial solutions.
with keys ("ren" "ure" "bu" "va" "-")
to 12 partial solutions.
Deleting 36 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 24 of 60 new partial solutions.
with keys ("iso" "ris" "rri" "arr" "har")
to 24 partial solutions.
Deleting 96 equivalent partial solutions.
Removing 12 dominated solutions.
Returning 12 of 120 new partial solutions.
with keys ("olk" "po")
to 12 partial solutions.
Deleting 12 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 12 of 24 new partial solutions.
with keys ("lo" "yl" "ta")
to 12 partial solutions.
Deleting 12 equivalent partial solutions.
Removing 12 dominated solutions.
Returning 12 of 36 new partial solutions.
with keys ("ier" "pie" "ce" "rc")
to 12 partial solutions.
Deleting 36 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 12 of 48 new partial solutions.
with keys ("na" "ch" "uc" "bu")
to 12 partial solutions.
Deleting 39 equivalent partial solutions.
Removing 3 dominated solutions.
Returning 6 of 48 new partial solutions.
with keys ("inco" "col" "ln" "li")
to 6 partial solutions.
Deleting 15 equivalent partial solutions.
Removing 6 dominated solutions.
Returning 3 of 24 new partial solutions.
with keys ("ant" "ra")
to 3 partial solutions.
Deleting 3 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 3 of 6 new partial solutions.
with keys ("hay" "ye")
to 3 partial solutions.
Deleting 3 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 3 of 6 new partial solutions.
with keys ("eld" "iel" "fi" "rf" "ga")
to 3 partial solutions.
Deleting 9 equivalent partial solutions.
Removing 3 dominated solutions.
Returning 3 of 15 new partial solutions.
with keys ("ela" "vel" "lev")
to 3 partial solutions.
Deleting 3 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 6 of 9 new partial solutions.
with keys ("mck" "nl")
to 6 partial solutions.
Deleting 6 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 6 of 12 new partial solutions.
with keys ("vel" "sev" "lt" "os" "oo")
to 6 partial solutions.
Deleting 24 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 6 of 30 new partial solutions.
with keys ("ft" "af" "ta")
to 6 partial solutions.
Deleting 12 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 6 of 18 new partial solutions.
with keys ("ls")
to 6 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 6 of 6 new partial solutions.
with keys ("ard" "har" "di")
to 6 partial solutions.
Deleting 12 equivalent partial solutions.
Removing 2 dominated solutions.
Returning 4 of 18 new partial solutions.
with keys ("li" "oo")
to 4 partial solutions.
Deleting 4 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 4 of 8 new partial solutions.
with keys ("oo" "ho")
to 4 partial solutions.
Deleting 4 equivalent partial solutions.
Removing 2 dominated solutions.
Returning 2 of 8 new partial solutions.
with keys ("ma" "ru" "tr")
to 2 partial solutions.
Deleting 4 equivalent partial solutions.
Removing 1 dominated solutions.
Returning 1 of 6 new partial solutions.
with keys ("wer" "sen" "ise" "ow" "ho" "nh" "ei")
to 1 partial solutions.
Deleting 6 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 1 of 7 new partial solutions.
with keys ("ken" "dy" "ed" "nn")
to 1 partial solutions.
Deleting 3 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 1 of 4 new partial solutions.
with keys ("hn" "oh" "j")
to 1 partial solutions.
Deleting 2 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 1 of 3 new partial solutions.
with keys ("xo" "ix" "ni")
to 1 partial solutions.
Deleting 2 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 1 of 3 new partial solutions.
with keys ("car" "rt")
to 1 partial solutions.
Deleting 1 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 1 of 2 new partial solutions.
with keys ("ga" "ag" "ea")
to 1 partial solutions.
Deleting 2 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 1 of 3 new partial solutions.
with keys ("sh" "us" "bu")
to 1 partial solutions.
Deleting 2 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 1 of 3 new partial solutions.
with keys ("int" "to" "li")
to 1 partial solutions.
Deleting 2 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 1 of 3 new partial solutions.
with keys ("ma" "am" "ba" "ob")
to 1 partial solutions.
Deleting 3 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 1 of 4 new partial solutions.
;Value 47: (("rt" "ni" "nn" "ho" "ls" "nl" "vel" "ga" "ye" "ra" "li" "rc" "ta" "po" "har" "bu" "oe" "ma" "j" "ad" "sh"))```
The `cover` procedure takes a table that maps values to the keys that cover them. If we can reduce the size of that table without changing the solution, we'll run faster. If there are two entries in the table such that the keys of one are a superset of the keys of the other, we can discard the superset: the smaller of the two entries will be in the solution, and any key that matches the smaller one will automatically match the larger one as well. Also, if two values have the same set of keys that match them, we need only include one of the values in the table.
```(define (delete-dominated-values v-k-table)
(let ((size-before (length v-k-table)))

(define (dominated-value? entry)
(let ((entry-value (car entry))
(entry-keylist (cdr entry)))
(there-exists? v-k-table
(lambda (other-entry)
(and (not (eq? entry other-entry))
(let ((other-value (car other-entry))
(other-keylist (cdr other-entry)))
(and (superset? entry-keylist other-keylist)
(not (equal? other-keylist entry-keylist)))))))))

(let ((entry-value (car entry))
(entry-keylist (cdr entry)))
(lambda (other-entry)
(let ((other-value (car other-entry))
(other-keylist (cdr other-entry)))
(equal? entry-keylist other-keylist))))))

(dominated-value? entry))

(write-string "Removed ") (write (- size-before (length answer)))
(write-string " dominated and equivalent values.")
(newline)

(define (superset? bigger smaller)
(for-all? smaller (lambda (s) (member s bigger))))

(define (mini-golf2 winners losers)
(cover1
(delete-dominated-values
(make-dominant-ngram-table winners (delete-losing-superstrings winners losers)))
lowest-scoring))

;;;;;;;;
;; Delete dominated keys from the keylists.

(define (mini-golf3 winners losers)
(cover1
(delete-dominated-keys-and-values
(make-dominant-ngram-table winners (delete-losing-superstrings winners losers))
(lambda (left right)
(or (< (string-length left) (string-length right))
(and (= (string-length left) (string-length right))
(string<? left right)))))
lowest-scoring))

(define (delete-dominated-keys-and-values v-k-table better-key)
(let ((before-size (fold-left * 1 (map length v-k-table))))
(let ((new-table (delete-dominated-values
(delete-dominated-keys v-k-table better-key))))
(let ((after-size (fold-left * 1 (map length new-table))))
(if (= before-size after-size)
v-k-table
(delete-dominated-keys-and-values new-table better-key))))))

(define (delete-dominated-keys v-k-table better-key)
(let ((all-keys (get-all-keys v-k-table)))

(define (lookup-key key)
(cons key
(map car
(keep-matching-items v-k-table
(lambda (v-k-entry)
(member key (cdr v-k-entry)))))))

(let ((k-v-table (map lookup-key all-keys)))

(define (dominated-key? key)
(let ((values (cdr (assoc key k-v-table))))
(there-exists? k-v-table
(lambda (entry)
(let ((entry-key (car entry))
(entry-values (cdr entry)))
(and (superset? entry-values values)
(not (equal? values entry-values))
(or (< (string-length entry-key) (string-length key))
(and (= (string-length entry-key) (string-length key))
(string<? entry-key key)))))))))

(let ((values (cdr (assoc key k-v-table))))
(lambda (entry-key)
(let ((entry-values (cdr (lookup-key entry-key))))
(equal? values entry-values))))))

(if (or (dominated-key? key)

(let ((good-keys (fold-left add-keys '() (sort all-keys better-key))))
(write-string "Removed ") (write (- (length all-keys) (length good-keys)))
(write-string " of ") (write (length all-keys)) (write-string " keys.")(newline)

(map (lambda (entry)
(cons (car entry)
(keep-matching-items (cdr entry) (lambda (key) (member key good-keys)))))
v-k-table)))))

(define (get-all-keys v-k-table)
(cdr entry)))
'()
v-k-table))```
Trimming the table this way helps a lot. We can now compute the dogs vs. cats.
```1 ]=> (mini-golf3 dogs cats)

Removed 294 of 405 keys.
Removed 44 dominated and equivalent values.
Removed 25 of 93 keys.
Removed 15 dominated and equivalent values.
Removed 7 of 62 keys.
Removed 0 dominated and equivalent values.
Removed 0 of 55 keys.
Removed 0 dominated and equivalent values.
with keys ("OIS" "BOR" "RZ")
to 1 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 3 of 3 new partial solutions.
with keys ("SCH" "HN")
to 3 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 6 of 6 new partial solutions.
with keys ("JI")
to 6 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 6 of 6 new partial solutions.
with keys ("TERS" "ETT")
to 6 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 12 of 12 new partial solutions.
with keys ("CHI")
to 12 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 12 of 12 new partial solutions.
with keys ("S F" "DES" " DE" "IER" "FL" "VI")
to 12 partial solutions.
Deleting 8 equivalent partial solutions.
Removing 8 dominated solutions.
Returning 56 of 72 new partial solutions.
with keys ("EKI")
to 56 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 56 of 56 new partial solutions.
with keys (" MAL" "OIS" "LG")
to 56 partial solutions.
Deleting 96 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 72 of 168 new partial solutions.
with keys ("TERS" "D P")
to 72 partial solutions.
Deleting 108 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 36 of 144 new partial solutions.
with keys ("W ")
to 36 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 36 of 36 new partial solutions.
with keys ("DS")
to 36 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 36 of 36 new partial solutions.
with keys ("BOR" " DE" "GU")
to 36 partial solutions.
Deleting 88 equivalent partial solutions.
Removing 2 dominated solutions.
Returning 18 of 108 new partial solutions.
with keys ("ANS" "LM")
to 18 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 36 of 36 new partial solutions.
with keys ("LH")
to 36 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 36 of 36 new partial solutions.
with keys (" COR" "ORS")
to 36 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 72 of 72 new partial solutions.
with keys (" MAL" "TES" "LAS" "KA")
to 72 partial solutions.
Deleting 184 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 104 of 288 new partial solutions.
with keys ("IP")
to 104 partial solutions.
Deleting 0 equivalent partial solutions.
;GC #199: took:   0.20   (1%) CPU time,   0.10   (1%) real time; free: 16754359
Removing 0 dominated solutions.
Returning 104 of 104 new partial solutions.
with keys ("SHI" " I")
to 104 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 208 of 208 new partial solutions.
with keys ("AK")
to 208 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 208 of 208 new partial solutions.
with keys ("DES" "DG" "OD")
to 208 partial solutions.
Deleting 304 equivalent partial solutions.
Removing 144 dominated solutions.
Returning 176 of 624 new partial solutions.
with keys ("S F" "FR")
to 176 partial solutions.
Deleting 224 equivalent partial solutions.
Removing 16 dominated solutions.
Returning 112 of 352 new partial solutions.
with keys ("API")
to 112 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 112 of 112 new partial solutions.
with keys ("IES")
to 112 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 112 of 112 new partial solutions.
with keys ("LAS" "IZ" "VI")
to 112 partial solutions.
;GC #200: took:   0.10   (0%) CPU time,   0.10   (1%) real time; free: 16757322
Deleting 272 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 64 of 336 new partial solutions.
with keys ("ITT")
to 64 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 64 of 64 new partial solutions.
with keys ("GS")
to 64 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 64 of 64 new partial solutions.
with keys ("HAVANE")
to 64 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 64 of 64 new partial solutions.
with keys ("ANI" "LS")
to 64 partial solutions.
Deleting 80 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 48 of 128 new partial solutions.
with keys ("FS")
to 48 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 48 of 48 new partial solutions.
with keys ("TES" "LT")
to 48 partial solutions.
Deleting 72 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 24 of 96 new partial solutions.
with keys (" COR" "LS")
to 24 partial solutions.
Deleting 32 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 16 of 48 new partial solutions.
with keys ("IER" " T")
to 16 partial solutions.
Deleting 24 equivalent partial solutions.
Removing 4 dominated solutions.
Returning 4 of 32 new partial solutions.
with keys ("ANS" "ANI")
to 4 partial solutions.
Deleting 6 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 2 of 8 new partial solutions.
with keys ("GR")
to 2 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 2 of 2 new partial solutions.
with keys ("SCH" " PI")
to 2 partial solutions.
Deleting 3 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 1 of 4 new partial solutions.
with keys ("SHI" " T")
to 1 partial solutions.
Deleting 1 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 1 of 2 new partial solutions.
with keys ("EI")
to 1 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 1 of 1 new partial solutions.
with keys ("DL" "OD")
to 1 partial solutions.
Deleting 1 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 1 of 2 new partial solutions.
with keys ("OX")
to 1 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 1 of 1 new partial solutions.
with keys ("AGL")
to 1 partial solutions.
Deleting 0 equivalent partial solutions.
Removing 0 dominated solutions.
Returning 1 of 1 new partial solutions.