## Friday, April 11, 2008

### Learning by programming

One of the best ways to learn something is to attempt to teach it to someone else. If you want to thoroughly learn the details, try teaching a computer.

My linear algebra is very rusty and it wasn't much good to begin with, so rather than snarf some code that did spectral clustering, I've decided to teach my computer how to do it. My experience has taught me that whenever there is heavy math involved in something, I always screw it up in some subtle way. I either misplace a sign, forget a term, or transpose something and the answer comes out wrong. Then I spend hours trying to understand what the answer means only to discover that it wasn't even computed correctly in the first place. Based on this experience, I take the approach of “First make it correct, then make it perform.”

So let's teach my computer how to do spectral clustering. First, we need a representation of a matrix. A vector of rows is not optimal, but it is simple.
```
;;; A dotimes macro, very convenient for matrix math.
(define-syntax dotimes
  (syntax-rules ()
    ((dotimes (variable limit-expr return-value ...) body0 body ...)
     (let ((limit limit-expr))
       (do ((variable 0 (+ variable 1)))
           ((>= variable limit) return-value ...)
         body0
         body ...)))))

(define (allocate-matrix height width)
  (let ((rows (make-vector height #f)))
    (dotimes (i height rows)
      (vector-set! rows i (make-vector width 0)))))

(define (matrix-height matrix)
  (vector-length matrix))

(define (matrix-width matrix)
  (vector-length (vector-ref matrix 0)))

(define (matrix-ref matrix i j)
  (vector-ref (vector-ref matrix i) j))

(define (matrix-set! matrix i j value)
  (vector-set! (vector-ref matrix i) j value))
```
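As a quick sanity check (a sketch, assuming the definitions above have been loaded), the accessors behave like this:
```
;; Sketch: exercise the accessors on a fresh 2x2 matrix of zeros.
(define m (allocate-matrix 2 2))
(matrix-set! m 0 1 5)
(matrix-ref m 0 1)    ; => 5
(matrix-height m)     ; => 2
(matrix-width m)      ; => 2
```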
This gets us started. (By the way, hygienic macros are fantastic for things like `dotimes'. Accidental variable capture is not a problem.) We need to multiply these matrices. I've screwed this up so many times (row vs. column, left vs. right) that I wrote a test program to be sure that I got it right. Getting it right once is easy; keeping it right as we optimize is hard.
```
(define (test-multiply matrix-multiply)
  (let ((matrix-a (allocate-matrix 2 3))
        (matrix-b (allocate-matrix 3 2)))
    ;; a, row 0 (1 0 2)
    (matrix-set! matrix-a 0 0 1)
    (matrix-set! matrix-a 0 1 0)
    (matrix-set! matrix-a 0 2 2)
    ;; a, row 1 (-1 3 1)
    (matrix-set! matrix-a 1 0 -1)
    (matrix-set! matrix-a 1 1 3)
    (matrix-set! matrix-a 1 2 1)
    ;; b, row 0 (3 1)
    (matrix-set! matrix-b 0 0 3)
    (matrix-set! matrix-b 0 1 1)
    ;; b, row 1 (2 1)
    (matrix-set! matrix-b 1 0 2)
    (matrix-set! matrix-b 1 1 1)
    ;; b, row 2 (1 0)
    (matrix-set! matrix-b 2 0 1)
    (matrix-set! matrix-b 2 1 0)

    (let ((result (matrix-multiply matrix-a matrix-b)))
      (or (and (= (matrix-ref result 0 0) 5)
               (= (matrix-ref result 0 1) 1)
               (= (matrix-ref result 1 0) 4)
               (= (matrix-ref result 1 1) 2))
          (error "broken")))))
```
Now on to the naive multiplication routine. Another macro comes in handy.
```
(define-syntax sum
  (syntax-rules ()
    ((sum index-variable start limit-expr term)
     (let ((limit limit-expr))
       (do ((index-variable start (+ index-variable 1))
            (accumulator 0 (+ accumulator term)))
           ((>= index-variable limit) accumulator))))))

(define (matrix-multiply-0 left right)
  (unless (= (matrix-width left) (matrix-height right))
    (error "Cannot multiply"))
  (let* ((result-height (matrix-height left))
         (result-width  (matrix-width right))
         (result-matrix (allocate-matrix result-height result-width)))
    (dotimes (row result-height result-matrix)
      (dotimes (column result-width)
        (matrix-set!
         result-matrix row column
         (sum k
              0 (matrix-height right)
              (* (matrix-ref left  row k)
                 (matrix-ref right k column))))))))
```
It's naive, it's slow, but it is correct.
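Wiring the naive routine into the test harness is a one-liner:
```
;; Run the earlier test against the naive routine; it signals
;; an error if the product comes out wrong.
(test-multiply matrix-multiply-0)
```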

To create a spectral partition, we need the similarity matrix. This is easily computed by multiplying the matrix by its transpose.
```
;; For now, the naive routine is the matrix multiplier of record.
(define matrix-multiply matrix-multiply-0)

(define (transpose matrix)
  (let* ((height (matrix-height matrix))
         (width  (matrix-width matrix))
         (new-matrix (allocate-matrix width height)))
    (dotimes (i height new-matrix)
      (dotimes (j width)
        (matrix-set! new-matrix j i (matrix-ref matrix i j))))))

(define (similarity-matrix matrix)
  (matrix-multiply matrix (transpose matrix)))
```
We first need the row sums of the similarity matrix.
```
(define (matrix-row-sums matrix)
  (let* ((height (matrix-height matrix))
         (width  (matrix-width matrix))
         (sums   (make-vector height 0)))
    (dotimes (i height sums)
      (vector-set! sums i
                   (sum j
                        0 width
                        (matrix-ref matrix i j))))))
```
And we'll be making the matrix R by creating a diagonal matrix of those sums.
```
(define (compute-r row-sums)
  (let* ((height (vector-length row-sums))
         (r (allocate-matrix height height)))
    (dotimes (i height r)
      (matrix-set! r i i (vector-ref row-sums i)))))
```
The vector pi is the normalized row sums.
```
(define (compute-pi row-sums)
  (let* ((length (vector-length row-sums))
         (total  (sum i 0 length (vector-ref row-sums i)))
         (pi     (make-vector length 0)))
    (dotimes (i length pi)
      (vector-set! pi i (/ (vector-ref row-sums i) total)))))
```
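For example (assuming exact arithmetic), row sums of 1 and 3 normalize to 1/4 and 3/4:
```
;; Total is 4, so each row sum is divided by 4.
(compute-pi '#(1 3))   ; => #(1/4 3/4)
```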
And the matrix D is the diagonal matrix of the square roots of pi's elements.
```
(define (compute-d pi)
  (let* ((h (vector-length pi))
         (d (allocate-matrix h h)))
    (dotimes (i h d)
      (matrix-set! d i i (sqrt (vector-ref pi i))))))
```
We need to compute the inverses of R and D. Inverting a matrix is usually tricky, but since R and D are both diagonal, we can simply take the reciprocal of the diagonal elements. This function won't validate that the matrix is in fact diagonal, so the user ought to be aware!
```
(define (compute-diagonal-inverse diagonal)
  (let* ((height  (matrix-height diagonal))
         (width   (matrix-width diagonal))
         (inverse (allocate-matrix height width)))
    (dotimes (i height inverse)
      (matrix-set! inverse i i (/ 1 (matrix-ref diagonal i i))))))
```
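For instance, diag(2, 4) inverts to diag(1/2, 1/4):
```
;; Sketch: build a small diagonal matrix and invert it.
(define diag (allocate-matrix 2 2))
(matrix-set! diag 0 0 2)
(matrix-set! diag 1 1 4)
(matrix-ref (compute-diagonal-inverse diag) 1 1)   ; => 1/4
```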
We need one more thing. We need to multiply a matrix and vector. Since a vector (mathematically) is the same as a matrix with a single row or column, we can write simple converters and use our matrix multiply routine.
```
(define (vector->column-matrix vector)
  (let* ((height (vector-length vector))
         (answer (allocate-matrix height 1)))
    (dotimes (i height answer)
      (matrix-set! answer i 0 (vector-ref vector i)))))

(define (vector->row-matrix vector)
  (let* ((width  (vector-length vector))
         (answer (allocate-matrix 1 width)))
    (dotimes (i width answer)
      (matrix-set! answer 0 i (vector-ref vector i)))))

(define (matrix->column-vector matrix)
  (let* ((length (matrix-height matrix))
         (answer (make-vector length 0)))
    (dotimes (i length answer)
      (vector-set! answer i (matrix-ref matrix i 0)))))

(define (matrix->row-vector matrix)
  (let* ((length (matrix-width matrix))
         (answer (make-vector length 0)))
    (dotimes (j length answer)
      (vector-set! answer j (matrix-ref matrix 0 j)))))

;; A row vector times a matrix yields a row vector.
(define (vector-times-matrix vector matrix)
  (matrix->row-vector
   (matrix-multiply (vector->row-matrix vector) matrix)))

;; A matrix times a column vector yields a column vector.
(define (matrix-times-vector matrix vector)
  (matrix->column-vector
   (matrix-multiply matrix (vector->column-matrix vector))))
```
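As a quick sketch of the converters in action (assuming `matrix-times-vector' takes the matrix first, the way the later code calls it), multiply the 2x3 test matrix by a vector:
```
;; a = ((1 0 2) (-1 3 1)); a * #(3 2 1):
;; row 0: 1*3 + 0*2 + 2*1 = 5;  row 1: -1*3 + 3*2 + 1*1 = 4.
(define a (allocate-matrix 2 3))
(matrix-set! a 0 0 1)  (matrix-set! a 0 1 0)  (matrix-set! a 0 2 2)
(matrix-set! a 1 0 -1) (matrix-set! a 1 1 3)  (matrix-set! a 1 2 1)
(matrix-times-vector a '#(3 2 1))   ; => #(5 4)
```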
At this point, we can write the code to do the first part of the spectral partitioning.
```
(define (compute-q a)
  (let* ((atrans (transpose a))
         (att    (matrix-multiply a atrans))
         (rho    (matrix-row-sums att))
         (r      (compute-r rho))
         (pi     (compute-pi rho))
         (d      (compute-d pi))
         (rinverse (compute-diagonal-inverse r))
         (dinverse (compute-diagonal-inverse d))
         (q (matrix-multiply d
              (matrix-multiply rinverse
                (matrix-multiply att
                  dinverse)))))
    q))
```

Does it work?
```
((.271 .092 .092 .081 .074 .074 .076 .102 0. 0.)
(.092 .246 .244 .072 .065 .065 .068 .109 0. 0.)
(.092 .244 .258 .071 .065 .065 .067 .108 0. 0.)
(.081 .072 .071 .188 .171 .171 .178 .079 0. 0.)
(.074 .065 .065 .171 .229 .219 .195 .072 0. 0.)
(.074 .065 .065 .171 .219 .229 .195 .072 0. 0.)
(.076 .068 .067 .178 .195 .195 .202 .075 0. 0.)
(.102 .109 .108 .079 .072 .072 .075 .26 0. 0.)
(0. 0. 0. 0. 0. 0. 0. 0. .571 .429)
(0. 0. 0. 0. 0. 0. 0. 0. .429 .571))
```
So far so good! Let's check the first eigenvector.
```
(define (close-enuf? left right tolerance)
  (let ((limit (vector-length left)))
    (define (scan i)
      (or (>= i limit)
          (and (< (abs (- (vector-ref left i)
                          (vector-ref right i)))
                  tolerance)
               (scan (+ i 1)))))
    (scan 0)))

(define (test-eigenvector matrix eigenvector)
  (or
   (close-enuf? eigenvector
                (matrix-times-vector matrix eigenvector)
                .00001)
   (error "It's not an eigenvector")))
```
And we'll need to create an orthogonal vector to get the second eigenvector.
```
(define (vector-normalize v)
  (let* ((h (vector-length v))
         (total (sqrt (sum i
                           0 h
                           (let ((x (vector-ref v i)))
                             (* x x)))))
         (answer (make-vector h 0)))
    (dotimes (i h answer)
      (vector-set! answer i (/ (vector-ref v i) total)))))

(define (orthogonal-vector v)
  (let ((limit (vector-length v))
        (n (make-vector (vector-length v) 0)))
    (dotimes (i (floor (/ limit 2)))
      (vector-set! n i (vector-ref v (- limit i 1)))
      (vector-set! n (- limit i 1) (- (vector-ref v i))))
    (vector-normalize n)))

(define (analyze a)
  (let* ((atrans (transpose a))
         (att    (matrix-multiply a atrans))
         (rho    (matrix-row-sums att))
         (r      (compute-r rho))
         (pi     (compute-pi rho))
         (d      (compute-d pi))
         (rinverse (compute-diagonal-inverse r))
         (dinverse (compute-diagonal-inverse d))
         (q (matrix-multiply d
              (matrix-multiply rinverse
                (matrix-multiply att
                  dinverse))))
         (first-eigenvector (vector-times-matrix pi dinverse)))

    (test-eigenvector q first-eigenvector)

    (let ((initial-vector (orthogonal-vector first-eigenvector)))
      (do ((v (vector-normalize (matrix-times-vector q initial-vector))
              (vector-normalize (matrix-times-vector q v)))
           (old-v initial-vector v))
          ((close-enuf? v old-v 0.0001) v)))))
```
This code seems to work, but it is awfully slow. It takes almost two seconds to run on an initial matrix of 25 rows, and I want to run it on 58000 rows. I'll fix it tomorrow.