## Thursday, July 31, 2008

In the last post, I finally figured out what the equations were saying in the Heller and Ghahramani paper. I coded it up and got a bit of life out of the program, but there are still two more problems. The first was obvious. In the naive implementation, you can cluster about 20 items in a reasonable amount of time. This is a little short of the 50,000 items I wanted to run thruogh the program.

The most immediate problem turned out to be bignums. A quick inspection of the program shows a lot of calls to the gamma function. Gamma (14) is larger than 2^32, so even relatively tiny clusters will start using bignum arithmetic. The product of several calls to gamma will be bigger still, and unless you are very lucky, the ratio of products of gamma will give you rational numbers with bignum numerators and denominators. The answers are perectly exact, if you are willing to wait for them.

The obvious answer is floating point. This gives a definite improvement in speed and space and I could cluster close to 170 items in a reasonable amonut of time. But the gamma function once again starts causing problems. Gamma (172) is too large to be represented as a (double-precision) floating point number.

If you've been following this blog for the past few weeks, you probably know what the solution is. We convert the program to work in log-odds space. This is a tiny bit tricky, but we can use the initial naive implementation as a reference to make sure we don't screw up. Recall that the marginal probability of clustering is given by function R. We simply want to compute log-odds of R.
```
(define (probability->log-odds p)
(10log10 (probability->odds p)))

(define (probability->odds p)
(/ p (- 1 p)))
```

.
Rather than use the natural log, I prefer to use 10log10. This gives me the odds in decibels (not as nutty as you might think!). I ran a few test problems through the naive solution and then computed the log-odds of the answer so I could check that any new code computed the same answer.

The first thing to do is to convert the program to odds space. Here's how we go about it. We start with our definition of R.
```
(define (r cluster)
(/ (* (pi cluster) (ph1 cluster))
(pd cluster)))
```

.
And combine it with our definition of probability->odds:
```
(define (r-odds cluster)
(probability->odds
(/ (* (pi cluster) (ph1 cluster))
(pd cluster))))
```

.
(That was trivial.) Now we inline the definition of probability->odds.
```
(define (r-odds cluster)
((lambda (p)
(/ p (- 1 p)))
(/ (* (pi cluster) (ph1 cluster))
(pd cluster))))
```

.
And beta reduce.
```
(define (r-odds cluster)
(/ (/ (* (pi cluster) (ph1 cluster))
(pd cluster))
(- 1 (/ (* (pi cluster) (ph1 cluster))
(pd cluster)))))
```

.
That's uglier. But we can fiddle around with this a bit. First add some temporaries to help unclutter the code.
```
(define (r-odds cluster)
(let ((t0 (ph1 cluster))
(t1 (pi cluster))
(t2 (pd cluster)))
(/ (/ (* t1 t0) t2)
(- 1 (/ (* t1 t0) t2)))))
```

.
There are some common terms between pi and pd that look like they might cancel, so let's substitute in pd.
```
(define (r-odds cluster)
(let ((t0 (ph1 cluster))
(t1 (pi cluster))
(t2 (if (leaf-node? cluster)
(ph1 cluster)
(let ((pi_k (pi cluster)))
(+ (*      pi_k  (ph1 cluster))
(* (- 1 pi_k) (ph2 cluster)))))))
(/ (/ (* t1 t0) t2)
(- 1 (/ (* t1 t0) t2)))))
```

.
And remove the redundant computations:
```
(define (r-odds cluster)
(let* ((t0 (ph1 cluster))
(t1 (pi cluster))
(t2 (if (leaf-node? cluster)
t0
(+ (*      t1  t0)
(* (- 1 t1) (ph2 cluster))))))
(/ (/ (* t1 t0) t2)
(- 1 (/ (* t1 t0) t2)))))
```

.
And substitute in the definition of pi:
```
(define (r-odds cluster)
(let* ((t0 (ph1 cluster))
(t1 (if (leaf-node? cluster)
1
(/ (* *alpha* (gamma (n cluster)))
(d cluster))))
(t2 (if (leaf-node? cluster)
t0
(+ (*      t1  t0)
(* (- 1 t1) (ph2 cluster))))))
(/ (/ (* t1 t0) t2)
(- 1 (/ (* t1 t0) t2)))))
```

.
And the definition of d.
```
(define (r-odds cluster)
(let* ((t0 (ph1 cluster))
(t1 (if (leaf-node? cluster)
1
(/ (* *alpha* (gamma (n cluster)))
(if (leaf-node? cluster)
*alpha*               ;; tuning parameter
(+ (* *alpha* (gamma (n cluster)))
(* (d (left-child cluster))
(d (right-child cluster))))))))
(t2 (if (leaf-node? cluster)
t0
(+ (*      t1  t0)
(* (- 1 t1) (ph2 cluster))))))
(/ (/ (* t1 t0) t2)
(- 1 (/ (* t1 t0) t2)))))
```

.
The probability of R on a leaf node is 1, so the odds of R on a leaf node would be infinite. But we never call R on a leaf node, so we can simply ignore that branch of the conditionals.
```
(define (r-odds cluster)
(if (leaf-node? cluster)
(error "Odds are infinite.")
(let* ((t0 (ph1 cluster))
(t1 (/ (* *alpha* (gamma (n cluster)))
(+ (* *alpha* (gamma (n cluster)))
(* (d (left-child cluster))
(d (right-child cluster))))))
(t2 (+ (*      t1  t0)
(* (- 1 t1) (ph2 cluster)))))
(/ (/ (* t1 t0) t2)
(- 1 (/ (* t1 t0) t2))))))
```

.
Let's pull out some common terms:
```
(define (r-odds cluster)
(if (leaf-node? cluster)
(error "Odds are infinite.")
(let* ((t0 (ph1 cluster))
(t3 (* *alpha* (gamma (n cluster))))
(t1 (/ t3
(+ t3
(* (d (left-child cluster))
(d (right-child cluster))))))
(t4 (* t1 t0))
(t2 (+ t4
(* (- 1 t1) (ph2 cluster)))))
(/ (/ t4 t2)
(- 1 (/ t4 t2))))))

a/b           a
And use the identity --------- = -----------
1 - a/b       b - a

(define (r-odds cluster)
(if (leaf-node? cluster)
(error "Odds are infinite.")
(let* ((t0 (ph1 cluster))
(t3 (* *alpha* (gamma (n cluster))))
(t1 (/ t3
(+ t3
(* (d (left-child cluster))
(d (right-child cluster))))))
(t4 (* t1 t0))
(t2 (+ t4
(* (- 1 t1) (ph2 cluster)))))
(/ t4
(- t2 t4)))))
```

.
Simplify the denominator:
```
(define (r-odds cluster)
(if (leaf-node? cluster)
(error "Odds are infinite.")
(let* ((t0 (ph1 cluster))
(t3 (* *alpha* (gamma (n cluster))))
(t1 (/ t3
(+ t3
(* (d (left-child cluster))
(d (right-child cluster))))))
(t4 (* t1 t0)))
(/ t4
(* (- 1 t1) (ph2 cluster))))))
```

.
Factor out t3:
```
(define (r-odds cluster)
(if (leaf-node? cluster)
(error "Odds are infinite.")
(let* ((t0 (ph1 cluster))
(t3 (* *alpha* (gamma (n cluster))))
(t1 (/ t3
(+ t3
(* (d (left-child cluster))
(d (right-child cluster))))))
(t4 (* t1 t0)))
(/ t4
(* (- 1 t1) (ph2 cluster))))))
```

.
Distribute the denominator:
```
(define (r-odds cluster)
(if (leaf-node? cluster)
(error "Odds are infinite.")
(let* ((t0 (ph1 cluster))
(t3 (* *alpha* (gamma (n cluster))))
(t1 (/ t3
(+ t3
(* (d (left-child cluster))
(d (right-child cluster))))))
(t4 (* t1 t0)))
(/ t4
(- (ph2 cluster) (* t1 (ph2 cluster)))))))
```

.
Divide top and bottom by t1
```
(define (r-odds cluster)
(if (leaf-node? cluster)
(error "Odds are infinite.")
(let* ((t0 (ph1 cluster))
(t3 (* *alpha* (gamma (n cluster))))
(t1 (/ t3
(+ t3
(* (d (left-child cluster))
(d (right-child cluster))))))
(t4 (* t1 t0)))
(/ (/ t4 t1)
(- (/ (ph2 cluster) t1) (/ (* t1 (ph2 cluster)) t1))))))

```

.
Simplify
```
(define (r-odds cluster)
(if (leaf-node? cluster)
(error "Odds are infinite.")
(let* ((t0 (ph1 cluster))
(t3 (* *alpha* (gamma (n cluster))))
(t1 (/ t3
(+ t3
(* (d (left-child cluster))
(d (right-child cluster)))))))
(/ t0
(- (/ (ph2 cluster) t1)  (ph2 cluster))))))

```

.
Substitute t1
```
(define (r-odds cluster)
(if (leaf-node? cluster)
(error "Odds are infinite.")
(let* ((t0 (ph1 cluster))
(t3 (* *alpha* (gamma (n cluster)))))
(/ t0
(- (/ (ph2 cluster)
(/ t3
(+ t3
(* (d (left-child cluster))
(d (right-child cluster))))))
(ph2 cluster))))))

```

.
Simplify
```
(define (r-odds cluster)
(if (leaf-node? cluster)
(error "Odds are infinite.")
(let* ((t0 (ph1 cluster))
(t3 (* *alpha* (gamma (n cluster)))))
(/ t0
(- (* (ph2 cluster)
(/ (+ t3
(* (d (left-child cluster))
(d (right-child cluster))))
t3))
(ph2 cluster))))))

```

.
Factor out ph2
```
(define (r-odds cluster)
(if (leaf-node? cluster)
(error "Odds are infinite.")
(let* ((t0 (ph1 cluster))
(t3 (* *alpha* (gamma (n cluster)))))
(/ t0
(* (ph2 cluster) (- (* 1
(/ (+ t3
(* (d (left-child cluster))
(d (right-child cluster))))
t3))
1))))))

```

.
Multiply top and bottom by t3
```
(define (r-odds cluster)
(if (leaf-node? cluster)
(error "Odds are infinite.")
(let* ((t0 (ph1 cluster))
(t3 (* *alpha* (gamma (n cluster)))))
(/ (* t3 t0)
(* t3 (ph2 cluster) (- (* 1
(/ (+ t3
(* (d (left-child cluster))
(d (right-child cluster))))
t3))
1))))))

```

.
Simplify
```
(define (r-odds cluster)
(if (leaf-node? cluster)
(error "Odds are infinite.")
(let* ((t0 (ph1 cluster))
(t3 (* *alpha* (gamma (n cluster)))))
(/ (* t3 t0)
(* (ph2 cluster) (* (d (left-child cluster))
(d (right-child cluster))))))))

```

.
Now substitute ph2:
```
(define (r-odds cluster)
(if (leaf-node? cluster)
(error "Odds are infinite.")
(let* ((t0 (ph1 cluster))
(t3 (* *alpha* (gamma (n cluster)))))
(/ (* t3 t0)
(* (pd (left-child cluster))
(pd (right-child cluster))
(d (left-child cluster))
(d (right-child cluster)))))))

```

.
And combine pd and d into a new function:
```
(define (r-odds cluster)
(if (leaf-node? cluster)
(error "Odds are infinite.")
(/ (* *alpha* (gamma (n cluster)) (ph1 cluster))
(* (px (left-child cluster))
(px (right-child cluster))))))

(define (px cluster)
(* (pd cluster) (d cluster)))

```

.
Using similar rewriting, we can define px more simply as this:
```
(define (px cluster)
(if (leaf-node? cluster)
(* *alpha* (gamma (n cluster)) (ph1 cluster))
(+ (* *alpha* (gamma (n cluster)) (ph1 cluster))
(* (px (left-child cluster))
(px (right-child cluster))))))

```

.
If we define ph1a as
```
(define (ph1a cluster)
(* *alpha* (gamma (n cluster)) (ph1 cluster)))

```

.
We can further simplify:
```
(define (px cluster)
(if (leaf-node? cluster)
(ph1a cluster)
(+ (ph1a cluster)
(* (px (left-child cluster))
(px (right-child cluster))))))

(define (r-odds cluster)
(if (leaf-node? cluster)
(error "Odds are infinite.")
(/ (ph1a cluster)
(* (px (left-child cluster))
(px (right-child cluster))))))

```

.
Whew! That was work. Fortunately, we can compare the output from the naive version with the output of this version to see if the answers are the same.

Next time, we'll take the logarithm of this code.