Monday, August 4, 2008

Last time I converted the original Bayesian Hierarchical Clustering code to work in odds space rather than in probability space. This was the first part of converting the entire program to work in log-odds space. Converting to log space is somewhat similar to the previous conversion, but there are a couple of differences. In converting to odds space, we mostly had a hairball of simple algebra to untangle. When we convert to log space we'll have to convert more of the program and pay attention to the operators.

First we convert the top-level function r-odds:

(define (r-log-odds cluster) 
  (if (leaf-node? cluster)
      (error "Odds are infinite.")
      (10log10 (/ (ph1a cluster)
                  (* (px (left-child cluster))
                     (px (right-child cluster)))))))

.
Recall that the log of a ratio is the difference of the logs of the numerator and denominator and the log of a product is the sum of the logs of the multiplicands. We can rewrite it like this:

(define (r-log-odds cluster) 
  (if (leaf-node? cluster)
      (error "Odds are infinite.")
      (- (10log10 (ph1a cluster))
         (+ (10log10 (px (left-child cluster)))
            (10log10 (px (right-child cluster)))))))

.
And now we'll write log versions of ph1a and px, so we have:

(define (r-log-odds cluster) 
  (if (leaf-node? cluster)
      (error "Odds are infinite.")
      (- (ph1a-log cluster)
         (+ (px-log (left-child cluster))
            (px-log (right-child cluster))))))

(define (px-log cluster)
  (if (leaf-node? cluster)
      (ph1a-log cluster)
      (10log10 (+ (ph1a cluster)
                  (* (px (left-child cluster))
                     (px (right-child cluster)))))))

.
Unfortunately, the log of a sum doesn't have an easy transformation. On the other hand, it isn't all that hard, either.

(define (log-sum left right)
  (if (> right left)
      (log-sum right left)
      (+ left (10log10 (+ 1.0 (inverse-10log10 (- right left)))))))

.
If the difference between left and right is small enough, taking the inverse log (exponentiating) won't be much of an issue, but if the difference gets big, this could cause us some problems. For the moment, we'll just leave it and hope we don't have to revisit it later.

(define (px-log cluster)
  (if (leaf-node? cluster)
      (ph1a-log cluster)
      (log-sum (ph1a-log cluster)
               (+ (px-log (left-child cluster))
                  (px-log (right-child cluster))))))

(define (ph1a-log cluster)
  (+ (10log10 *alpha*)
     (log-gamma (n cluster))
     (ph1-log cluster)))

.
We're hardly the first to want to compute in log odds space. The log-gamma function is fairly well-known and there are good approximations to it.

  (define (ph1-log cluster)
    (let ((data-points (cluster-data-points cluster)))
      (sum (lambda (term)
                 (let ((present (count-occurrances term data-points))
                       (absent (- (length data-points) present))
                   (log-bernoulli-beta A B present absent))))
                all-terms))
     )

.
Ph1-log is like ph1, but we take the sum over the term rather than the product. Log-bernoulli-beta is trivially defined as the sums and differences of the appropriate log-gamma calls.

This part of the conversion was incredibly easy, but I did gloss over a couple of interesting things. I rather cavalierly propagated the log function through conditionals:

   (log (if <predicate>          (if <predicate>
            <consequence>    =>      (log <consequence>)
            <alternative>))          (log <alternative>))

.
That turns out to be a legal transformation, but it would be nice to understand why. Suppose instead of a conditional, I had a different operation that combined two functions.

   (log (op f1 f2))  <=!=>  (op (log f1) (log f2)) **WRONG

.
I cannot simply rewrite this as (op (log f1) (log f2)). If op were *, this would be clearly wrong. What I need to do is find an operation in log-space that has the effect of op after exponentiation.

   (log (op f1 f2))  <===>  ((logequiv op) (log f1) (log f2))

         (logequiv *) = +
         (logequiv /) = -
         (logequiv +) = log-sum

         (logequiv if) = ????

.
We know that * becomes +, / becomes -, + becomes log-sum, but we should find the log-space equivalent of ‘if’.

         (logequiv if) = if

.
It turns out that if simply stays if, so the transformation above is legal.

With the change from exact probabilities to floating-point operations in log space, we get about a factor of 10 improvement in performance. Now we can cluster hundreds of data points rather than only dozens. There are two more substantial speedups to apply.

2 comments:

Anonymous said...

No need to consider log equivalency for IF.

(FUN (IF X Y Z)) == (IF X (FUN Y) (FUN Z))

Always. Since IF is not a function, you need to consider evaluation order, nothing else.

Joe Marshall said...

I disagree just a bit. We're using IF to paste together two partial functions: the one for the case that X is true, the other for the case X is false. We'd like to be sure our resulting function is well-defined over the domain of interest, and because our predicate is binary we're ok.

But what if we had a language with `fuzzy' predicates that indicate a blend between truth and falsehood. We'd need to figure out how that blend would map into the log space. In other words, we do need to consider the log-space version of if, but because it is simply a discontinuity at a single point, the equivalent version is trivially the same.