Tuesday, July 7, 2009

Trauma on the ice

So our skater was spinning and flinging off drops of blood at random angles. We know the position of the stains along the wall, but we want to know the position of the skater on the ice. If we knew where the skater was, we could compute the probability of a blood stain being at a particular offset. (Assume that the probability distribution of the rotation angle θ is uniform.) Let's put a Cartesian coordinate system on this problem:
    | y = 0, left wall
    /              \
    |              |
    *              |
    |              |
    |  8           |
--- |--------------| ------  x = 0  centerline
    *              |
    *              |
    |              |
    |              |
    \              /
The skater has left a figure ‘8’ on the ice and the asterisks represent a few of the blood stains.

We'll call the skater's distance from the wall (along the X axis) α and the distance from the centerline β. For any particular point on the wall, the probability of it having a stain is function of α and β.
(define (stain-probability alpha beta)
  (lambda (y-pos)
    (/ alpha
       (* *pi* (+ (square alpha)
    (square (- y-pos beta)))))))
So if the skater was at (2,3) we'd get these values:
(define demo1 (stain-probability 2 3))
;Value: demo1

(demo1 2)
;Value: .12732395447351627

(demo1 3)
;Value: .15915494309189535

(demo1 4)
;Value: .12732395447351627
We can see that the probability peaks at the position closest to where the skater is. If the skater were further away from the wall, the peak is at the same place, but it is much less distinct.

(define demo2 (stain-probability 5 3))
;Value: demo2

(demo2 2)
;Value: .06121343965072897

(demo2 3)
;Value: .06366197723675814

(demo2 4)
;Value: .06121343965072897
We need to ‘turn around’ this equation. We have the probability of a stain as a function of position, we need the probability of the position as a function of the stain. We can compute α and β separately (notice that the position of the peak doesn't change when we get further away from the wall, only the dispersion changes). So we can use Bayes theorem to invert our equation: p(β|y,α) ∝ p(y|α,β) * p(β|α) Here, y stands for the proposition that the bloodstain is at a particular offset on the y axis.

Our knowledge of the position of any one blood stain gives us no information about the position of any other, so we can simply multiply the probabilities together:p(Y|α,β) = ∏ p(yi|α,β) Here, Y is the probability of all the blood stains and it is simply the probability of each stain multiplied together.

If we substitute this back in, we can compute the likelihood of β: log(p(β|Y,α)) = constant - ∑ log(α2+(yi2)) Our best estimate of β is that β that maximizes the right-hand side. (Or minimizes the second term.)

Are your eyes glazed over? Mine are. I'm paraphrasing from Data analysis By D. S. Sivia, John Skilling. It seems like a good book, but when I read math books the equations start all looking the same.
Polonius: What do you read, my lord?
Hamlet: Math... math... math...

;; Given α (distance to the wall) and a list of the
;; y-coordinates of the blood stains, compute the second term
;; of the log of p(β)
(define (term2 alpha stains)
  (lambda (beta)
    (sum (lambda (y)
           (log (+ (square alpha)
                   (square (- y beta)))))
The reader can implement sum for fun. So let's look at how this behaves.
;; Given the position of the skater, return a procedure that generates
;; the y position of the blood stains.  Simulates the problem.
(define (skater alpha beta)
  (lambda ()
    (+ (* (tan (* (random 1.0) *pi*)) alpha)

;; Given a skater and n, return a list of stain positions.
(define (spots skater n)
  (do ((i 0 (+ i 1))
       (spots '() (cons (skater) spots)))
      ((>= i n) spots)))

(define my-spots (spots (skater 1 2) 10))

;Value 37: (-.4391382779105797 3.7536812452447545 1.9033520510509812 2.227674346928543 10.229175820018094 8.35132161012094 2.0848923361826426 2.868570305181259 4.926524369307749 -1.1024829548141497)

;; Search for minimum.
(let ((f (term2 1 my-spots)))
  (do ((i -5 (+ i 1)))
      ((>= i 5))
    (display i)
    (display " ")
    (display (f i))
-5 41.39540398748283
-4 38.485040725146305
-3 35.00661596259841
-2 30.748568382398258
-1 26.161800734404803
0 23.049858617589564
1 20.020797776287406
2 16.5459276648517     ;; note minimum here
3 16.679502775207673
4 19.15407343693434
So we deduce that beta was 2.

I'm not entirely satisfied with this solution. I think I understand it, but I'm not sure I could reproduce it. The reason I was playing with this problem is that I want to automatically fit the linear segment of that data set I was playing with.