Tuesday, July 8, 2008

Probability and floating point

Floating point numbers are not evenly distributed. For every value of the exponent, there are 2^52 values that can be represented. In other words, between 1.0 and 2.0 there are 2^52 floating point numbers, and between 1024.0 and 2048.0 there are also 2^52 floating point numbers. Since the width of the range doubles each time the exponent goes up by one while the count stays fixed, floating point numbers get sparser and sparser as the exponent goes up and denser and denser as it goes down.
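
You can see the spacing directly with a quick sketch in Python; math.ulp reports the gap from a double to the next representable one (it requires Python 3.9 or later).

    import math

    # The gap to the next representable double grows with the exponent:
    print(math.ulp(1.0))     # 2.220446049250313e-16
    print(math.ulp(1024.0))  # 2.2737367544323206e-13  (1024 times coarser)
    print(math.ulp(0.001))   # 2.168404344971009e-19   (much finer near zero)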

This has a curious effect when you try to use floating point to compute probabilities. Floating point numbers are quite dense near 0, but much sparser near 1. The largest non-certain probability you can represent as a double precision floating point number is 0.9999999999999999d0 (that is, 1 - 2^-53), but the smallest positive normalized probability you can represent is 2.2250738585072014d-308, and denormalized numbers reach even closer to zero. So when you are working with marginal probabilities (determining which of two things is more likely), you can discriminate between unlikely things to a much greater extent than you can discriminate between likely ones.
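
Those boundary values are easy to check; here is a sketch using Python's math.nextafter (available since 3.9) and sys.float_info.

    import math, sys

    # Largest double strictly below 1.0 -- the largest non-certain probability.
    print(math.nextafter(1.0, 0.0))      # 0.9999999999999999

    # Smallest positive normalized double, and the smallest denormal below it.
    print(sys.float_info.min)            # 2.2250738585072014e-308
    print(math.nextafter(0.0, 1.0))      # 5e-324

    # Near zero, nearby probabilities stay distinguishable...
    print(1e-300 == 2e-300)              # False
    # ...but near one they collapse onto the same value.
    print(1.0 - 1e-300 == 1.0 - 2e-300)  # True (both are exactly 1.0)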

Between throwing away 3/4 of your floating point numbers (negative numbers and positive ones greater than 1) and cramming almost all of the remaining ones down near zero, you can run out of precision when dealing with probabilities.
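
The 3/4 figure can be checked by counting bit patterns: for non-negative doubles, the IEEE 754 bit pattern interpreted as an integer increases monotonically with the value, so it counts the doubles below it. A rough sketch in Python, treating signed zero casually:

    import struct, sys

    def doubles_up_to(x):
        # Number of non-negative doubles in [0.0, x], via the bit pattern.
        (bits,) = struct.unpack("<Q", struct.pack("<d", x))
        return bits + 1  # +1 to include 0.0 itself

    in_unit_interval = doubles_up_to(1.0)
    finite_non_negative = doubles_up_to(sys.float_info.max)
    total_finite = 2 * finite_non_negative - 1  # mirror the negatives; count zero once

    print(in_unit_interval / total_finite)      # about 0.25 -- only ~1/4 land in [0, 1]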

What are the odds?

You can solve part of this problem by computing the odds rather than the probability. The odds are the ratio of success to failure rather than the ratio of success to all tries. The odds are easily computed from the probability by dividing the probability by (1 - probability). Odds can take on any non-negative value, so you have roughly twice as many floating point values to work with. Furthermore, "even" odds, representing a 50% chance, are 1.0, so there are about as many floating point values above even odds as there are below them (part of the range is taken up by NaNs, and there are extra denormalized floats really close to zero).
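
A minimal sketch of the conversion each way; the function names here are just illustrative.

    def odds_from_prob(p):
        # Odds = success : failure = p / (1 - p).
        return p / (1.0 - p)

    def prob_from_odds(o):
        # Inverse: p = o / (1 + o).
        return o / (1.0 + o)

    print(odds_from_prob(0.5))   # 1.0  -- even odds
    print(odds_from_prob(0.75))  # 3.0  -- three to one in favor
    print(prob_from_odds(3.0))   # 0.75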

Working in odds space doubles the number of floating point values you can use, but there are still issues with varying precision. Extremely likely odds are spaced much more coarsely than extremely unlikely ones. One often computes the probability of an event by noticing that p + (1 - p) = 1; in other words, you can compute the probability of an event occurring by determining the ways it *won't* occur and subtracting that from 1. But when your floating point precision changes this dramatically between low and high probabilities or odds, relying on that identity can cost you a huge amount of precision.
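
For instance, here is a sketch of computing the chance that at least one of n independent rare events occurs: the naive route through 1 - p throws the answer away entirely, while working with log(1 - p) via the standard log1p and expm1 functions keeps it.

    import math

    p, n = 1e-18, 10   # each event is very unlikely

    naive = 1.0 - (1.0 - p) ** n              # (1 - p) rounds to exactly 1.0...
    print(naive)                              # 0.0 -- all information lost

    stable = -math.expm1(n * math.log1p(-p))  # stay close to zero the whole way
    print(stable)                             # roughly 1e-17 -- the right answer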

What we need to do is to compensate for the varying precision by stretching the dense regions of floating point numbers and compressing the sparse regions. Taking the logarithm of the odds accomplishes this nicely. Working in log-odds space has long been known to be convenient for mathematicians, but few people seem aware that log-odds space is a better place to compute with floating point numbers.
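
Here is a sketch of what computing in log-odds space looks like; the function names are illustrative, not a standard API.

    import math

    def log_odds(p):
        # Logit: log(p / (1 - p)), written as log(p) - log1p(-p) for stability.
        return math.log(p) - math.log1p(-p)

    def prob(log_o):
        # Inverse logit (the logistic function), arranged to avoid overflow.
        if log_o >= 0:
            return 1.0 / (1.0 + math.exp(-log_o))
        e = math.exp(log_o)
        return e / (1.0 + e)

    # Very likely and very unlikely events are treated symmetrically:
    print(log_odds(1e-12))          # about -27.63
    print(log_odds(1.0 - 1e-12))    # about +27.63
    print(prob(0.0))                # 0.5 -- even odds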

1 comment:

Panos Ipeirotis said...

Using logarithms of the probabilities (or of the odds) typically helps a lot...