Sunday, June 24, 2007

Why floating-point numbers are not intervals

To understand floating point arithmetic, we have to describe how floating-point numbers map to mathematical numbers and how floating point operations map to mathematical operations. Many people believe that floating point arithmetic is some variation of interval arithmetic. This view can be described in this picture:

```                    float operation
floating  ------------------------>  floating
point                                 point
|                                     ^
|                                     |
|                                     |
|                                     |
|                                     |
V         interval operation          |
interval ------------------------->  interval
```

Interval view In this view, there is a mapping from floating point numbers to intervals. For each floating point operation, there is an operation on the intervals that the floats map to. The resulting intervals are then mapped to the corresponding floating point result. An alternative view is this:

```                    float operation
floating  ------------------------>  floating
point                                 point
|                                     ^
|                                     |
|                                  possible
|                                  rounding
|                                     |
V      exact rational operation       |
rational ------------------------->  rational
```

Rational View In this view, each floating point number represents an exact rational number. Operations on floating point numbers are mapped to the corresponding exact rational operation. If necessary, the resulting exact rational number is mapped to an appropriate floating point number by rounding. Since some rationals correspond exactly to floating point values, rounding may not be necessary. Are these views equivalent? For our purposes, an interval is a set of real numbers we can describe by two endpoints, the lower bound and the upper bound. Either of the endpoints may be part of the interval or not. Given an arbitrary real number X, any pair of real numbers, one of which is below the rational, one of which is above, will define an interval that contains X. There is no one-to-one mapping from rational numbers to intervals: a rational number such as 5/8 is contained by any interval with a lower bound less than 5/8 and an upper bound greater than 5/8. For example, the intervals [0, 30), (1/2, 1], and (9/16, 11/16) all contain 5/8. Also, any non-degenerate interval (that is, where the endpoints are distinct), contains an infinite number of rationals. These views cannot be semantically equivalent, but are they *computationally* equivalent? A computation that used the interval model may yield the same floating point result as one that performed under the rational model. (Obviously, we would reject a model which didn't give the result `1 + 1 = 2'.) But do all computations performed by one model yield the same answer as the other? If not, then we can at least distinguish which model an implementation uses if not decide which model is better. To determine computational equivalence, we need to be precise about what operations are being performed. To aid in this, I have defined a tiny floating point format that has the essential characteristics of binary floating point as described by IEEE 754. This format has 2 bits of significand and 3 bits of mantissa. The significand can take on the values 0-3 and the exponent can take on the values 0-7. We can convert a tiny float to an exact rational number with this formula: (significand + 5) * 2 (exponent - 5) There are 32 distinct tiny floating-point values, the smallest is 5/32 and the largest is 32. I use the notation of writing the significand, the letter T, then the exponent. For example, the tiny float 2T3 is equal to (2 + 5) * (2 ^ (3 - 5)) = 1 3/4. Since there are so few of them, we can enumerate them all:

```0T0 = 5/32     0T2 = 5/8    0T4 = 5/2   0T6 = 10
1T0 = 3/16     1T2 = 3/4    1T4 = 3 1T6 = 12
2T0 = 7/32     2T2 = 7/8    2T4 = 7/2  2T6 = 14
3T0 = 1/4      3T2 = 1      3T4 = 4 3T6 = 16
0T1 = 5/16     0T3 = 5/4    0T5 = 5 0T7 = 20
1T1 = 3/8      1T3 = 3/2    1T5 = 6 1T7 = 24
2T1 = 7/16     2T3 = 7/4    2T5 = 7 2T7 = 28
3T1 = 1/2      3T3 = 2      3T5 = 8    3T7 = 32
```

Table of tiny floats We can also enumerate the intervals on the real number line that map to tiny floats:

```0T0 = [5/32  11/64]   0T2 = [ 9/16 11/16]
1T0 = (11/64 13/64)   1T2 = (11/16 13/16)
2T0 = [13/64 15/64]   2T2 = [13/16 15/16]
3T0 = (15/64  9/32)   3T2 = (15/16  9/8 )
0T1 = [ 9/32 11/32]   0T3 = [ 9/8  11/8 ]
1T1 = (11/32 13/32)   1T3 = (11/8  13/8 )
2T1 = [13/32 15/32]   2T3 = [13/8  15/8 ]
3T1 = (15/32  9/16)   3T3 = (15/8   9/4 )

0T4 = [ 9/4 11/4]     0T6 = [ 9 11]
1T4 = (11/4 13/4)     1T6 = (11 13)
2T4 = [13/4 15/4]     2T6 = [13 15]
3T4 = (15/4  9/2)     3T6 = (15 18)
0T5 = [ 9/2 11/2]     0T7 = [18 22]
1T5 = (11/2 13/2)     1T7 = (22 26)
2T5 = [13/2 15/2]     2T7 = [26 30]
3T5 = (15/2  9)       3T7 = (30 32)
```

Table of real intervals Arithmetic on intervals is relatively easy to define. For closed intervals, we can define addition and subtraction like this: [a b] + [c d] = [a+c b+d] [a b] - [c d] = [a-d b-c] Multiplication and division are a bit trickier: [a b] * [c d] = [min(ac, ad, bc, bd) max(ac, ad, bc, bd)] [a b] / [c d] = [min(a/c, a/d, b/c, b/d) max(a/c, a/d, b/c, b/d)] The formulae for open and semi-open intervals are similar. A quick check: 1 is in the interval (15/16 9/8). (15/16 9/8) + (15/16 9/8) = (15/8 9/4) 2 is in the interval (15/8 9/4), so 1 + 1 = 2. So far, so good, but we will soon discover that interval arithmetic has problems. The first hint of a problem is that the intervals are not of uniform size. The intervals at the low end of the line are much smaller than the intervals at the high end. Operations that produce answers much smaller than the inputs will have answers that span several intervals. For example, 20 - 14 20 is in the interval [18 22] 14 is in the interval [13 15] [18 22] - [13 15] = [3 9] We have 7 intervals that fall in that range, but none are large enough to cover the result. The second hint of a problem comes when multiply. It is true that 1+1=2, but does 1*2=2? 1 is in the interval (15/16 9/8) 2 is in the interval (15/8 9/4) (15/16 9/8) * (15/8 9/4) = (225/128 81/32) This is clearly not the same as (15/8 9/4). It seems that we have lost the multiplicative identity. An adherent of the interval model will have to come up with a way to map these arbitrary result intervals back into the floating-point numbers. In an upcoming post, I'll describe the rational model in more detail.