## Tuesday, June 26, 2007

### How Floating Point Works

A floating-point number is neither a Real number nor an Interval --- it is an exact rational number. A floating-point number is not uncertain or imprecise, it exactly and precisely has the value of a rational number. However, not every rational number is represented. Floating-point numbers are drawn from a carefully selected set of rational numbers. The particulars vary among the different floating-point formats. The popular IEEE 754 standard uses these: Single Precision The significand is an integer in the range [8388608, 16777215) which is multiplied by a power of 2 in the range 2-149 = 1/713623846352979940529142984724747568191373312 2104 = 20282409603651670423947251286016 Double Precision The significand is an integer in the range [4503599627370496, 9007199254740991) which is multiplied by a power of 2 in the range 2-1074 = 1/202402253307310618352495346718917307049556649764142118356901358027430339567995346891960383701437124495187077864316811911389808737385793476867013399940738509921517424276566361364466907742093216341239767678472745068562007483424692698618103355649159556340810056512358769552333414615230502532186327508646006263307707741093494784 2971 = 19958403095347198116563727130368385660674512604354575415025472424372118918689640657849579654926357010893424468441924952439724379883935936607391717982848314203200056729510856765175377214443629871826533567445439239933308104551208703888888552684480441575071209068757560416423584952303440099278848 Since the power of 2 can be negative, it is possible to represent fractions. For instance, the fraction 3/8 is represented as

```            -25     12582912
12582912 * 2   = ------------- = 3/8
33554432
```

in single precision floating point. In double-precision, 3/8 is represented as

```                    -54    6755399441055744
6755399441055744 * 2   = ------------------- = 3/8
18014398509481984
```
. Not every fraction can be represented. For example, the fractions 1/10 and 2/3 cannot be represented. Only those fractions whose denominator is a power of 2 can be represented. It is possible to represent integers. For instance, the integer 123456 is represented as

```            -7    15802368
15802368 * 2   = --------- = 123456
128
```
. Not every integer can be represented. For example the integer 16777217 cannot be represented in single precision. -------- When floating-point numbers appear in programs as literal values, or when they are read by a program from some data source, they are usually expressed as a decimal number. For example, someone might enter "38.21" into a form. Some decimal numbers, like 0.125, can be represented as a floating-point number, but most fractional decimal numbers have no representation. If a literal number cannot be represented, then the system usually silently substitutes a nearby representable number. For example, if the expression 0.1 appears as a literal in the code,the actual floating point value used may be

```            -27     13421773
13421773 * 2   = ------------- > .1
134217728
```
. as another example, 38.21

```            -18     10016522
10016522 * 2   = ----------- < 38.21
262144
```
. Most implementations substitute the nearest representable number,but this has not always been the case. As it turns out, much of the error in floating-point code comes from the silent substitution performed on input. -------- Floating-point numbers are usually printed as decimal numbers. All binary floating-point numbers have an exact decimal equivalent, but it usually has too many digits to be practical. For example, the single-precision floating-point number

```            -25     13421773
13421773 * 2    = ----------- = 0.4000000059604644775390625
33554432
```
. Furthermore, the extra digits may be misconstrued as precision that doesn't exist. The next largest float above that:

```             -25     6710887
13421774 * 2    = ----------- = 0.400000035762786865234375
16777216
```
. doesn't share any digits beyond the first 7. There is no standard way to print the decimal expansion of a floating-point number and this can lead to some bizarre behavior. However, recent floating point systems print decimal numbers with only as many digits as necessary to ensure that the number will be exactly reconstructed if it is read in. This is intended to work correctly even if the implementation must substitute a nearby representable number on input. This allows one to `round-trip' floating-point numbers through their decimal representation without error. (NOTE: Not all floating point systems use this method of parsimonious printing. One example is Python.) There is a curious side effect of this method of printing. A literal floating-point number will often print as itself (in decimal) even if it was substituted in binary. For example, if the expression 0.1 appears as a literal in the code, and is substituted by

```            -27      13421773
13421773 * 2    = -------------
134217728
```
. this substitute number will be printed as "0.1" in decimal even though it is slightly larger than 0.1 As mentioned above, Python does not use parsimonious printing, so it is often the case in Python that a literal floating-point number will not print as itself. For example, typing .4 at the Python prompt will return "0.40000000000000002" The round-trip property is nice to have, and it is nice that floating-point fractions don't seem to change, but it is important to know this fact: The printed decimal representation of a floating-point number may be slightly larger or smaller than the internal binary representation. -------- Arithmetic Simple arithmetic (add, subtract, multiply, or divide) on floating-point numbers works like this: 1. Perform the operation exactly. Treat the inputs as exact rationals and calculate the exact rational result. 2a. If the exact rational result can be represented, use it. 2b. If the exact rational result cannot be represented, use a nearby exact rational that can be represented. (round) This is important, and you may find this surprising: Floating-point operations will produce exact results without rounding if the result can be represented. There are a number of interesting cases where we can be assured that exact arithmetic will be used. One is with addition, multiplication,and subtraction of integers within the range of the significand. For single-precision floating point, this is the range [-16777215,16777215]. For double-precision, [-9007199254740991,9007199254740991]. Even integer division (modulo and remainder) will produce exact results if the inputs are integers in the correct range. Multiplication or division by powers of two will also produce exact results in any range (provided they don't overflow or underflow). In many cases floating-point arithmetic will be exact and the only source of error will be the substitution when converting the input from decimal to binary and how the result is printed. However, in most cases other than integer arithmetic, the result of a computation will not be representable in the floating point format. In these case, rule 2b says that the floating-point system may substitute a nearby representable number. This number, like all floating-point numbers, is an exact rational, but it is wrong. It will differ from the correct answer by a tiny amount. For example, in single precision adding .1 to 0.375:

```             -27     13421773
13421773 * 2    = ------------- ~= 0.1
134217728

-25     12582912
12582912 * 2   = ------------- = 0.375
33554432
```
. gives us a result that is slightly smaller than the true answer of 19/40

```            -25     15938355       19
15938355 * 2   = ------------- < ---
33554432       40

```
-------- Further topics 1. Denormalized numbers and gradual underflow. These give you a `soft landing' when you use numbers really close to 0. 2. Infinities 3. Negative 0.0 4. NaN's 5. Traps and flags -------- Floating point myths Myth: Floating-point numbers are an approximation. Fact: Floating-point numbers are exact, rational numbers. Myth: Floating-point math is always only an approximation, and all results are off by a small amount. Fact: Many floating-point operations produce exact, correct answers. Myth: Floating-point math is mysterious. Fact: It is the same old rational arithmetic you've used for ages. -------- Bibliography @article{ goldberg91what, author = "David Goldberg", title = "What Every Computer Scientist Should Know About Floating-Point Arithmetic", journal = "ACM Computing Surveys", volume = "23", number = "1", pages = "5--48", year = "1991", url = "citeseer.ist.psu.edu/goldberg91what.html" } All the papers at http://www.cs.berkeley.edu/~wkahan/ Joe Darcy's talk at http://blogs.sun.com/darcy/resource/Wecpskafpa-ACCU.pdf

## Sunday, June 24, 2007

### Why floating-point numbers are not intervals

To understand floating point arithmetic, we have to describe how floating-point numbers map to mathematical numbers and how floating point operations map to mathematical operations. Many people believe that floating point arithmetic is some variation of interval arithmetic. This view can be described in this picture:

```                    float operation
floating  ------------------------>  floating
point                                 point
|                                     ^
|                                     |
|                                     |
|                                     |
|                                     |
V         interval operation          |
interval ------------------------->  interval
```

Interval view In this view, there is a mapping from floating point numbers to intervals. For each floating point operation, there is an operation on the intervals that the floats map to. The resulting intervals are then mapped to the corresponding floating point result. An alternative view is this:

```                    float operation
floating  ------------------------>  floating
point                                 point
|                                     ^
|                                     |
|                                  possible
|                                  rounding
|                                     |
V      exact rational operation       |
rational ------------------------->  rational
```

Rational View In this view, each floating point number represents an exact rational number. Operations on floating point numbers are mapped to the corresponding exact rational operation. If necessary, the resulting exact rational number is mapped to an appropriate floating point number by rounding. Since some rationals correspond exactly to floating point values, rounding may not be necessary. Are these views equivalent? For our purposes, an interval is a set of real numbers we can describe by two endpoints, the lower bound and the upper bound. Either of the endpoints may be part of the interval or not. Given an arbitrary real number X, any pair of real numbers, one of which is below the rational, one of which is above, will define an interval that contains X. There is no one-to-one mapping from rational numbers to intervals: a rational number such as 5/8 is contained by any interval with a lower bound less than 5/8 and an upper bound greater than 5/8. For example, the intervals [0, 30), (1/2, 1], and (9/16, 11/16) all contain 5/8. Also, any non-degenerate interval (that is, where the endpoints are distinct), contains an infinite number of rationals. These views cannot be semantically equivalent, but are they *computationally* equivalent? A computation that used the interval model may yield the same floating point result as one that performed under the rational model. (Obviously, we would reject a model which didn't give the result `1 + 1 = 2'.) But do all computations performed by one model yield the same answer as the other? If not, then we can at least distinguish which model an implementation uses if not decide which model is better. To determine computational equivalence, we need to be precise about what operations are being performed. To aid in this, I have defined a tiny floating point format that has the essential characteristics of binary floating point as described by IEEE 754. This format has 2 bits of significand and 3 bits of mantissa. The significand can take on the values 0-3 and the exponent can take on the values 0-7. We can convert a tiny float to an exact rational number with this formula: (significand + 5) * 2 (exponent - 5) There are 32 distinct tiny floating-point values, the smallest is 5/32 and the largest is 32. I use the notation of writing the significand, the letter T, then the exponent. For example, the tiny float 2T3 is equal to (2 + 5) * (2 ^ (3 - 5)) = 1 3/4. Since there are so few of them, we can enumerate them all:

```0T0 = 5/32     0T2 = 5/8    0T4 = 5/2   0T6 = 10
1T0 = 3/16     1T2 = 3/4    1T4 = 3 1T6 = 12
2T0 = 7/32     2T2 = 7/8    2T4 = 7/2  2T6 = 14
3T0 = 1/4      3T2 = 1      3T4 = 4 3T6 = 16
0T1 = 5/16     0T3 = 5/4    0T5 = 5 0T7 = 20
1T1 = 3/8      1T3 = 3/2    1T5 = 6 1T7 = 24
2T1 = 7/16     2T3 = 7/4    2T5 = 7 2T7 = 28
3T1 = 1/2      3T3 = 2      3T5 = 8    3T7 = 32
```

Table of tiny floats We can also enumerate the intervals on the real number line that map to tiny floats:

```0T0 = [5/32  11/64]   0T2 = [ 9/16 11/16]
1T0 = (11/64 13/64)   1T2 = (11/16 13/16)
2T0 = [13/64 15/64]   2T2 = [13/16 15/16]
3T0 = (15/64  9/32)   3T2 = (15/16  9/8 )
0T1 = [ 9/32 11/32]   0T3 = [ 9/8  11/8 ]
1T1 = (11/32 13/32)   1T3 = (11/8  13/8 )
2T1 = [13/32 15/32]   2T3 = [13/8  15/8 ]
3T1 = (15/32  9/16)   3T3 = (15/8   9/4 )

0T4 = [ 9/4 11/4]     0T6 = [ 9 11]
1T4 = (11/4 13/4)     1T6 = (11 13)
2T4 = [13/4 15/4]     2T6 = [13 15]
3T4 = (15/4  9/2)     3T6 = (15 18)
0T5 = [ 9/2 11/2]     0T7 = [18 22]
1T5 = (11/2 13/2)     1T7 = (22 26)
2T5 = [13/2 15/2]     2T7 = [26 30]
3T5 = (15/2  9)       3T7 = (30 32)
```

Table of real intervals Arithmetic on intervals is relatively easy to define. For closed intervals, we can define addition and subtraction like this: [a b] + [c d] = [a+c b+d] [a b] - [c d] = [a-d b-c] Multiplication and division are a bit trickier: [a b] * [c d] = [min(ac, ad, bc, bd) max(ac, ad, bc, bd)] [a b] / [c d] = [min(a/c, a/d, b/c, b/d) max(a/c, a/d, b/c, b/d)] The formulae for open and semi-open intervals are similar. A quick check: 1 is in the interval (15/16 9/8). (15/16 9/8) + (15/16 9/8) = (15/8 9/4) 2 is in the interval (15/8 9/4), so 1 + 1 = 2. So far, so good, but we will soon discover that interval arithmetic has problems. The first hint of a problem is that the intervals are not of uniform size. The intervals at the low end of the line are much smaller than the intervals at the high end. Operations that produce answers much smaller than the inputs will have answers that span several intervals. For example, 20 - 14 20 is in the interval [18 22] 14 is in the interval [13 15] [18 22] - [13 15] = [3 9] We have 7 intervals that fall in that range, but none are large enough to cover the result. The second hint of a problem comes when multiply. It is true that 1+1=2, but does 1*2=2? 1 is in the interval (15/16 9/8) 2 is in the interval (15/8 9/4) (15/16 9/8) * (15/8 9/4) = (225/128 81/32) This is clearly not the same as (15/8 9/4). It seems that we have lost the multiplicative identity. An adherent of the interval model will have to come up with a way to map these arbitrary result intervals back into the floating-point numbers. In an upcoming post, I'll describe the rational model in more detail.

## Monday, June 18, 2007

If you have studied information theory, you might have noticed the question-bits procedure in the last post. It is the binary case ofthe Shannon Entropy. This makes sense: the entropy is the amount of information we can get out of the answer to a yes-or-no question when we have already been told the cluster. This also supplies us with a good idea of what makes a good cluster: the entropy after clustering should be as small as possible. That is, if the cluster is very homogeneous, then there aren't very many good yes-or-no questions that can narrow down your guess. Everything in the cluster has the same answer to almost all the questions. This would be a good clustering. If the cluster is very heterogeneous, then you'd expect a yes-or-no question to divide the cluster further. This would be a poor clustering. As it turns out, other people have been down this path: Barbara, Couto, and Li wrote a paper describing `COOLCAT: An entropy-based algorithm for categorical clustering'. So I guess I gotta write some code to try this out....

## Friday, June 15, 2007

### A Short Digression into a Question of Bits

```(defun log2 (n)  "Log base 2" (/ (log n) (log 2)))
(defun question-bits (p)
"Given the probability that a question is true, return
the number of bits of information you can expect by
(let ((p~ (- 1.0 p)))
(+ (* p  (- (log2 p)))
(* p~ (- (log2 p~))))))

(question-bits .5) =>  1.0   ; a fifty-fifty question gives us one bit
(question-bits .25) =>  .81  ; a 1/4 question gives us 8/10 of a bit
(question-bits .75) =>  .81  ; same answer (of course)
(question-bits .1100278644383595) => .5
```

If we could only come up with questions that were true about 1/10th of the time, we'd have to play 40 questions. Back to clustering next....

4

## Thursday, June 7, 2007

### Domain-specific Languages in Lisp

``` (when (and (< time 20:00)
(timing-is-right))
```