Here's a 2x2 matrix:

[64919121 -159018721] [41869520.5 -102558961]We can multiply it by a 2 element vector like this:

(defun mxv (a b c d x y receiver) (funcall receiver (+ (* a x) (* b y)) (+ (* c x) (* d y)))) * (mxv 64919121 -159018721 41869520.5d0 -102558961 3 1 #'list) (35738642 2.30496005d7)Given a matrix and a result, we want to find the 2 element vector that produces that result. To do this, we compute the inverse of the matrix:

(defun m-inverse (a b c d receiver) (let ((det (- (* a d) (* b c)))) (funcall receiver (/ d det) (/ (- b) det) (/ (- c) det) (/ a det))))and multiply the inverse matrix by the result:

(defun solve (a b c d x y receiver) (m-inverse a b c d (lambda (ia ib ic id) (mxv ia ib ic id x y receiver))))So we can try this on our matrix

* (solve 64919121 -159018721 41869520.5d0 -102558961 1 0 #'list) (1.02558961d8 4.18695205d7)and we get the wrong answer.

What's the right answer?

* (solve 64919121 -159018721 (+ 41869520 1/2) -102558961 1 0 #'list) (205117922 83739041)If we use double precision floating point, we get the wrong answer by a considerable margin.

I'm used to floating point calculations being off a little in the least
significant digits, and I've seen how the errors can accumulate in an iterative
calculation, but here we've lost all the significant digits in a straightforward
non-iterative calculation. Here's what happened: The determinant of our matrix is
computed by subtracting the product of the two diagonals. One
diagonal is ```
(* 64919121 -102558961) =
-6658037598793281
```

, where the other diagonal is ```
(* (+
41869520 1/2) -159018721) = -6658037598793280.5
```

This second
diagonal product cannot be represented in double precision floating
point, so it is rounded down to -6658037598793280. This is where
the error is introduced. An error of .5 in a quantity of
-6658037598793281 is small indeed, but we amplify this error when we
subtract out the other diagonal. We still have an absolute error of
.5, but now it occurs within a quantity of 1, which makes it
relatively huge. This is called “catastrophic cancellation”
because the subtraction “cancelled” all the significant digits
(the “catastrophe” is presumably the amplification of the error).

I don't care for the term “catastrophic cancellation” because it places the blame on the operation of subtraction. But the subtraction did nothing wrong. The difference betweeen -6658037598793280 and -6658037598793281 is 1 and that is the result we got. It was the rounding in the prior step that introduced an incorrect value into the calculation. The subtraction just exposed this and made it obvious.

One could be cynical and reject floating point operations as being too unreliable. When we used exact rationals, we got the exactly correct result. But rational numbers are much slower than floating point and they have a tendancy to occupy larger and larger amounts of memory as the computation continues. Floating point is fast and efficient, but you have to be careful when you use it.

## 2 comments:

Take an eigenvalue decomposition of the original 2x2 matrix. You will see that the smallest eigenvalue is well below the ~16 digits of relative precision of IEEE double floating point. Effectively singular in that representation so the result is as expected.

If you check back whether your answer obtained through rational arithmetic does indeed solve the problem, it doesn't - multiplying the coefficient matrix with the solution vector gives a product which is [0 0] and not [1 0] as required in the problem given by the rhs vector.

Post a Comment