Here's a 2x2 matrix:
[64919121 -159018721]
[41869520.5 -102558961]
We can multiply it by a 2 element vector like this:
(defun mxv (a b
c d
x
y
receiver)
(funcall receiver
(+ (* a x) (* b y))
(+ (* c x) (* d y))))
* (mxv 64919121 -159018721
41869520.5d0 -102558961
3
1
#'list)
(35738642 2.30496005d7)
Given a matrix and a result, we want to
find the 2 element vector that produces that result. To do this, we
compute the inverse of the matrix:
(defun m-inverse (a b
c d
receiver)
(let ((det (- (* a d) (* b c))))
(funcall receiver
(/ d det) (/ (- b) det)
(/ (- c) det) (/ a det))))
and multiply the inverse
matrix by the result:
(defun solve (a b
c d
x
y
receiver)
(m-inverse a b
c d
(lambda (ia ib
ic id)
(mxv ia ib
ic id
x
y
receiver))))
So we can try this on our matrix
* (solve 64919121 -159018721
41869520.5d0 -102558961
1
0
#'list)
(1.02558961d8 4.18695205d7)
and we get the wrong answer.
What's the right answer?
* (solve 64919121 -159018721
(+ 41869520 1/2) -102558961
1
0
#'list)
(205117922 83739041)
If we use double precision floating point, we get
the wrong answer by a considerable margin.
I'm used to floating point calculations being off a little in the least
significant digits, and I've seen how the errors can accumulate in an iterative
calculation, but here we've lost all the significant digits in a straightforward
non-iterative calculation. Here's what happened: The determinant of our matrix is
computed by subtracting the product of the two diagonals. One
diagonal is (* 64919121 -102558961) =
-6658037598793281
, where the other diagonal is (* (+
41869520 1/2) -159018721) = -6658037598793280.5
This second
diagonal product cannot be represented in double precision floating
point, so it is rounded down to -6658037598793280. This is where
the error is introduced. An error of .5 in a quantity of
-6658037598793281 is small indeed, but we amplify this error when we
subtract out the other diagonal. We still have an absolute error of
.5, but now it occurs within a quantity of 1, which makes it
relatively huge. This is called “catastrophic cancellation”
because the subtraction “cancelled” all the significant digits
(the “catastrophe” is presumably the amplification of the error).
I don't care for the term “catastrophic cancellation”
because it places the blame on the operation of subtraction. But
the subtraction did nothing wrong. The difference betweeen
-6658037598793280 and -6658037598793281 is 1 and that is the result
we got. It was the rounding in the prior step that introduced an
incorrect value into the calculation. The subtraction just exposed
this and made it obvious.
One could be cynical and reject floating point operations as being too
unreliable. When we used exact rationals, we got the exactly correct result.
But rational numbers are much slower than floating point and they have a tendancy
to occupy larger and larger amounts of memory as the computation continues.
Floating point is fast and efficient, but you have to be careful when you use it.