Wednesday, July 16, 2025

Dual numbers

Let's build some numbers. We begin with the natural numbers, which capture the idea of magnitude. Adding them is simple: on a number line, it's a consistent shift to the right. But what about shifting left? To handle this, we could invent a separate flag to track direction and a set of rules for its manipulation. Or we can be clever and augment the numbers themselves. By incorporating a "sign," we create positive and negative integers, baking the concept of direction directly into our numerical system and embedding its logic into the rules of arithmetic.

This pattern of augmentation continues as we move from the number line to the number plane. Beyond simple shifts, we want to introduce the concept of rotation. Again, we could track rotation as an external property, but a more integrated solution emerges with complex numbers. By augmenting real numbers with a so-called "imaginary" unit, i, we create numbers of the form a + bi. If b is zero, we have a standard signed number. If b is non-zero, the number lies off the real axis, and multiplying by it rotates (and scales) the plane. Multiplying by i is a 90-degree counter-clockwise rotation, and multiplying by -i is a 90-degree clockwise rotation. Notably, two 90-degree rotations result in a 180-degree turn, which is equivalent to flipping the number's sign. This geometric reality is captured by the algebraic rule i² = -1. Once again, we don't just track a new property; we weave it into the fabric of the numbers themselves.
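As a small illustration of rotation living inside the arithmetic, here is a sketch using Python's built-in complex type, where the imaginary unit is written 1j. The variable names are only for illustration.

```python
# Python writes the imaginary unit i as 1j.
point = 3 + 4j

# Multiplying by i rotates 90 degrees counter-clockwise: (3, 4) -> (-4, 3).
quarter_turn = point * 1j

# Multiplying by -i rotates 90 degrees clockwise: (3, 4) -> (4, -3).
reverse_turn = point * -1j

# Two quarter turns make a half turn, which flips the sign: i² = -1.
half_turn = point * 1j * 1j
i_squared = 1j * 1j
```

No rotation matrix or angle bookkeeping appears anywhere; the multiplication rule does all the work.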

Now, let us turn to the world of calculus and the concept of differentiation. When we analyze a function, we are often interested in its value and its slope at a given point. Following our established pattern, we could track the slope as a separate piece of information, calculating it with the familiar rules of derivatives. Or, we can be clever and augment our numbers once more, this time to contain the slope intrinsically. This is the innovation of dual numbers.

To do this, we introduce a new entity, ε (epsilon), and form numbers that look like a + bε. Here, a represents the number's value, and b will represent its associated slope or "infinitesimal" part. The defining characteristic of ε is unusual: informally, we think of ε as greater than zero, yet smaller than any positive real number. This makes ε an infinitesimal. Consequently, ε², being an infinitesimal squared, is so negligible that we simply define it to be zero, even though ε itself is not zero. This single rule, ε² = 0, is all we need. Our rules of arithmetic adapt seamlessly. Adding two dual numbers means adding their real and ε parts separately: (a + bε) + (c + dε) = (a + c) + (b + d)ε. Multiplication is just as straightforward: we distribute the terms and apply our new rule:

(a + bε)(c + dε) = ac + adε + bcε + bdε² = ac + (ad + bc)ε

Notice how the ε² term simply vanishes.
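To make the two rules concrete, here is a minimal Python sketch. The class name Dual and the field names real and eps are assumptions chosen for illustration; they do not come from any particular library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A dual number a + bε (hypothetical illustration class)."""
    real: float  # the "a" in a + bε
    eps: float   # the "b" in a + bε

    def __add__(self, other):
        # (a + bε) + (c + dε) = (a + c) + (b + d)ε
        return Dual(self.real + other.real, self.eps + other.eps)

    def __mul__(self, other):
        # (a + bε)(c + dε) = ac + (ad + bc)ε, because the bdε² term is zero
        return Dual(self.real * other.real,
                    self.real * other.eps + self.eps * other.real)


EPSILON = Dual(0.0, 1.0)

x = Dual(2.0, 3.0)
y = Dual(5.0, 7.0)
total = x + y                    # (2 + 5) + (3 + 7)ε
product = x * y                  # 2*5 + (2*7 + 3*5)ε
eps_squared = EPSILON * EPSILON  # ε² collapses to zero
```

Note that the rule ε² = 0 is never written down as a special case. It falls out of the multiplication formula, which has no place to store an ε² coefficient.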

Extending the arithmetic to include division requires a method for finding the reciprocal of a dual number. We can derive this by adapting a technique similar to the one used for complex numbers: multiplying the numerator and denominator by the conjugate. The conjugate of a + bε is a - bε. To find the reciprocal of a + bε, we calculate 1 / (a + bε):

1 / (a + bε) = (1 / (a + bε)) * ((a - bε) / (a - bε))
= (a - bε) / (a² - abε + abε - b²ε²)
= (a - bε) / (a² - b²ε²)

Using the defining property that ε² = 0, the denominator simplifies to just a². The expression becomes:

(a - bε) / a² = 1/a - (b/a²)ε

Thus, the reciprocal is 1/a - (b/a²)ε, provided a is not zero. This allows for the division of two dual numbers by multiplying the first by the reciprocal of the second, completing the set of basic arithmetic operations.
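The reciprocal formula translates directly into code. The following is a sketch under the same assumptions as before: Dual, real, eps, and the method name reciprocal are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A dual number a + bε (hypothetical illustration class)."""
    real: float
    eps: float

    def __mul__(self, other):
        # (a + bε)(c + dε) = ac + (ad + bc)ε
        return Dual(self.real * other.real,
                    self.real * other.eps + self.eps * other.real)

    def reciprocal(self):
        # 1/(a + bε) = 1/a - (b/a²)ε, defined only when a is not zero
        if self.real == 0:
            raise ZeroDivisionError("dual number with zero real part")
        return Dual(1 / self.real, -self.eps / (self.real * self.real))

    def __truediv__(self, other):
        # Divide by multiplying with the reciprocal of the divisor.
        return self * other.reciprocal()


x = Dual(2.0, 3.0)
inverse = x.reciprocal()       # 1/2 - (3/4)ε
identity = x * inverse         # should come back to 1 + 0ε
quotient = Dual(6.0, 1.0) / x  # (6 + ε) / (2 + 3ε)
```

A dual number such as 0 + 3ε has no reciprocal even though it is not zero, which is why the guard tests only the real part.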

But what is it good for? Expand a differentiable function F around a, as in a Taylor series or linear approximation, and evaluate it at a + bε. Every term beyond the linear one carries a factor of ε² and therefore vanishes, leaving:

F(a + bε) = F(a) + F'(a)bε

The result is another dual number. Its "real" part is F(a), the value of the function at a. Its "infinitesimal" part is F'(a)b, which contains the derivative of the function at a. If we set b=1 and simply evaluate F(a + ε), the ε part of the result is precisely the derivative, F'(a). This gives us a direct way to compute a derivative, as captured in this conceptual code:

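;; Scheme-style sketch: assumes + and f already accept dual numbers,
;; and that infinitesimal-part extracts the ε coefficient.
(define (derivative f)
  (lambda (x)
    (infinitesimal-part (f (+ x ε)))))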
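For readers who want something they can run, here is one possible Python rendering of the same idea. It is a sketch, not the definitive implementation: Dual, lift, dual_sin, and cubic are hypothetical names, and only addition, multiplication, and sine are supported. The sine function is extended to dual numbers with the rule F(a + bε) = F(a) + F'(a)bε.

```python
import math


class Dual:
    """A dual number a + bε (hypothetical illustration class)."""

    def __init__(self, real, eps=0.0):
        self.real = real
        self.eps = eps

    @staticmethod
    def lift(value):
        # Treat an ordinary number c as the dual number c + 0ε.
        return value if isinstance(value, Dual) else Dual(value)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.real + other.real, self.eps + other.eps)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        return Dual(self.real * other.real,
                    self.real * other.eps + self.eps * other.real)

    __rmul__ = __mul__


def dual_sin(x):
    # F(a + bε) = F(a) + F'(a)bε with F = sin and F' = cos
    x = Dual.lift(x)
    return Dual(math.sin(x.real), math.cos(x.real) * x.eps)


def derivative(f):
    # Evaluate f at x + ε and read off the ε coefficient.
    def df(x):
        return f(Dual(x, 1.0)).eps
    return df


def cubic(x):
    return x * x * x + 2 * x + 1


cubic_slope = derivative(cubic)(2.0)                        # 3*2² + 2 = 14
sine_slope = derivative(dual_sin)(0.0)                      # cos(0) = 1
chain_slope = derivative(lambda x: dual_sin(2 * x))(0.0)    # 2*cos(0) = 2
```

The last line never mentions the chain rule. The factor of 2 arrives because the ε coefficient of 2 * x is already 2 when it reaches dual_sin.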

This method provides an alternative to traditional numerical differentiation. Standard finite-difference methods, such as calculating (F(x+h) - F(x))/h, force a difficult choice for h. A large h leads to truncation error from the approximation, while a very small h can introduce significant rounding error from subtracting two nearly identical floating-point numbers. Dual numbers sidestep this issue entirely. The process is algebraic, not approximative. The derivative is computed numerically, with no truncation error and without the instability of manipulating a vanishingly small h; the result is exact up to the ordinary floating-point rounding of the arithmetic operations themselves.
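The trade-off can be seen in a small experiment. The following Python sketch differentiates F(x) = x³ at x = 1, where the true derivative is 3, using a forward difference with a large h, a forward difference with a tiny h, and a dual number. The names Dual, cube, forward_difference, and dual_derivative are assumptions for illustration, and the two step sizes are arbitrary choices.

```python
class Dual:
    """A dual number a + bε (hypothetical illustration class)."""

    def __init__(self, real, eps=0.0):
        self.real = real
        self.eps = eps

    def __mul__(self, other):
        # (a + bε)(c + dε) = ac + (ad + bc)ε
        return Dual(self.real * other.real,
                    self.real * other.eps + self.eps * other.real)


def cube(x):
    # Works for floats and for Dual values alike.
    return x * x * x


def forward_difference(f, x, h):
    return (f(x + h) - f(x)) / h


def dual_derivative(f, x):
    return f(Dual(x, 1.0)).eps


exact = 3.0  # derivative of x³ at x = 1

# Large step: dominated by truncation error.
coarse_error = abs(forward_difference(cube, 1.0, 1e-1) - exact)

# Tiny step: dominated by rounding error in the subtraction.
tiny_error = abs(forward_difference(cube, 1.0, 1e-15) - exact)

# Dual number: no step size to choose.
dual_error = abs(dual_derivative(cube, 1.0) - exact)

print(coarse_error, tiny_error, dual_error)
```

For this particular function and point the dual-number result has no error at all, because every intermediate product is a small integer that floating point represents exactly. In general the result still carries the usual rounding of each arithmetic operation.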

By extending our number system to include an infinitesimal part, we have baked the logic of differentiation (specifically, the chain rule) into the rules of arithmetic. We no longer need a separate set of symbolic rules for finding derivatives. By simply executing a function with dual numbers as inputs, the derivative is calculated automatically, as a natural consequence of the algebra. Just as the sign captured direction and i captured rotation, ε captures the essence of a derivative.

If we want to combine dual and complex numbers, we have a choice: dual numbers with complex standard and infinitesimal parts, or complex numbers with dual real and imaginary parts. From an implementation standpoint, the former is easier because complex numbers are already supported.
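Here is a sketch of the first option in Python, whose built-in complex type plays the role of the already-supported complex numbers. The class Dual and its field names standard and infinitesimal are hypothetical; the class simply stores whatever numeric type it is given, so complex parts need no extra code.

```python
class Dual:
    """A dual number whose two parts may be complex (hypothetical class)."""

    def __init__(self, standard, infinitesimal=0.0):
        self.standard = standard
        self.infinitesimal = infinitesimal

    def __mul__(self, other):
        # Same rule as before; the parts just happen to be complex.
        return Dual(self.standard * other.standard,
                    self.standard * other.infinitesimal
                    + self.infinitesimal * other.standard)


def cube(z):
    return z * z * z


z = complex(1, 2)
result = cube(Dual(z, 1.0))

value = result.standard       # z³ = -11 - 2i
slope = result.infinitesimal  # 3z² = -9 + 12i
```

The multiplication rule is unchanged from the real case, which is what makes this direction convenient: the host language's complex arithmetic is reused as is.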
