Wednesday, January 28, 2009

ILC 09 Coming up

I had hoped to write a bit more about the Advanced Symbolic programming course, but ILC '09 is coming up soon and there is work to be done.

Friday, January 23, 2009

Propagator networks are a pain to program. You have to break up your problem into a set of computation nodes, allocate cells for intermediate values, and wire up the inputs and outputs. It's pretty straightforward to do this by hand for a small program, but it would be pretty nasty for something more complex. In addition, propagator networks for recursive functions require a bit of trickery to get them to work. On the other hand, the standard recursive descent evaluation strategy, while being a much easier model to program to, imposes implicit restrictions on evaluation order that get in the way of non-deterministic programming.

The solution is to compile our Scheme programs into a propagator network. This is what Chris Hanson showed to us on day 5. This is doable, but it is quite tricky. I don't think Chris had the time to explain all the nuances of the compilation, but if he had a difficult time making it work correctly, it can't be easy.

One thing that I thought was unusual was the way recursive procedures were compiled. The propagator network for a recursive procedure is not instantiated and linked until the procedure is actually invoked. This allows recursive procedures to run by instantiating only as much of the propagator graph as necessary to compute the answer. What strikes me as unusual about this is that a standard expression-based language usually prevents runaway recursion by making conditionals be non-strict. The propagator network seems to have the recursion grounded in a different place. I asked Chris is this was an issue and he thinks it may be what is causing some of the compilation difficulties.

Now that we have generated a propagator network, we need to run it. It is relatively straightforward to create a scheduling process that notes which value cells have changed and pushes the values through the appropriate computation nodes, but the performance is nothing to write home about. What is more interesting is to create a propagation-based virtual machine and run our propagator network against that. We can use JIT technology to accelerate the virtual machine and get reasonable performance. It may also be possible to partition the propagator network into independent or loosely coupled subnets. If so, then we can distribute the computation of the subnets among several physical machines.

This is where we wrapped up the course.

Next post, maybe I'll mention some ideas that occurred to me while I was listening.
Suppose we are writing a chess playing program where each move is computed by a non-deterministic process that was implemented as a search with backtracking. In fact, most chess playing programs are implemented this way. In the mid game, the program searches a tree of potential board positions by examining the possible moves one-by-one. Now it is not the case that your favorite chess program is written using `amb`. There is a hand-written loop that generates and tests various moves in various orders looking for a good series of moves that leads to a good position. But conceptually, you could use `amb` and it would work about the same (if slower).

Now as you search for a good move, you have a temporal constraint. You must move before your opponent can respond, and you must wait for your opponents move before you can make the next move. The computer simulates you and your opponent taking turns moving pieces and then looks carefully at the possible board positions at the end of the series of moves to determine which side is winning. If the computer is losing, it backtracks to one of its earlier moves and tries an alternative. But this method of backtracking has a big problem: every time you backtrack a move, you forget your opponents response. So suppose you find that if you move your knight, your opponent will kill you with a queen-rook combination 3 moves later. Aha! We'll backtrack to that knight move and try a bishop. And your opponent kills you with the same queen-rook combination 3 moves later. Aha, we'll backtrack to that bishop and try a pawn. (you know what happens). The problem is that your opponent's queen-rook combination is independent of the knight, bishop or pawn moves on your part, but each time you backtrack you forget this and rediscover it. What you really want to do is see that the queen-rook combination in three moves is a threat and figure out a move *now* that affects that outcome. How do you backtrack without forgetting?

Return to the multiple-dwelling puzzle. If you tried to re-order the clauses that generate possible solutions, you'd notice that there is no optimal ordering. Optimizing for one set of constraints makes another one less optimal and you cannot satisfy all at once. Sussman pointed out that this is a direct consequence of the underlying evaluation strategy. Nearly every computer language uses a ‘recursive descent’ evaluation model. (Even if the language is compiled, the various subexpressions are run in a recursive descent order.) To a certain extent, this is an artifact of the abstract syntax being a tree. Each node of the abstract syntax tree represents a language construct that can contain zero or more subnodes (subexpressions), but has exactly one parent node (the containing expression). It is natural to interpret the language through a tree-walk, and thus we get a recursive descent evaluator.

In fact, it is so natural, it is really hard to imagine something different. But an example of a different evaluation mode might be the way Scheme's `syntax-rules` hygienic macro system works. Rather than recursively evaluation of the subforms of an expression, macro expansion involves a pattern-match against the entire expression followed by a rewrite of the form. (Then comes a recursive descent phase when we expand the subforms.)

A recursive descent evaluator uses the stack to keep track of the control state of our program. When we backtrack to a decision point, we return to a prior control point by discarding the intermediate state. We are changing the control flow of the program. The unwanted side effect of this is that we are also changing the data flow of our program. The values that are returned from subexpressions are computed as we evaluate the subexpressions, and if we back out of the evaluation, we forget the values. But what if we could separate the control flow from the data flow? We cannot completely divorce the two because computing a value usually means transferring control to the expression that denotes the value, but we need not forget the value once it is computed if we back out of a computation.

Here's how we do it: We take our original Scheme program (like the multiple dwelling problem) and generate a propagator network that computes the same value. `AMB` nodes in the propagator network arbitrarily pick one of the values to emit. As suggested earlier, we'll propagate information about the values (the provenance) along with the values themselves. At the `require` nodes, we check to see if we have satisfied the requirement. If we have, everything is ok, but if not, we examine the provenances of the values that the requirement depends upon. The relevent `AMB` nodes will be in the provenances, and we can select one or more of those nodes to request a different value. This is backtracking, but it only backtracks the control directly involved in the data flow that supplies affects the requirement. Other parts of the computation remain unaffected. This is true even if the other parts of the computation were orginally syntactically intertwined with the part we are backing out of. This is caled dependency-directed backtracking.

We can extend this idea to be more sophisticated. Rather than backtracking to an `AMB` node and telling it to blindly choose a different alternative, we may be able to give it some hints to select an alternative in a more intelligent way. For instance, if we have deduced that Cooper lives on the first floor, but find we have to recompute Fletcher (who is not on the top or bottom, nor next to Cooper), we can use the constraints to immediately put Fletcher on floor 3 rather than cycling through all the possibilities.

It took a bit of time to get to this point. Propagator networks, McCarthy's `AMB`, recursive descent evaluation, and backtracking are all old ideas, so it was easy to listen to this part of the seminar and more or less consider it just a review. But while the parts may be familiar, this way of combining them is new. In the next post, I'll review day 5, where we put a bit more thought into building programs this way.

Thursday, January 22, 2009

Day 4, Part 2

The next part of the talk was given by Alexey Radul, a graduate student of Gerry's. Alexey described propagator networks. The idea is quite simple: you create a network that is built out of two kinds of parts. One kind of part is a stateless computational node. Computation nodes can have arbitrary inputs and outputs, but the outputs are a pure function of the inputs. (This doesn't mean that the guts of the node can't have state, just that any such state cannot be observed from examination of the outputs.) The second kind of part is a stateful value cell. Cells do no computation at all, but they save the results of the computational nodes. You compute things by creating a network that is specialized to the problem you are trying to solve.

Computation networks like this are not a new idea. The first spreadsheet programs developed in the late 60s and early 70s are computation networks. (Spreadsheets have a lot of structure that is an artifact of the original paper model. For example, the physical layout of rows and columns of cells is very strongly built in to the computation model.) Despite their age, propagation networks are still a model that is worth exploring. Gerry and Alexey described some interesting ideas.

First, Gerry and Alexey have narrowed their exploration to unidirectional propagation. Each computation node only knows how to compute the output from the input. This is different from a constraint network where a computation node may be able to ‘run backwards’ and deduce an input value given the output. Constraints are trivially added by simply making a new propagator that runs from the output back to the input. That is, the computation nodes are unidirectional, but the network topology can have circular connections and can therefore mimic bi-directional propagation.

The ability to create arbitrary connections is the reason we want to use a propagator network, but you are immediately faced with the key problem of such a network: what do you do if you compute a value, but the receiving cell already has a value? There are a few potential solutions to this problem:
1. Make it impossible. Require that the computation network is strictly tree-like. This is boring.
2. Make it illegal. Raise an error if the network ever attempts to modify a value cell that already has a value.
• Make it illegal unless it is benign. Raise an error if the network ever attempts to *change* a value in a cell. Attempts to re-write the existing value are ignored. This raises the question of what might be considered equivalent. At one end of the spectrum, the new value might be required to EQ to the existing one to be considered benign. At the other end, maybe they both match the same pattern.
3. Ignore it. Discard the new value.
4. Allow it. Discard the old value. Continue propagating the new value to any computation nodes connected to the cell.
5. Do something else (call a function) involving both the old and new value.
Option 5 subsumes options 2, 2a, 3 and 4, if you choose function apropriately, but it offers a couple of more interesting alternatives. First, perhaps the values you are computing have potential errors. You may have obtained them from some physical process. Instead of using a best-estimate for each computation, you may decide to use interval arithmetic. If a cell is recomputed and the new interval overlaps the old, then you can perhaps compute a tighter interval.

This leads to this interesting idea: Rather than put values in the cells, we'll accumulate information about the values and put that in the cells. What sort of information? As the computation propagates through the network, we can tag each value and note where it came from and what computations were done to compute it. A value will come with a provenance that we can examine. Now suppose we find that we are about to assign a value to a cell that already has a value. We can look at the two provenances and see if there is reason to believe one more than the other. Or if the values agree, we can combine the provenances to show that the value is supported by multiple methods of computation.

Or perhaps you are a Bayesian and want each cell to represent a belief (probability distribution) about a value. Then the value we assign to the cell would be a distribution that has been updated by a new piece of evidence. Again, we could use the provenance idea to determine what evidence was important in determining a value.

Gerry and Alexey have a paper coming out soon called The Art of the Propagator. It goes much further than I did here and is much clearer, so I suggest reading it as soon as it appears.

Ok, keep that idea in the back of your mind while we revisit the problem of `amb` and backtracking.

Wednesday, January 21, 2009

Day 4, Part 1

I've got to stop thinking that I must complete, polish and edit before posting to a blog. The reason I started one is so I could publish un-finished things before I forgot them.

On Thursday we covered something quite a bit more interesting in Gerry's lecture. First, though, I'm going to discuss the `AMB` operator in more detail. Here's a simple puzzle that is trivially solved via appropriate use of `AMB`
```
Baker, Cooper, Fletcher, Miller, and Smith live on different
floors of an apartment house that contains only five floors. Baker
does not live on the top floor. Cooper does not live on the bottom
floor. Fletcher does not live on either the top or the bottom
floor. Miller lives on a higher floor than does Cooper. Smith does not
live on a floor adjacent to Fletcher's. Fletcher does not live on a
floor adjacent to Cooper's. Where does everyone live?

(define (multiple-dwelling)
(let ((baker  (amb 1 2 3 4 5))
(cooper (amb 1 2 3 4 5))
(fletcher  (amb 1 2 3 4 5))
(miller (amb 1 2 3 4 5))
(smith  (amb 1 2 3 4 5)))
;; They live on different floors.
(require
(distinct? (list baker cooper fletcher miller smith)))

;; Baker does not live on the top floor.
(require (not (= baker 5)))

;; Cooper does not live on the bottom floor.
(require (not (= cooper 1)))

;; Fletcher does not live on either the top
;; or the bottom floor.
(require (not (= fletcher 5)))
(require (not (= fletcher 1)))

;; Miller lives on a higher floor than does Cooper.
(require (> miller cooper))

;; Smith does not live on a floor adjacent to Fletcher's.
(require (not (= (abs (- smith fletch)) 1)))

;; Fletcher does not live on a floor adjacent to Cooper's.
(require (not (= (abs (- fletcher cooper)) 1)))

(list (list 'baker baker)
(list 'cooper cooper)
(list 'fletcher fletcher)
(list 'miller miller)
(list 'smith smith))))
```
Even if this seems familiar to you, don't let your eyes glaze over, this is important.

There are a few ways to look at what the `AMB` operator does. The first is very abstract: `AMB` splits the interpreter into several copies, each copy attempts to proceed in parallel with one of the values, and if any interpreter computes an answer, we'll use it. This is the essence of non-deterministic programming: we try all the possibilities at the same time and arbitrarily pick a successful one.

Of course we don't implement the `AMB` operator that way. We certainly can't split a physical computer into several copies, and although we might be able to do that with virtual machines, the amount of memory and CPU usage would be way too high. Instead we look at the `AMB` operator in this way: `AMB` makes a snapshot of the state of the interpreter and then arbitrarily chooses one of the possible values to proceed with. If the computation doesn't fail, we can discard the snapshot, but if it does fail, we restore the snapshot and pick a different possible value. In this case, we are simulating non-deterministic programming by sequential iterative testing.

The simulation differs from the abstract model in a few important ways, some of these differences (and especially the consequences of these differences) are not obvious. The first obvious benefit is that by sequentially testing solutions, we limit our peak resource use. The tradeoff here is that we have to serially search through the solution space, so we have traded space for time. This is obvious because it is the tradeoff we desired. A less obvious benefit is this: if we don't use every solution to the problem, there is no need to exhaustively test every possibility. If we stop at the first solution, we are likely to use less time than if we try to enumerate all the solutions. So by testing sequentially we have traded space for *less* time. Another less obvious benefit is that instead of having just `solutions' and `failures', we can have a continuum of `great solutions', `good solutions', `poor solutions', `terrible solutions', etc. We can now dynamically decide how much time we wish to trade for a solution of sufficient quality. We may have to wait a long time to find the best solution, but we find an acceptable solution quite quickly.

These tradeoffs depend directly on how we decide to order the sequential testing. Refer to the multiple-dwelling problem. If `AMB` is implemented by sequential left-to-right trials, then we'll first try every combination that puts Baker on the first floor. Once we'e exhausted those, we try every combination that puts Baker on the second floor. Then we'll check the third, etc. It should be obvious that we can find a solution much faster if we can eliminate some of the problem constraints as quickly as possible. For example, we could trim the `AMB` forms to skip some combinations before we even run the program. Why does Fletcher have `(amb 1 2 3 4 5)` when we already know that possibility 1 and 5 cannot lead to a solution? Another thing we can do is re-order the `AMB` clauses and evaluate the `require` clauses sooner. We know that Fletcher cannot live next to Cooper, so we can check that reqirement before we even assign a floor to Miller and Smith and save ourselves a huge amount of time.

What's going on here? Our implementation of `AMB` has imposed an implicit order on the search through our possible solution space. By re-arranging the `AMB` clauses and testing the restrictions intelligently, we direct the code to search the more likely places in our solution space first and to avoid parts of our search space that we know don't contain valid solutions. We're pretty clever, eh?

Well, no, not really. Since we are the ones doing the re-ordering of clauses, the computer is making us do the hard work. Sure it is reasonably easy for this problem, but suppose the search space was for a good chess move. Should we examine queen moves first, or knight moves? What requirements should be checked and when? If we try a particular ordering, can we re-order the search?

In our implementation of `AMB`, every time we fail to meet a requirement, we backtrack to the most recent `AMB` and try the next option. This is called depth-first chronological backtracking. It naturally falls out of the implementation. We can change the implementation slightly. Rather than `AMB` making a snapshot of the world and proceeding along the first suggested path through the search space, we'll have `AMB` prepare a number of snapshots (one for each value) and place them in a queue. Then whenever we reach a failure point, we'll restart by popping a suggested search path off the queue. If you think about this a bit, you'll come to the conclusion that we can now direct the order in which we seach the solution space by defining a policy on how we use the queue. If we use the queue as a LIFO stack, then we get depth-first chronological backtracking. If we use the queue as a FIFO, we get breadth-first backtracking. So now we can change how we traverse the possible solution space without re-ordering the code.

We're now a bit more clever, but how do we know which order is best? We don't. We can only guess. On the other hand, the computer has more information than we have. If we annotate how the choices are made by `amb` we can make the backtracking smarter. We could, for example, assign a score to each potential path that is waiting in the queue and when we backtrack, select the highest scoring path. If we find that some path cannot satisfy a requirement, perhaps we can check the pending computations in the queue and simply remove the ones that would also follow the same path.

Now I'm going to abandon this line of though for a little while. I'll return to it.

Next post: propagator networks.

Thursday, January 15, 2009

Again, we covered some more material that I'm passingly familiar with and I think a lot of Lisp and Scheme hackers would find familiar as well. The first thing was how to introduce implicit backtracking in a combinator-based interpreter. The first step is to curry `eval`.
```(define (eval expression environment)
((analyze expression) environment))
```
This allows us to separate the static analysis of the program from the running of the program with ‘real data’. `analyze` is really a compiler. Now all the subcases of `eval` will emit procedures that take a single argument of the environment and return the computed value.

Larceny uses something almost exactly like this in its interpreter. In fact, the Scheme interpreter I'm writing does this as well, although it is difficult to see that because of all the extra cruft to support tail recursion and first-class continuations.

By changing the structure of the interpreter combinators to continuation-passing-style it is reasonably easy to add the McCarthy `AMB` operator.

Gerry then went on to show that if you can grab the implicit continuation of the underlying language, you can embed `AMB` directly rather than layering it on in an isolated interpreter. He pointed out that Scheme was one of the few languages that supplied first-class continuations. (I did have to point out that by using the tricks in ‘Continuations from Generalized Stack Inspection’ you can embed first-class continuations in nearly any language, but of course the contortions you have to go through are pretty horrific.)

I'm not sure how much to write here. Since I've been steeping myself in this sort of thing for the past several years, most of this stuff seems fairly straightforward, if esoteric.

Wednesday, January 14, 2009

The term ad hoc is a Latin phrase meaning ‘for this purpose’. In 1967, Christopher Strachey used the term to differentiate the kind of polymorphism where a function acts differently on different types from the kind of polymorphism where a generalized function can be parameterized by a range of types. The generic functions of CLOS are a good example of ad hoc polymorphism, and Java ‘generics’ are an example of parameterized polymorphism.

The whole point of generic functions is to allow you to treat objects of different types in a uniform, abstract manner. Generic arithmetic is the motivating example: you want to add two numbers and you really don't care whether they are floating point, integers, rationals, bignums, or whatever, just add the things, ok? Both parametric and ad hoc polymorphism allow you to do this.

Ad hoc is often used to imply something beyond the Latin meaning of specific purpose. It can mean ‘jury-rigged’, ‘makeshift’, ‘spur of the moment’, or ‘not well thought out’. In Wadler and Blott's paper “How to make ad-hoc polymorphism less ad-hoc”, they state:
One widely accepted approach to parametric polymorphism is the Hindley/Milner type system... On the other hand, there is no widely accepted approach to ad hoc polymorphism, and so its name is doubly appropriate.
It is clear that Wadler and Blott are assuming a perjorative meaning.

But what is the problem with ad hoc polymorphism? The primary problem seems to be that unrestricted use of ad hoc polymorphism makes it impossible to fully type check at compile time and requires some sort of run time type dispatch.

Suffice it to say that I'm underwhelmed by this argument.

I have found that parametric polymorphism is a fine tool for the problems it is designed to solve, but ad hoc polymorphism can solve those problems, too. Furthermore, ad hoc polymorphism allows you to do things that are quite difficult to do with parametric polymorphism, for example, add methods to `null` or other built-in classes, or make universal constructors (compare `make-instance` to the plethora of Java ‘factory’ classes).

I like ad hoc polymorphism. It allows me to tailor a solution to exactly solve a specific problem, and so its name is singularly appropriate.
I'm afraid I have much less to report today. Gerry and Chris went over techniques used for pattern matching, parsing, and term rewriting. It is probably the case that the material was new for some of the people in the class, but die-hard Lisp and Scheme hackers will find this sort of thing to be quite familiar.

Gerry started with a combinatoric pattern matcher. The basic structure was that each sort of matching construct would return a procedure that performed the actual match. These procedures all have the same signature and return value conventions, so it is possible to mix and match them. Gerry used a single ‘success’ continuation and simply returned `#f` if the match failed. The higher-order matchers would propagate the `#f` from the lower order ones. Chris said he preferred to use both a ‘win’ and ‘lose’ continuation instead.

I almost forgot. Gerry made one of those comments that seem obvious at first, but actually have a lot of depth behind them. “Here we have a simple pattern matcher. Pattern matching is a generalization of equality.”

It's funny, but I've been programming for all these years and that connection never occurred to me. Now I know that I'm probably the only person on the planet that didn't notice this, and everyone is wondering how on earth I can get a programming job if I'm this obtuse, but I think this is a very interesting observation. It answers the question of why there are so many equality predicates in Lisp: consider one of the operands to be a pattern and the other to be an object to match against. The ‘pattern’ object doesn't have any way to indicate the intent of the match. Suppose I want to match any three element list that contains the symbols `(a b c)`, but you want to match the exact instance of a particular list that contains the symbols ```(a b c)```. The ‘pattern’ in these cases is identical, so we have to indicate the strictness of matching elsewhere — in the predicate. So I call ‘equal’ and you call ‘eq’.

That's a pretty nice answer, but now I have more questions. Why are `EQ`, `EQL`, `EQUAL`, `EQUALP`, etc. built in operations, but a generalized pattern matcher not built in? I use `EQ` a lot in my code and my mental model is that I'm testing for object identity. How about if I relax my mental model and think about my uses of `EQ` as special instances of pattern matching. Would I get any advantages? A generalized pattern matcher carries around a dictionary of pattern variables so you can match substructures against previously matched substructure. How come the equality predicates like `EQUAL` don't have this? What if they did? Some computer languages eschew extensional equality: in order to test equality, the objects in question have to have an ‘is equal’ method, otherwise you are out of luck. (Extensional equality means that I can take two arbitrary objects and determine their equality without calling any method on either object. Equality is an external property of the object. Intensional equality means that the object or the object class defines what it means for objects of that type to be considered equal. Equality is an internal property of the object.) If the language discourages or prohibits extensional equality, is that an artifact of discouraging or prohibiting extensional pattern matching in general? If so, isn't that a horrible error in design? If not, then why is `EQ` singled out?

I hear Otto asking “You eat a lot of acid, Miller, back in the hippie days?”

On an unrelated topic, Gerry and I argued a bit about floating point numbers. We violently agree that they are a good tool for certain use cases and a poor tool for others. He gave an example of what he thought was a problem, but I objected that his particular example was handled correctly be the IEEE spec. I've got what I think is a better example:
```
The odds against a meteor hitting your house are 182,138,880,000,000
to 1. (I read it online, so it must be true.)  What is the
probability of a meteor striking your house?

Answer 1:  Given the odds against something, compute the odds for it
by taking the reciprocal.  Convert this to a probability with this
formula:  p = o/(o + 1)

(odds->probability (/ 1.0 182138880000000))
5.490315961095151e-15

Answer 2:  Given the odds against something, compute the probability
against it with p = o/(o + 1).  The probability for it is simply 1
minus the probability against it.

(- 1.0 (odds->probability 182138880000000))
5.440092820663267e-15
```
Those answers differ in the second decimal place. Shouldn't they be the same? Which one is right? Or are they both wrong?

What happened is this: floating point numbers are not evenly distributed, and there are a lot more of them down near zero than there are near 1.0 This means our answers will be a lot more accurate if we do our computations near zero rather than near one. (Since there are more floats near zero, if we have to round, we don't have to go very far to find a nearby float. But if we are up near 1.0, we have to go much, much further to find a nearby float.)

The actual answer is 1/182138880000001, and the closest floating point number to that is 5.4903159610951515e-15 The first method produced a much more accurate answer, but neither answer is exactly right.

Monday, January 12, 2009

Gerry Sussman and Chris Hanson are giving a series of 5 whirlwind lectures on advanced symbolic programming. I'm sitting in and I figure I'll make some comments here (rather than distract from the class). My hope is that the lectures will be available on-line sometime.

The first part of today's lecture was about using generic functions. This was pretty straightforward stuff, but there were a couple of interesting details. First of all, the generic arithmetic was extended to handle functions: (f + g)(x) = f(x) + g(x) Then, with the addition of hyperreals to the numeric tower, it was possible to compute differentials. Finally, by adding symbolic quantities to the numeric tower, you could compute derivatives of functions.

The non-obvious thing to me was that you could compute derivatives symbolically by using symbolic hyperreals, and that you need to compute the differentials using a two-tuple. This is going to sound completely opaque to anyone reading this, so I'd better go into more detail.

Consider a rational number. One way of looking at it is as if it is simply a pair of integers with some weird rules for addition, subtraction, multiplication and division. Now consider a complex number. Again, it is simply a pair of numbers with some weird rules for addition, subtraction, multiplication and division. A hyperreal is the same sort of thing: a pair of numbers with some weird rules for addition, subtraction, multiplication and division.

Now suppose we re-define the primitive +, -, *, and /. We allow them to take symbols as arguments. When given a symbol, we return an s-expression result. Now if we have previously defined rationals as being a pair of numbers with weird rules for addition, subtraction, etc., we can get symbolic rationals automagically by allowing the primitives to take symbolic arguments. (The rules for rational arithmetic being based on the underlying arithmetic primitives.) With a couple of additional rules for constructing rationals from symbols, we're all set for symbolic rational arithmetic.

We can do a similar trick with complex numbers.

And we can do a similar trick with the hyperreals. But here's a more interesting trick: A hyperreal [a, b] has a standard part `a' and an infinitessimal part 'b'. So the symbolic hyperreal [x, 1] means `x + dx'. Now consider the function (lambda (a) (* a a a)), the cube function. If we apply it to the symbolic number x and the symbolic hyperreal [x,1], and subtract the two, f(x + dx) - f(x), and then extract the differential part, we have the derivative. And sure enough, if we evaluate this in the interpreter:
```(print-expression ((D (lambda (a) (* a a a))) 'x))
=> (* 3 (expt x 2))
```

Now that's cool.

I have some more thoughts on this, but I think they can wait for later in the week.

Tuesday, January 6, 2009

Updated my Scheme interpreter. The newest implementation is fairly snappy.

Added my clambda.scm macro to jrm-code-project/scheme It is a complete implementation of the Common Lisp lambda list functionality (optionals, rest args, keywords, allow other keys, etc.) written as a syntax-rules macro.