Monday, August 11, 2008

Interpreter Overhead

Consider this client/server model of computing. The server is a general purpose virtual machine/runtime system. The client is the program which makes ‘requests’ to the server and receives responses. And let's be completely abstract here. The client is capable of nothing but making requests. The server does all the work. If the client needs temporary storage for a variable, it requests that the server allocate it. If it wants to load a value into that variable, it requests that the server load the location. The client isn't much more than an instruction stream, but keep the image of an active, but extraordinarily primitive client in your mind.

The server, on the other hand, is amazingly powerful compared to the client, but totally ignorant. It simply does not know what to do or what it is doing. It just mechanically follows the requests of the client.

This is simply a description of the Harvard architecture, but I want to emphasize that each component --- the program (client) and the processor (server) --- sees the other as a black box. And although the processor is usually responsible for fetching instructions, I've made the client responsible for feeding them to the processor.
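
To make the picture concrete, here is a minimal sketch in Python (my own illustration, not from the post): the client is nothing more than a list of requests, and the machine is the only component that acts on them. All of the names here (Machine, perform, and the request tags) are invented for the example.

    # A minimal sketch of the model above: the "client" is nothing but a
    # stream of requests; the "server" (the machine) does all the work.

    class Machine:
        """The server: powerful but ignorant. It mechanically performs requests."""
        def __init__(self):
            self.storage = {}

        def perform(self, request):
            op, *args = request
            if op == 'alloc':        # allocate temporary storage for a variable
                self.storage[args[0]] = None
            elif op == 'store':      # load a value into a location
                self.storage[args[0]] = args[1]
            elif op == 'add':        # add two locations into a third
                self.storage[args[0]] = self.storage[args[1]] + self.storage[args[2]]
            elif op == 'emit':       # deliver the answer
                print(self.storage[args[0]])
            else:
                raise ValueError(f"unknown request: {op}")

    # The client: an inert instruction stream.  It requests; the machine acts.
    client = [('alloc', 'x'), ('store', 'x', 2),
              ('alloc', 'y'), ('store', 'y', 3),
              ('alloc', 'z'), ('add', 'z', 'x', 'y'),
              ('emit', 'z')]

    machine = Machine()
    for request in client:
        machine.perform(request)     # prints 5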

A general purpose processing machine combined with a program gives you a special purpose machine. Your x86 processor is general purpose, but Excel turns it into a spreadsheet processor. We can consider a program as a ‘meta-operator’ that transforms one machine into another. Instead of making requests directly to the machine, we make them to the program/machine combination. The program makes requests of the underlying machine and arranges for the underlying machine to deliver the answer in a useful format.

If you have a program written in a language that your machine doesn't natively understand, you have two options to make it run. You can translate the program into a language the machine does understand, or you can translate the machine into a machine that understands the program. The first technique is compiling; the second is interpretation. An interpreter is a program that converts a machine that understands one language into a machine that understands another.
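
As a toy illustration of the two options, here is a sketch (mine, with an invented three-node expression ‘language’): option one translates the foreign program into the machine's native language ahead of time; option two is a program that makes the native machine act like one that understands the foreign program directly.

    # The "foreign" program: a little arithmetic tree the machine doesn't
    # natively understand.  The machine here natively understands Python.
    expr = ('add', ('lit', 2), ('mul', ('lit', 3), ('lit', 4)))   # 2 + 3*4

    # Option 1: translate the program into the machine's language (compile).
    def compile_expr(e):
        tag = e[0]
        if tag == 'lit':
            return str(e[1])
        if tag == 'add':
            return f"({compile_expr(e[1])} + {compile_expr(e[2])})"
        if tag == 'mul':
            return f"({compile_expr(e[1])} * {compile_expr(e[2])})"

    compiled = eval('lambda: ' + compile_expr(expr))   # now a native program'
    print(compiled())                                  # 14

    # Option 2: translate the machine into one that understands the program
    # (interpret).  The interpreter is itself just another native program.
    def interpret(e):
        tag = e[0]
        if tag == 'lit':
            return e[1]
        if tag == 'add':
            return interpret(e[1]) + interpret(e[2])
        if tag == 'mul':
            return interpret(e[1]) * interpret(e[2])

    print(interpret(expr))                             # 14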

Interpretation almost always performs slower than compilation. The usual reason given is ‘interpreter overhead’. But what is interpreter overhead? In the model described above, this is the situation if we have compiled our program:
                    compiler
           program ---------> program'

               +--------------------
               | program' | machine
               +--------------------
But if it is interpreted, we have this situation:
           +---------------------------------
           |         +-----------------------
           | program | interpreter | machine
           |         +-----------------------
           +---------------------------------
Obviously the extra layer introduced by interpretation is a problem. But in our model, the client doesn't do anything. The machine is ultimately the thing that takes action. The client is essentially passive. The only reason things could take longer is that the machine is doing more work in the second situation.
           +---------------------------------------
           |         +-----------------------------
           | program | interpreter | machine
           |         |             |  + excess work
           |         +-----------------------------
           +---------------------------------------
The interpreter must be making more requests of the machine than the compiled program. (I never miss the obvious.) That's the interpreter overhead.
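
One rough way to see where those extra requests come from is to instrument a toy interpreter and count them (my instrumentation, not the post's). The user's program below needs only two arithmetic operations, but the interpreter also asks the machine to dispatch on a tag at every node of the tree.

    essential = 0     # requests the user's program actually needs
    overhead = 0      # requests made only to keep the interpreter going

    def interpret(e):
        global essential, overhead
        overhead += 1                 # dispatch: inspect the tag of this node
        tag = e[0]
        if tag == 'lit':
            return e[1]
        if tag == 'add':
            essential += 1            # the '+' the user's program asked for
            return interpret(e[1]) + interpret(e[2])
        if tag == 'mul':
            essential += 1            # the '*' the user's program asked for
            return interpret(e[1]) * interpret(e[2])

    expr = ('add', ('lit', 2), ('mul', ('lit', 3), ('lit', 4)))   # 2 + 3*4
    print(interpret(expr))            # 14
    print(essential, overhead)        # 2 essential operations, 5 dispatches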

Let's turn this around. Now we take the point of view of the machine as instructions are fed to us. The picture is like this:
            -------------------+
                       program | machine
            -------------------+
or perhaps it is like this:
                     +-------------+
           ----------+             |
            program  | interpreter | machine
           ----------+             |
                     +-------------+
Another picture:
                     +--------------+
           ----------+              |
            program  | interpreter  |   machine
           ----------+              |
                     +--------------+
                          |    |
                     +--------------+
                     | co processor |
                     +--------------+
I've attached an ‘interpreter co-processor’ to the picture. What does it do? It handles all the ‘interpreter overhead’. If the interpreter needs some temporary storage to hold a variable, or if it has to maintain a lookup table to find subroutines, the co-processor is the machine that handles that. On the other hand, if the interpreter needs to perform an addition because the user's program calls ‘+’, this is given directly to the machine. The co-processor is only there to help the interpreter, not to perform any work requested by the program.
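
Here is a sketch of that split, in the same toy-interpreter style as above (the division into two request logs is my own framing, not the post's): bindings and lookups in the interpreter's environment go to the co-processor, while the ‘+’ the user's program asked for goes to the machine.

    machine_requests = []        # work the user's program asked for
    coprocessor_requests = []    # interpreter bookkeeping

    def interpret(e, env):
        tag = e[0]
        if tag == 'lit':
            return e[1]
        if tag == 'var':
            coprocessor_requests.append(('lookup', e[1]))   # bookkeeping only
            return env[e[1]]
        if tag == 'let':                                    # (let name value body)
            coprocessor_requests.append(('bind', e[1]))     # temp storage for a variable
            return interpret(e[3], {**env, e[1]: interpret(e[2], env)})
        if tag == 'add':
            machine_requests.append(('+',))                 # the user's actual '+'
            return interpret(e[1], env) + interpret(e[2], env)

    # (let x 3 (+ x 4))
    prog = ('let', 'x', ('lit', 3), ('add', ('var', 'x'), ('lit', 4)))
    print(interpret(prog, {}))           # 7
    print(machine_requests)              # [('+',)]
    print(coprocessor_requests)          # [('bind', 'x'), ('lookup', 'x')]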

From the machine's point of view, the only way we can tell if two programs are different is if the series of instructions being fed to us is different. We don't really know, or even care what is going on outside. We're just following orders, whether they come directly from a program or via an interpreter (which, after all, is just another program as far as we're concerned), or whether the interpreter has a funny co-processor attached.

Suppose we have two programs, X and Y, that have only a small difference between them. They both feed the exact same instructions in the exact same order to the machine, but program Y uses the interpreter co-processor half as much as program X. Since the instruction stream to the machine is identical in both, it should be the case that both programs ‘do’ the same thing. The machine is the only component that can take action, and you've told it to take the same action in both cases. But program Y uses fewer resources than program X. By making fewer demands of the co-processor, perhaps our machine can be cheaper or run cooler. The instructions that are given to the co-processor are not essential to running the program.

Our machine is universal, so we really don't need to buy a separate co-processor. We can emulate it.
                     +--------------+
           ----------+              +--+---------+
            program  | interpreter  |  | machine |
           ----------+              +--+--+---+--+
                     +--------------+     |   |
                          |    |          |   |
                     +--------------------+   |
                     | co processor emulator  |
                     +------------------------+
How does this differ from the picture that looked like this?
                     +-------------+
           ----------+             |
            program  | interpreter | machine
           ----------+             |
                     +-------------+
It doesn't, except that I can point at the instructions coming through the ‘emulated co-processor’ and identify them as ‘interpreter overhead’. And as I mentioned just above, those instructions aren't essential to running the program.

If we are not trying to ‘optimize’ the program, then a compiled program should be identical to an interpreted one, but without as much ‘interpreter overhead’. The overhead comes from the fact that we are emulating some of the behavior that the interpreter relies on. Compilation wins because we remove as much of the need for the emulation as we can. But some machines these days are pretty sophisticated. Do we need to emulate the desired behavior, or can we co-opt the machine's built-in behavior to approximate the desired behavior?
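
As one standard illustration of removing the need for the emulation (my example; the post does not name a specific technique), the toy interpreter above can be ‘compiled to closures’: the tag dispatch is paid once, ahead of time, and running the resulting closure issues mostly the essential requests.

    # "Compile" the tree into nested closures.  The dispatch on tags happens
    # once, during compilation; re-running the closure repeats no dispatch.
    def compile_to_closure(e):
        tag = e[0]
        if tag == 'lit':
            v = e[1]
            return lambda: v
        if tag == 'add':
            left, right = compile_to_closure(e[1]), compile_to_closure(e[2])
            return lambda: left() + right()
        if tag == 'mul':
            left, right = compile_to_closure(e[1]), compile_to_closure(e[2])
            return lambda: left() * right()

    expr = ('add', ('lit', 2), ('mul', ('lit', 3), ('lit', 4)))
    thunk = compile_to_closure(expr)     # dispatch overhead paid once, here
    print(thunk())                       # 14, with only the essential work left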

4 comments:

Unknown said...

Nice Post

I'm studying operating systems and I wonder: do actual processors have embedded interpreters? Is that correct?

Thanks

Joe Marshall said...

Yes. Many of the earlier processors in the late 70s and early 80s would have hardware that directly executed the instruction stream, but these days the processor translates the x86 instruction stream into something that more closely matches the internal hardware.