Friday, April 8, 2011

Too much exercise

I've spent the past few days thinking about how to answer the exercises in my previous post. I think I finally came up with a good analogy. (Bear with me.)

These days, it is not uncommon to run programs on virtual hardware. Programs like VMware or VirtualBox provide nearly undetectable emulation of an x86 PC. In fact, modern x86 CPUs actually emulate the x86 instruction set through hardware-accelerated interpretation or JIT compiling. Although it can be tricky to emulate the entire range of computer hardware and peripherals that might go into a PC, it is straightforward to emulate the bare-bones PC hardware with close to perfect fidelity, even down to the CPUID.

So could one write a portable x86 program — one that can run without modification on a real PC with real hardware or on a real PC with virtual hardware (like a Transmeta CPU) or on completely virtual hardware — that can reliably and predictably detect whether it is being “run directly” or “just being emulated”? The entire raison d'être of virtualization rests on the fact that, given enough effort, you can fool all of the programs all of the time. (Bugs notwithstanding.)

So let's reconsider Exercise 1:  Write a program that returns 1 if run on any tail-recursive implementation of a language, but returns 0 if run on any non tail-recursive implementation of that same language.

Restate this as: Write a portable x86 program — one that can run without modification on a real PC with real hardware or on a real PC with virtual hardware (like a Transmeta CPU) or on completely virtual hardware — that can NOT ONLY reliably and predictably detect whether it is being “run directly” or “just being emulated”, BUT FURTHERMORE can reliably and predictably detect whether the underlying execution mechanism is “recursive” or “iterative”?

I assert that this question is barely worth thinking about (unless you work at VMWare) and that the answer could hardly be anything but “No.”

Exercise 1a (extra credit):  Write a program that crashes or doesn't return if run on any tail-recursive implementation of a language, but returns 0 if run on any non tail-recursive implementation of that same language.

If a program cannot detect whether or not it is being emulated, perhaps it can “half” detect. If it could reliably crash the emulator, then you could at least demonstrate that some sort of emulation is being done, even if you couldn't perform a conditional branch on that fact. Suppose the program were being emulated, and suppose further that the emulator is recursive (for whatever reason). One could, in theory, crash it with a stack overflow, or with an out-of-memory error if the recursive emulator happens to heap-allocate its stack frames. Alas, this isn't what we asked for. We are looking for a program that can reliably crash an iterative emulator. We're out of luck.
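
To make the asymmetry concrete, here is a minimal sketch (the procedure name spin is mine, not part of the exercise): a loop that makes a million calls to itself, every one in tail position. A non tail-recursive implementation pushes a frame per call and will likely blow its stack; a tail-recursive implementation runs it in constant space and returns 0. That does distinguish the two, but in the opposite direction from what Exercise 1a asks for.
;; A million self-calls, each in tail position.
;; A tail-recursive implementation runs this in constant stack space
;; and returns 0; a non tail-recursive implementation pushes a frame
;; per call and will likely overflow its stack instead.
(define (spin n)
  (if (zero? n)
      0
      (spin (- n 1))))

(spin 1000000)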

Exercise 1b, essay (extra extra credit): Discuss the implications of your answers to exercises 1 and 1a for the semantics of the programs (whoa, my apologies, programs don't have semantics, languages do. Pretend I wrote “languages”). In particular, briefly outline what changes to the various semantic models — denotational, operational, and/or axiomatic — take place upon introducing tail recursion.

The implication is this: the semantics of a programming language are logically independent of the implementation of the language. (If they were not, virtual machines would be impossible.)

Exercise 2: What trivial change can Louis make to his code for smallest that will disable tail recursion?
(define (smallest list predicate)
  ;; Returns the smallest element of list.
  (if (null? list)
      #f
      (let ((answer
        (if (predicate (car list) (cadr list))
            (smallest (cons (car list) (cddr list)) predicate)
            (smallest (cdr list) predicate))))
        answer)))
There is no need for a flag to enable or disable tail recursion. Tail recursion only “works” when the caller does nothing with the callee's return value except immediately return it. A LET expression is simply syntactic sugar for applying a lambda expression:
(define (smallest list predicate)
  ;; Returns the smallest element of list.
  (if (null? list)
      #f
      ((lambda (answer) answer)
        (if (predicate (car list) (cadr list))
            (smallest (cons (car list) (cddr list)) predicate)
            (smallest (cdr list) predicate)))))
You can now see that the value of the conditional expression is not immediately returned, but rather passed as an argument to a (rather trivial) lambda expression. (Yes, a smart compiler could notice that the lambda expression is simply the identity function in this case. If that is a problem, simply make sure that the function you use to disable the tail recursion is late-bound so that the compiler cannot prove that it can be elided.)
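
For concreteness, here is one way that late-bound version might look. This is a sketch; the name *tail-call-preventer* is illustrative, not anything standard.
;; Because *tail-call-preventer* is an assignable top-level variable,
;; the compiler cannot prove that the call below is just the identity
;; function, so the recursive calls are compiled as non-tail calls.
(define *tail-call-preventer* #f)
(set! *tail-call-preventer* (lambda (x) x))

(define (smallest list predicate)
  ;; Returns the smallest element of list.
  (if (null? list)
      #f
      (*tail-call-preventer*
        (if (predicate (car list) (cadr list))
            (smallest (cons (car list) (cddr list)) predicate)
            (smallest (cdr list) predicate)))))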

Exercise 3: What trivial change can Louis make to his code for smallest that will enable tail recursion on all but the initial call to smallest?
(define (smallest list predicate)
  ;; Returns the smallest element of list.

  (define (smallest list predicate)
    (if (null? list)
        #f
        (if (predicate (car list) (cadr list))
            (smallest (cons (car list) (cddr list)) predicate)
            (smallest (cdr list) predicate))))
  (let ((answer (smallest list predicate)))
    answer))
Again, we don't need special flags to have fine control over tail recursion.

Exercise 4: Implement disable-tail-recursion as a macro.

;;; For example:
(define-syntax disable-tail-recursion
  (syntax-rules ()
    ((disable-tail-recursion form)
     (let ((answer form)) answer))))
You could be more sophisticated and invoke a late-bound function, but this is, frankly, overkill. You hardly need macros, special forms, flags, decorations, etc. to control tail recursion. This is especially true if all you want is a pretty stack trace for debugging. Tail recursion is as easy to turn on and off as a printf.
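
To show the macro in use, here is a sketch of Louis's procedure with the recursive calls wrapped, assuming the disable-tail-recursion macro above is in scope:
;; The macro expands into a LET, so the value of the inner IF must be
;; bound to ANSWER and then returned; the recursive calls are therefore
;; no longer in tail position and their frames show up in a backtrace.
(define (smallest list predicate)
  ;; Returns the smallest element of list.
  (if (null? list)
      #f
      (disable-tail-recursion
        (if (predicate (car list) (cadr list))
            (smallest (cons (car list) (cddr list)) predicate)
            (smallest (cdr list) predicate)))))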

10 comments:

gwern said...

Perfect detection of substrates is obviously impossible; it's a non-trivial predicate which means it almost surely falls prey to Rice's theorem.

In practice, there are a bunch of techniques thanks to computer security concerns. (Unsurprisingly, researchers prefer to run worms & viruses in a virtual machine; even more unsurprisingly, the authors of said programs would prefer them not be runnable, much like they try to thwart using debuggers.) Some random links: http://weblogs.asp.net/jgalloway/archive/2006/10/27/Can-Operating-Systems-tell-if-they_2700_re-running-in-a-Virtual-Machine_3F00_.aspx & handlers.sans.org/tliston/ThwartingVMDetection_Liston_Skoudis.pdf & http://citeseer.ist.psu.edu/viewdoc/download;jsessionid=5CB300F3BB2543B2B5AC460A838672B5?doi=10.1.1.99.3879&rep=rep1&type=pdf

Aaron said...

What prevents an implementation from optimizing "((lambda (x) x) y)" into "y", thereby undo-ing your attempts to prevent TCO?

Joe Marshall said...

Aaron asked...
What prevents an implementation from optimizing "((lambda (x) x) y)" into "y", thereby undo-ing your attempts to prevent TCO?

Nothing. That is why I mentioned that you might want to late-bind the procedure:

(define (identity-procedure x) x)
(define *tail-call-preventer* #f)
(set! *tail-call-preventer* identity-procedure)

then the code becomes

(*tail-call-preventer* y)

and the compiler cannot optimize this because it cannot prove that the value of *tail-call-preventer* is always bound to the identity procedure.

However, we can make life a tad easier for developers by turning off aggressive optimization when in debug mode (as is usually done).

Pascal Costanza said...

Hi,

Very nice post. This was actually very enlightening. Thanks!

Pascal

kbob said...

You insist on a "portable" program. But portable programs, by definition, are oblivious to the details of their environment's implementation.

Although it doesn't come up as often as things like byte order and word size, one criterion for portability would be that a program behave identically in implementations that optimize tail calls and those that don't.

So you're asking for a portable program that is nonportable. (-:

Keith Adams said...

I worked on VMware's virtual machine monitor (VMM) from 2000 to 2009.

Your assumption that a VMM must be undetectable to be correct is fallacious. It's the equivalent of saying AMD processors must pretend to be Intel processors to be correct. There's some intuitive force to this idea, but on reflection, requiring this would basically destroy the architecture/implementation distinction.

VMMs are invariably detectable, like almost all other aspects of the hardware and software stack on which an application finds itself. While you can imagine elaborate heroics to avoid detection, the arms race is tilted in favor of the would-be detectors. Which, as I argue above, is fine. In fact, at VMware we went to special effort to architect a stable API for detecting and versioning our VMM.


I've written about this in a HotOS submission (http://www.usenix.org/events/hotos07/tech/full_papers/garfinkel/garfinkel.pdf) and a blog post (http://x86vmm.blogspot.com/2007/07/bluepill-detection-in-two-easy-steps.html) if you're curious about further details.

wtanksley said...

I'm surprised about the logic here; it simply doesn't make sense. For example, of course you can crash an iterative implementation of a VM where a recursive implementation wouldn't crash: you just do an infinite tail-call loop. The iterative version will never terminate, the recursive version will run out of stack space and so will perform whatever the implementation does for a stack overflow. This depends only on your language defining semantics for running out of stack space, which it happens that Python (for example) does reliably. (I just had to depend on this for a Turing tarpit language I've written. No, you don't want to know why.)

You're being simply too abstract in your "heresies"; follow this line of thinking and there's absolutely nothing you can do on a computer.

Pascal Costanza said...

I can see one disadvantage of your suggested approach to prevent tail calls: For debugging purposes, I may want to disable tail calls for a number of functions, for example in one or more modules, or in a whole program. A macro for tail call prevention could help in principle, because I can choose between two different versions of the macro. But this would mean that I would have to mark all tail calls manually, "just in case" I want to debug them.

I think a Common Lisp-style declaration that allows me to turn off tail calls for some well-defined scope without marking all tail calls by hand would be better...

Faré said...

@Joe
Trying to "late bind" a function so it won't be optimized away only works if you assume a static compiler. A more elaborate JIT like PyPy can still see through this try.

@Pascal
I'm pretty sure you could define a relatively simple macro to do that in Racket, and a much more elaborate macro to do that in any Lisp dialect well-defined enough to be code-walkable.

Joe Marshall said...

Faré said...
Trying to "late bind" a function so it won't be optimized away only works if you assume a static compiler. A more elaborate JIT like PyPy can still see through this try.

That's ok. I'm not trying to win a war with a compiler that is determined to prevent stack walking. The author of an elaborate JIT ought to keep debugging in mind and offer some mechanism to tune the compiler. What I am suggesting is that if tail recursion is simply ‘turned on’ under more or less ‘normal’ conditions it is fairly trivial to restore enough of the backtrace should the missing frames be vital to debugging.

But even if you were working with the world's most aggressive JIT, a trivial insertion of a call to print (an external side effect) would force the JIT to compile a call as a ‘subproblem’ (non tail recursive) rather than a ‘reduction’ (tail recursive).

Tail recursion is about removing returns to stack frames that are logically unnecessary. `Disabling' tail recursion simply involves adding a logical necessity to return to the caller. Passing the return value to a different callee is virtually guaranteed to work and causes the least amount of disruption, but there are some unusual scenarios involving highly optimizing compilers where it conceivably might not. Performing an external side effect *is* guaranteed to work, but it is rarely necessary to take extraordinary steps just to get a backtrace; simply turn off the extreme optimization.