Tuesday, March 23, 2010

Pop quiz fallout

At the end of the two hours, Student B was still furiously scribbling away. He had accumulated a suprising amount of paper around himself. The professor came over to talk to him.
“What seems to be the problem?”
‘I don't have enough time!’
“It isn't that hard a quiz, you know. Every other student has finished. What exactly are you doing?”
‘This part, right here.’
Student B pointed to this line in the quiz:
(= (- (fib 100) (fib 99)) (fib 98))
‘I decided to evaluate this from right to left because you said that order doesn't matter. I'm still working on the (fib 98) subexpression. I think that (fib 99) and (fib 100) are going to take even longer. I can't see how anyone could get past this. I'm getting discouraged. I don't even think a computer could solve this in two hours.’

This is a key question: What, if anything, is Student B doing wrong?


mquander said...

Where did student B get the idea that he was supposed to evaluate it? His goal is to answer the questions on the quiz, not step through the code. If stepping through the code is the only way he can answer the questions, then he hasn't been taught anything useful in class.

gwern said...

Echo mquander. He should've realized that he's supposed to be smarter than a computer, and that he could just treat 'fib 98' as an unknown constant _x_, and then he can substitute _x_ into 'fib 100' and 'fib 99', and see whether it's an identity.

The numbers are big enough that it is obvious one is supposed to treat it symbolically rather than calculate it out by hand. I wouldn't blame someone for realizing that and then being unable to answer it symbolically; but I do blame them for not realizing that at all.

Buddha Buck said...

On the other hand, he has a point (one I missed when reading the pop quiz). If loaded into an R5RS system, the file wouldn't finish loading in a reasonable amount of time because of that line.

(fib n) is a common example used in classes to show why naive recursion isn't always the best solution, similar to how (factorial n) is used to show how iteration and tail-calls can be used to prevent stack limitations. If he didn't recognize the example, he either (a) wasn't paying attention (in which case he should fail), or (b) he hadn't been shown it yet (in which case he has gained a valuable insight).

I do agree that he didn't need to evaluate it to answer the quiz questions, and he wasn't asked to evaluate it.

John Cowan said...

Just to get some actual data, I stripped the code down to just the "fib" definition and the alleged sanity check. I then compiled this using the Chicken compiler with the numbers egg (which you need to get bignums). Stalin doesn't have bignums, so it was out of the picture; Gambit might have done slightly better.

Anyway, I killed the program after about 8.5 hours of running time on a 64-bit Dell two-core box which was in light use during that time. Of course the program was compiled to use only one core. I don't know how far it got.

Unknown said...

A "Sufficiently Smart Interpreter", which no Scheme implementation I know of is, but which the students in this class are expected to be, at least with respect to this quiz, could reason about that form as follows (I write out the reasoning in gory detail, but there is no particular reason why this needs to take a very long time to think, as opposed to explain):

0) I will say "returns normally" to mean "returns in finite time, without signalling an error or causing a side effect."

1) The symbol = is bound in the standard environment to a procedure I know. The return value of the expression in question is discarded. Therefore, I know that the subexpressions are evaluated according to the usual rules of evaluation, and, if they normally return values which will not cause = to signal an error, this form will have no effect when executed (except, perhaps, to warm the room for a while if it is executed faithfully). Ditto - in the first subform.

2) The symbol fib is (by now) bound to a compound procedure. Therefore 100, 99, and 98 are also evaluated, and yield some exact integers, unless preempted by fib exiting abnormally or failing to terminate. Let me examine the fib procedure.

3) I know the binding of the symbol if, so I can begin to reason about how the fib procedure behaves. If the fib procedure is called with an exact integer, (< n 2) will be evaluated in an environment whose bindings for < and 2 I know, and whose binding for n I know to be an exact integer. The subform (< n 2) will return a boolean normally. If that boolean is #t, fib will return an exact integer normally. If that boolean is #f, the if form will tail-call into (+ (fib (- n 1)) (fib (- n 2))).

4) Since + and fib are known, in this environment, to be procedures, (- n 1) and (- n 2) will be evaluated, with known bindings for -, 1, and 2, and with n an exact integer. Therefore, these forms will return exact integers normally. Therefore, if the recursive calls to fib return exact integers normally when given exact integers, (+ (fib (- n 1)) (fib (- n 2))) will return an exact integer normally, and the non-recursive call to fib will return an exact integer normally.

5) For exact negative integers n, (< n 2) will return #t and (fib n) will return an exact integer normally. For exact non-negative integers n, I can prove by induction that (fib n) will return an exact integer normally.

6) Therefore the subforms (fib 100), (fib 99), and (fib 98) will all return exact integers in finite time, and have no side effects, and in particular no side effects that have any effect on any of the other definitions present in this quiz. Therefore the form (= (- (fib 100) (fib 99)) (fib 98)) will return a boolean in finite time, without causing any errors or side effects. I now know everything I need to know about it with respect to this quiz, so I need not evaluate it or think about it further. I proceed with the quiz.

6') (And if I did want that expression's value, I could prove by induction that (fib n) is a function in the sense that it always returns the same answer when given the same exact integer argument, which would permit me to memoize it, and so to compute the values of (fib 100) (fib 99) and (fib 98) in O(100) arithmetic operations; or do a bit of algebraic reasoning on the relationship between sums and differences of exact integers, and conclude that the sanity check will return #t).

Student B's error was an excess of literal-mindedness. He was expected to ignore the evaluation path of the sanity check form, at most carrying out some reasoning such as the above to convince himself that it could safely be ignored.

Indeed, if the definition of the fib procedure were not available, each student would be forced simply to assume that evaluating that sanity check form would cause no trouble, since, in principle, the fib procedure could side-effect the binding of, say, zero?, and change the answers to the quiz. Student A would also have had a field day in that case.

The Donkey said...

I think student B's failure is in his unwillingness to understand why something is true. Basically, there are two points I want to make here:

a) If I understand why an assertion made by my professor is true, I usually do not resort to testing it. I only do that if I think there's a problem (e.g. a formula he gives me returns infinity if I plug in some perfectly realistic numbers), or if I do not immediately understand the reasoning behind it in the abstract and hope to get a clearer picture if I try to see how it works on actual data. In this particular case, I don't think it's very difficult to understand why the order of evaluation does not affect the final result.

b) On the other hand, being a first year course, I think it's reasonable to assume some people don't quite know even the most basic details of how the expression evaluation works. This is pardonable at this level. What is not pardonable is the fact that instead of trying it on a simple expression (e.g. on (= (- (fib 5) (fib 4) (fib 3)), which takes considerably fewer ice ages to scribble on paper), the guy jumped straight into doing the huge-ish example.

I don't think a) would show anything relevant in a first-year course when people don't have the big picture clearly in their minds. I think b) is more relevant because it shows a slight unwillingness to think abstract things thoroughly. The approach he is taking is obviously correct in terms of procedure (not as elegant as a proof by induction, but hey, it works, and if all you have in your math toolbox is a rock, you might as well use it as a hammer). On the other hand, it's obviously misguided because it cannot take too much ability of abstract reasoning to realize that you can perform it on smaller numbers without loss of generality.

Closer to a future programmer's set of issues, I think this also shows a problem with time management. I think the guy must have realized it is going to take ages to finish this before even starting, and yet he still jumped at it, obviously without considering any alternatives (30 seconds of weighing alternatives would have led him to trying on smaller numbers, really).

I think there are three things student B should learn:

1. Better handling of the mathematical language. For example, there's really no excuse for not using ellipsis points (i.e. "..." -- is that how they're called? English is not my native language) instead of scribbling the full expansion of an expression when it's obvious what it expands into after its first few terms.

2. Generality. He should be able to quickly assess the generality of a formula or algorithm by looking at its mathematical expression. The meaning of a program is symbolic in nature and cannot be adequately expressed through its results. The expansion of Fibonacci series is a perfect example of this: its meaning is in the formula, if you just look at its expansion, you don't "get it". If you want to see the logic behind them, you eventually have to get to its expression.

3. Testing your assumptions on limited models. If you want to test the expansion of series, you do it on something that takes less time, really.

Of #1, #2 and #3, I think #2 is essentially what student B is doing wrong. #1 and #3 are probably just the result of lack of practice.