Tuesday, March 8, 2011

More about that puzzle

In the previous post I described a puzzle. I based the puzzle on a real-world problem: I'm invoking a service with an RPC and the defined interface requires that I poll for the answer. (I dislike polling, but I'm stuck with it.) I have been told vaguely that the answer usually arrives ‘within a few seconds,’ but that ‘sometimes it takes longer.’ (This is what passes for documentation. I can't blame the implementors too much, actually; they themselves are interfacing with a sucky third-party API. On the other hand, had they provided a better abstraction, then I wouldn't need to.)

So given this API, I decided to model it to try to understand it better. Casting the problem as a betting challenge helps because I can reduce the abstract cost and benefit to a simple number that I want to maximize. We can compare different strategies and different assumptions by comparing the “payoff”.
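To make “comparing the payoff” concrete, here is a small Monte Carlo sketch. Everything numeric in it is an assumption for illustration: a made-up poll cost of 1 unit, a made-up reward of 10 units for retrieving the answer within 10 seconds, and an invented response-time model (usually a couple of seconds, occasionally much longer).

```python
import random

# Hypothetical economics (assumptions, not the real puzzle's numbers):
# each poll costs 1 unit; an answer retrieved within DEADLINE seconds
# is worth PAYOFF units.
POLL_COST = 1.0
PAYOFF = 10.0
DEADLINE = 10.0

def response_time():
    """Assumed response-time model: usually a few seconds, rarely ~115 s."""
    if random.random() < 0.97:
        return random.lognormvariate(0.7, 0.5)
    return 115.0

def payoff(poll_times, trials=100_000):
    """Average payoff of polling at the given schedule of times."""
    total = 0.0
    for _ in range(trials):
        t = response_time()
        for i, p in enumerate(poll_times):
            if p >= t:                      # the answer is ready at this poll
                total -= POLL_COST * (i + 1)
                if p <= DEADLINE:           # retrieved in time: collect reward
                    total += PAYOFF
                break
        else:
            total -= POLL_COST * len(poll_times)  # polled everywhere, never got it
    return total / trials

print(payoff([2.0]))          # poll once at 2 s
print(payoff([2.0, 115.0]))   # poll at 2 s, then again at 115 s
```

Different strategies are just different `poll_times` schedules, so the empty schedule is the “do nothing” strategy and dense schedules approximate rapid polling.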

It makes sense to look at the trivial and degenerate strategies first, as they are the simplest. The easiest strategy is to do nothing at all. That seems to be a non-starter here, but it sets a useful baseline: it rules out any strategy that loses money in the long run. (You'd be surprised at how often people think that doing something simply must be better than doing nothing. As Ira Gershwin penned, “It ain't necessarily so.”)

The next simplest strategy is to wait until you are sure the answer is available and poll exactly once. So how long is that? I asked the implementors of the API if there were any point at which we absolutely knew that the answer was available. They said there was no a priori time limit on getting an answer, but that they've never seen one take more than a few days. That's discouraging because it seems to rule out any sort of guaranteed “real-time” interaction. But since ‘most of the time’ it takes only ‘a few seconds’, it might be worthwhile to partition the problem into fast and slow responses and optimize the fast case.

The third simplest strategy is to poll as rapidly as possible until the answer is ready. This will minimize the amount of time we spend in anticipation of a result, but the cost associated with polling may be too expensive (and remember, it costs nothing to do nothing).

Finally, there may be an optimal way to schedule the polling. Suppose we knew for certain that 97% of the time the answer comes back in exactly 2 seconds and that the other 3% of the time it takes exactly 115 seconds. Obviously we'd poll exactly twice: once just after 2 seconds, and once again just after 115 seconds. Of course this is unlikely to be the case, but a bimodal distribution of response times is not unusual in and of itself.
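The arithmetic for that two-poll schedule is straightforward. The cost and value numbers below are invented for illustration (the 97%/3% split and the 2 s/115 s times are from the example above):

```python
# Assumed (made-up) economics: each poll costs 1 unit, and getting the
# answer is worth 10 units regardless of when it arrives.
POLL_COST, VALUE = 1.0, 10.0

# The bimodal example: 97% of answers at exactly 2 s, 3% at exactly 115 s.
P_FAST, P_SLOW = 0.97, 0.03

# Polling just after 2 s and just after 115 s: the fast case costs one
# poll, the slow case costs two, and every case collects the value.
expected_polls = P_FAST * 1 + P_SLOW * 2   # 1.03 polls on average
expected_payoff = VALUE - POLL_COST * expected_polls
print(expected_payoff)                     # roughly 8.97 under these assumptions
```

Any schedule that polls earlier than 2 s, or between the two modes, only adds poll cost without ever finding an answer, which is why the two-poll schedule is optimal for this distribution.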

We can now characterize something about the solution: if an optimal strategy exists, it depends on several factors:
  • The cost of polling.
  • The benefit of getting a ‘timely’ answer.
  • The distribution of response times.
At this point I suspect that the distribution can be modeled as the sum of a handful of time-shifted log-normal distributions, but that is just a gut feeling.
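Reading that “sum” as a weighted mixture of time-shifted log-normal densities, here is a toy sampler; the weights, shifts, and (mu, sigma) parameters are invented for illustration, not measured from the real service:

```python
import random

# A toy mixture of time-shifted log-normal response times. All parameters
# here are made up: a common fast path and an occasional slow path.
COMPONENTS = [
    # (weight, shift in seconds, mu, sigma)
    (0.90,  1.0, 0.0, 0.4),   # the common fast path
    (0.10, 60.0, 1.0, 0.8),   # the occasional slow path
]

def sample_response_time():
    """Draw one response time from the mixture."""
    r = random.random()
    for weight, shift, mu, sigma in COMPONENTS:
        if r < weight:
            return shift + random.lognormvariate(mu, sigma)
        r -= weight
    # Only reachable through floating-point round-off; use the last component.
    weight, shift, mu, sigma = COMPONENTS[-1]
    return shift + random.lognormvariate(mu, sigma)

samples = sorted(sample_response_time() for _ in range(10_000))
print("median:", samples[len(samples) // 2])
print("99th percentile:", samples[int(len(samples) * 0.99)])
```

With enough observed response times, one could fit the component weights and parameters and then search for the polling schedule that maximizes the expected payoff against the fitted distribution.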