So given this API, I decided to model it to try to understand it better. Casting the problem as a betting challenge helps because I can reduce the abstract cost and benefit to a simple number that I want to maximize. We can compare different strategies and different assumptions by comparing the “payoff”.
It makes sense to look at the trivial and degenerate strategies first as they are the simplest. The easiest strategy is to not do anything. That seems to be a non-starter here, but it rules out any strategy that loses money in the long run. (You'd be surprised at how often people think that doing something simply must be better than doing nothing. As Ira Gershwin penned, “It ain't necessarily so.”)
The next simplest strategy is to wait until you are sure the answer is available and poll exactly once. So how long is that? I asked the implementors of the API if there were any point at which we absolutely knew that the answer was available. They said there was no a priori time limit on getting an answer, but that they've never seen one take more than a few days. That's discouraging because it seems to rule out any sort of guaranteed “real-time” interaction. But since ‘most of the time’ it takes only ‘a few seconds’, it might be worthwhile to partition the problem into fast and slow responses and optimize the fast case.
The third simplest strategy is poll as rapidly as possible until the answer is ready. This will minimize the amount of time we spend in anticipation of a result, but the cost associated with polling may be too expensive (and remember, it costs nothing to do nothing).
Finally, there may be an optimal way to schedule the polling. Consider the case if we knew for certain that 97% of the time the answer comes back in exactly 2 seconds and that the other 3% of the time it took exactly 115 seconds. Obviously we'd poll exactly twice: once at just after two seconds, once again just after 115. Of course this is unlikely to be the case, but a bi-modal distribution of response times is not unusual in and of itself.
We can now characterize something about the solution: if an optimal strategy exists, it depends on several factors:
- The cost of polling.
- The benefit of getting a ‘timely’ answer.
- The distribution of response times.
 
 
2 comments:
If You know the distribution, getting the optimal payout is a simple dynamic programming problem.
the state would be (curent time, time of last poll).
The actions could be, for example:
- option 1:
-- wait N seconds and poll,
-- give up,
- option 2:
-- wait 1 second,
-- poll,
-- give up.
Note that, if you don't know the distribution, this can be modeled as a Reinforcement Learning problem (for which an ideal solution would optimally trade off between exploring to acquire more knowledge and exploiting to make more money).
Let me add that this interesting puzzle is very similar to the question:
"When should I open my oven to check whether the cake is cooked or not?"
(with the added difficulty that opening the oven may alter the cooking process)
Post a Comment