Wednesday, March 9, 2011

Pretty charts

I have collected some empirical data concerning the polling problem. If we sort the successful polls by time and plot the ordinal fraction against time we get this:
If we plot on a logarithmic scale, it looks like this:
This looks suspiciously familiar. My gut feeling was that there was a log-normal distribution somewhere around here. If we try fitting a cumulative log-normal distribution to the data, we get this:

(μ = 4.3, σ = 1.6)
That is a very good fit for the range of interest.

Last night I had an ‘Aha!’ moment. We can look at the distribution as the probability that a result will be ready when we poll (that's not the ‘Aha!’). But we can pretend that we made all the RPCs at exactly the same time. If we had, then this chart shows the fraction of RPCs that finish as a function of time (not a big ‘Aha!’, but I'll take it). So if we were to poll at 2 seconds, we'd find about 20% of the RPCs have completed, but if we wait until about 4.5 seconds we'd find 90% have completed.

What if we polled at 2 seconds and 4.5 seconds? Well, the 20% that completed before the poll at 2 seconds will be serviced right after that poll, but any that become ready between 2 and 4.5 seconds will have to wait until the later poll. That will be 90 - 20 = 70%. That is to say, the efficacy of the poll — the marginal increase in likelihood of getting a response — is proportional to the vertical difference in the graph divided by the time since the last poll. Or, to put it yet another way, the polling frequency should be proportional to the derivative of the distribution. Aha!

An empirical measurement of the derivative would have a lot of noise. Fortunately, we noticed that the cumulative distribution is closely modeled by a cumulative log-normal curve. Cumulative means integral. The derivative is therefore simply a log-normal curve centered at the median time (about 3 seconds). (Is this an ‘Aha!’, or a ‘D'oh!’ moment?)

I think I have enough of an understanding of this problem to write some code.