A one or two parameter model is what I'm looking for. There are a lot of one and two parameter models that look like my data. The main characteristic of most of these distributions is that they asymptotically approach zero over a long time.
Gamma:
.png)
Log-normal:
.png)
Pareto:
.png)
Weibull:
.png)
Eyeballing these is hard, too. So we need a couple of more tricks. My favorite trick of log scale doesn't work too well. It does show the tail of the curve nicely, but it also magnifies the variation. Since the data are sparse out at the tail, this makes the variation that much bigger.
.png)
Instead, I'm going to integrate over the distribution and normalize the values so the curve goes from 0 to 1.

Now this graph is obviously bad. The curve is squeezed in near the edges so you can't see it, and the asymptotes are so close you couldn't tell if something fit or not. We'll fix this in a sec. But take a look here and you'll see a number of graphs that display the data is this awful way.
Now I'm going to use my log scale trick.
.png)
This is quite a bit better. Now we can see important things like the median (at the .5 mark) and the 90th percentile (at the .9 mark). There is another benefit we got. The data in this graph is the unsmoothed data. If we zoom in on a small part of the graph, we can see that the ‘hair’ has turned into tiny stairsteps.
.png)
Although the hair was tall, it was very narrow, so it doesn't contribute much to the integral. (So my attempt to shave the hair off the data was pretty much a waste of time. Oh well.)
But the point of doing this was to make it easier to fit the models. The models usually have an integral form (cumulative distribution) so we can try them out. But since we're using the integral form, the errors add up and we can more easily see which distributions have a better fit.
Cumulative Cauchy:
.png)
Cumulative log Pareto:
.png)
Cumulative lognormal:
.png)
Cumulative log Poisson:
.png)
There's one more thing to do....
No comments:
Post a Comment