With inverse erf stretching, a cumulative normal distribution becomes a straight line.
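Here is a minimal sketch of what that stretching does, using only the Python standard library. The function names (`normal_cdf`, `inverse_erf`, `erf_stretch`) and the example parameters are my own illustration, not the code used to make the plots. For a lognormal check you would presumably feed in ln x rather than x, which is an assumption about how the plots here were drawn.

```python
import math
from statistics import NormalDist

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative normal distribution written in terms of erf."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def inverse_erf(y):
    """Inverse of math.erf for -1 < y < 1, built on the stdlib normal quantile."""
    return NormalDist().inv_cdf((y + 1.0) / 2.0) / math.sqrt(2.0)

def erf_stretch(p):
    """Map a cumulative probability 0 < p < 1 onto the stretched axis."""
    return inverse_erf(2.0 * p - 1.0)

# Arbitrary example parameters (hypothetical, not fitted to any data).
mu, sigma = 3.0, 2.0
xs = [mu + sigma * k / 2.0 for k in range(-4, 5)]
stretched = [erf_stretch(normal_cdf(x, mu, sigma)) for x in xs]

# Slope between consecutive points; a straight line has a constant slope.
slopes = [
    (stretched[i + 1] - stretched[i]) / (xs[i + 1] - xs[i])
    for i in range(len(xs) - 1)
]
print(slopes)
```

Every slope comes out the same, 1 / (sigma * sqrt(2)), which is just another way of saying the stretched curve is a straight line.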
It's pretty clear that my data set does not follow a lognormal distribution. Although the lognormal fit looked pretty good on the original histogram, the stretched cumulative plot is simply not a straight line. However, the low end of the curve looks pretty straight.
This next plot is a Pareto distribution plotted with inverse erf stretching:
The Pareto distribution wasn't too bad a fit, but still, we can see it isn't a good match when we look at it here.
Another stretching function to consider is the inverse logistic function.
Now this is interesting. A good chunk of the curve is now a straight line. It seems that a log-logistic distribution is a really good model for the data (at least for the long tail). Let's see how it looks on the original histogram.
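To show the idea without my real data, here is a rough sketch that applies the inverse logistic (logit) stretch to an empirical cumulative distribution and fits a line to the result. The samples are synthetic, drawn from a log-logistic distribution with made-up parameters, and the helper names (`sample_log_logistic`, `stretched_points`, `fit_line`) are hypothetical. It assumes the usual parameterization of the log-logistic distribution and a logarithmic x-axis.

```python
import math
import random

def logit(p):
    """Inverse of the logistic function: maps 0 < p < 1 onto the real line."""
    return math.log(p / (1.0 - p))

def sample_log_logistic(rng, alpha, beta):
    """Draw one log-logistic sample by inverting the cumulative distribution."""
    u = rng.random()
    while u == 0.0:  # avoid a zero sample, which would break log() later
        u = rng.random()
    return alpha * (u / (1.0 - u)) ** (1.0 / beta)

def stretched_points(samples):
    """Return (ln x, logit(empirical CDF)) pairs for positive samples."""
    ordered = sorted(samples)
    n = len(ordered)
    # (i + 0.5) / n keeps the empirical CDF strictly between 0 and 1.
    return [(math.log(x), logit((i + 0.5) / n)) for i, x in enumerate(ordered)]

def fit_line(points):
    """Ordinary least-squares slope and intercept through (u, v) points."""
    n = len(points)
    mean_u = sum(u for u, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    suv = sum((u - mean_u) * (v - mean_v) for u, v in points)
    suu = sum((u - mean_u) ** 2 for u, _ in points)
    slope = suv / suu
    return slope, mean_v - slope * mean_u

# Synthetic stand-in for the data set (parameters are invented).
rng = random.Random(0)
alpha, beta = 10.0, 2.5
samples = [sample_log_logistic(rng, alpha, beta) for _ in range(20000)]

slope, intercept = fit_line(stretched_points(samples))
alpha_hat = math.exp(-intercept / slope)  # where the line crosses logit = 0
print(slope, alpha_hat)
```

With real data only part of the curve may be straight, as it is here, so in practice you would restrict the fit to that part rather than use every point.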
In this graph I plotted the lognormal distribution that fit the low end of the stretched plot and the log-logistic distribution that fit the high end.
So what is a log-logistic distribution? It seems to be common in sociology and biology. The survival curve after a kidney transplant follows a log-logistic distribution. It also seems to show up in insurance risk models. It is used in hydrology to model precipitation rates. This is weird. I can't see why this distribution would arise in what I'm measuring.
The log-logistic distribution has two tuning parameters, α and β. α is the median of the distribution and β determines the amount of spread (a larger β gives a narrower distribution). In the stretched cumulative plot, α determines where the line crosses the 50th percentile and β determines the slope of the line.
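For the curious, here is why that works, assuming the standard form of the log-logistic cumulative distribution and a logarithmic x-axis:

$$
F(x) = \frac{1}{1 + (x/\alpha)^{-\beta}}, \qquad x > 0
$$

$$
\frac{F(x)}{1 - F(x)} = \left(\frac{x}{\alpha}\right)^{\beta}
\quad\Longrightarrow\quad
\operatorname{logit} F(x) = \ln\frac{F(x)}{1 - F(x)} = \beta\,(\ln x - \ln \alpha)
$$

So against ln x the stretched curve is a line with slope β, and it passes through logit = 0 (the 50th percentile, F = 1/2) exactly at x = α.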