With inverse erf stretching, a cumulative normal distribution becomes a straight line.

It's pretty clear that my data set does not follow a lognormal distribution. Although the fit looked pretty good on the original histogram, it is simply not a straight line. However, the low end of the curve looks pretty straight.

This next plot is a Pareto distribution plotted with inverse erf stretching:

The Pareto distribution wasn't too bad a fit, but still, we can see it isn't a good match when we look at it here.

Another stretching function to consider is the inverse logistic function.

Now

*this*is interesting. A good chunk of the curve is now a straight line. It seems that a log-logistic distribution is a really good model for the data (at least for the long tail). Let's see how it looks on the original histogram.

In this graph I plotted the lognormal distribution that fit the low-end of the plot and the log-logistic distribution that fit the high-end.

So what is a log-logistic distribution? It seems to be common in sociology and biology. The survival curve after a kidney transplant follows a log-logistic distribution. It also seems to show up in insurance risk models. It is used in hydrology to model precipitation rates. This is weird. I can't see why this distribution would arise in what I'm measuring.

The log-logistic distribution has two tuning parameters, α and β. α is the median of the curve and β determines the amount of spread. In the stretched cumulative plot, α determines where the line intersects the 50th percentile and β determines the slope of the line.

## No comments:

Post a Comment