Im attempting fitting a powerlaw distribution to a data set, using the method outlined by Aaron Clauset, Cosma Rohilla Shalizi and M.E.J. Newman in their paper "Power-Law Distributions in Empirical Data".
I've found code to compare to my own, but im a bit mystified where some of it comes from, the story thus far is,
to identify a suitable xmin for the powerlaw fit, we take each possible xmin fit a powerlaw to that data and then compute the corresponding exponet (a) then the KS statistic (D) for the fit and the observed data, then find the xmin that corresponds to the minimum of D. The KS statistic if computed as follows,
cx <- c(0:(n-1))/n # n is the sample size for the data >= xmin
cf <- 1-(xmin/z)^a # the cdf for a powerlaw z = x[x>=xmin]
D <- max(abs(cf-cx))
what i dont get is where cx comes for, surely we should be comparing the distance between the empirical distributions and the calculated distribution. something along the lines of:
cx = ecdf(sort(z))
cf <- 1-(xmin/z)^a
D <- max(abs(cf-cx(z)))
I think im just missing something very basic but please do correct me!