Goodness of fit test for power law distribution in R

Question

I have a network whose degree distribution I fit to a power law using igraph:

plf = power.law.fit(degree_dist, implementation = "plfit")

The plf variable now holds the following values:

$continuous
[1] TRUE
$alpha
[1] 1.63975
$xmin
[1] 0.03
$logLik
[1] 4.037563
$KS.stat
[1] 0.1721117
$KS.p
[1] 0.9984284

The igraph manual explains these variables:

xmin = the lower bound for fitting the power-law
alpha = the exponent of the fitted power-law distribution
logLik = the log-likelihood of the fitted parameters
KS.stat = the test statistic of a Kolmogorov-Smirnov test that compares the fitted distribution with the input vector. Smaller scores denote better fit
KS.p = the p-value of the Kolmogorov-Smirnov test. Small p-values (less than 0.05) indicate that the test rejected the hypothesis that the original data could have been drawn from the fitted power-law distribution

I would like to do a "goodness of fit" test on this power law fit. But I am not sure how to do this, and although I found this question already asked in online forums, it usually remains unanswered.

I think one way to do this would be to do a chisq.test(x,y). One input parameter (say x) would be the degree_dist variable (the observed degree distribution of the network). The other input parameter (say y) would be the fitted power law equation, which is supposed to be of form P(k) = mk^a.

I am not sure whether this is a sound approach, and if so, I need advice on how to construct the fitted power law equation.

In case it helps, the degree_dist of my network was:

0.00 0.73 0.11 0.05 0.02 0.02 0.03 0.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.01

(These are the frequencies with which degrees 0-21 occurred in the network. For example, 73% of nodes had degree 1 and 1% of nodes had degree 21.)
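For reference, a frequency table like this can be expanded back into one degree value per node, which is the kind of data vector that power.law.fit takes as input. This is only a minimal base-R sketch; n_nodes = 100 is an assumption taken from the node count mentioned in the edit below, and the names counts and degrees are made up for illustration.

```r
# Proportion of nodes with degree 0, 1, ..., 21, as listed above.
degree_dist <- c(0.00, 0.73, 0.11, 0.05, 0.02, 0.02, 0.03, 0.02,
                 rep(0.00, 9), 0.01, 0.00, 0.00, 0.00, 0.01)
names(degree_dist) <- 0:21

n_nodes <- 100  # assumption: network size, taken from the edit below
counts  <- round(degree_dist * n_nodes)

# One degree value per node: a sample of degrees, not a frequency table.
degrees <- rep(0:21, times = counts)

stopifnot(length(degrees) == n_nodes)
print(table(degrees))
```

Fitting the proportions themselves treats values such as 0.73 and 0.11 as if they were observations, which would explain why the first fit was reported as continuous with xmin = 0.03.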

********* EDIT *************

I am unsure whether it was a mistake above to use degree_dist to calculate plf. In case it was, I also ran the same function using the degrees of the 100 nodes in my network:

plf = power.law.fit(pure_deg, implementation = "plfit")

where pure_deg is:

  21  7  5  6 17  3  6  6  2  5  4  3  7  4  3  2  2  2  2  3  2  3  2  2  2  2  2  1  1  1  1  1  1 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 1

This call gives the following output:

$continuous
[1] FALSE
$alpha
[1] 2.362445
$xmin
[1] 1
$logLik
[1] -114.6303
$KS.stat
[1] 0.02293443
$KS.p
[1] 1
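On constructing the fitted equation: for a discrete power law the form P(k) = mk^a corresponds to a = -alpha and m = 1 / zeta(alpha, xmin), where zeta is the Hurwitz zeta function. The following is a rough base-R sketch under that assumption, using the alpha and xmin reported above. The helper hurwitz_zeta and the binning into 1, 2, 3 and 4+ are illustrative choices, not part of igraph. Note also that the goodness-of-fit form of the chi-squared test is chisq.test(x, p = ...), whereas chisq.test(x, y) with two vectors tests independence in a contingency table.

```r
# Degrees of the 100 nodes, as listed above (73 of them have degree 1).
pure_deg <- c(21, 7, 5, 6, 17, 3, 6, 6, 2, 5, 4, 3, 7, 4, 3,
              2, 2, 2, 2, 3, 2, 3, 2, 2, 2, 2, 2, rep(1, 73))

alpha <- 2.362445  # exponent reported by power.law.fit above
xmin  <- 1         # lower bound reported by power.law.fit above

# Hypothetical helper: sum_{k >= xmin} k^(-alpha), approximated by a finite sum
# plus an integral estimate of the remaining tail (base R has no zeta function).
hurwitz_zeta <- function(alpha, xmin, n_terms = 1e5) {
  k <- xmin:(xmin + n_terms - 1)
  sum(k^(-alpha)) + (xmin + n_terms - 0.5)^(1 - alpha) / (alpha - 1)
}

# Fitted probability mass function: P(k) = k^(-alpha) / zeta(alpha, xmin).
norm_const <- hurwitz_zeta(alpha, xmin)
fitted_pmf <- function(k) k^(-alpha) / norm_const

# Pool the sparse tail so that every bin has a reasonable expected count.
obs_counts <- c(sum(pure_deg == 1), sum(pure_deg == 2),
                sum(pure_deg == 3), sum(pure_deg >= 4))
bin_probs  <- c(fitted_pmf(1:3), 1 - sum(fitted_pmf(1:3)))
names(obs_counts) <- names(bin_probs) <- c("1", "2", "3", "4+")

print(round(rbind(observed = obs_counts / sum(obs_counts), fitted = bin_probs), 3))

# Goodness-of-fit form of the chi-squared test: counts against model probabilities.
gof <- chisq.test(obs_counts, p = bin_probs)
print(gof)
```

chisq.test does not know that alpha was estimated from the same data, so the p-value it prints should be read as a rough guide only.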

Kolmogorov-Smirnov is a goodness of fit test. The power.law.fit function estimates the parameters of the power law that best fits the specified distribution (in terms of KS test). You can use the KS.p value to know if the estimated distribution is significantly different from the specified one. So, I don't understand why you want to do another goodness of fit test on top of that? — Vincent Labatut, Feb 05 '14 at 05:24

score 4 · Answer 1 · edited Aug 08 '17 at 12:47

There is an R package named poweRlaw by Colin Gillespie. It is well documented and contains many examples showing how to use each function. Very straightforward.

http://cran.r-project.org/web/packages/poweRlaw/

For example, following the documentation, the code below reads data from the file full_path_of_file_name, estimates xmin and alpha, and computes the p-value proposed by Clauset et al. (2009):

library("poweRlaw")

words = read.table("<full_path_of_file_name>")  # replace with the path to your data file
m_plwords = displ$new(words$V1)         # discrete power law fitting
est_plwords = estimate_xmin(m_plwords)  # get xmin and alpha
m_plwords$setXmin(est_plwords)          # store the estimates in the model object

# here we have the goodness-of-fit test p-value
# as proposed by Clauset et al. (2009)
bs_p = bootstrap_p(m_plwords)
bs_p$p
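To give an idea of what such a bootstrap test does, here is a simplified base-R sketch applied to the degrees from the question. It is not poweRlaw's implementation: xmin is held fixed at 1 instead of being re-estimated for every synthetic data set, the support is truncated at k_max = 1e4 (an assumption made so that sampling and normalisation stay simple), and the helper names pl_probs, fit_alpha and ks_dist are made up for illustration.

```r
set.seed(1)  # make the sketch reproducible

# Degrees from the question (73 of the 100 nodes have degree 1).
pure_deg <- c(21, 7, 5, 6, 17, 3, 6, 6, 2, 5, 4, 3, 7, 4, 3,
              2, 2, 2, 2, 3, 2, 3, 2, 2, 2, 2, 2, rep(1, 73))

k_max   <- 1e4      # assumption: truncate the support for sampling and normalising
support <- 1:k_max  # xmin is held fixed at 1 throughout

# Discrete power-law probabilities on the truncated support.
pl_probs <- function(alpha) {
  w <- support^(-alpha)
  w / sum(w)
}

# Maximum-likelihood exponent for fixed xmin = 1 (numerical 1-D optimisation).
fit_alpha <- function(x) {
  negloglik <- function(alpha) {
    alpha * sum(log(x)) + length(x) * log(sum(support^(-alpha)))
  }
  optimize(negloglik, interval = c(1.01, 6))$minimum
}

# KS distance between the empirical CDF and the fitted CDF.
ks_dist <- function(x, alpha) {
  ks   <- 1:max(x)
  emp  <- ecdf(x)(ks)
  theo <- cumsum(pl_probs(alpha))[ks]
  max(abs(emp - theo))
}

alpha_hat <- fit_alpha(pure_deg)
ks_obs    <- ks_dist(pure_deg, alpha_hat)

# Bootstrap: simulate from the fitted model, refit, and record the KS distance.
n_sims <- 200
ks_sim <- replicate(n_sims, {
  x_sim <- sample(support, length(pure_deg), replace = TRUE,
                  prob = pl_probs(alpha_hat))
  ks_dist(x_sim, fit_alpha(x_sim))
})

# p-value: share of synthetic data sets that fit at least as badly as the data.
p_value <- mean(ks_sim >= ks_obs)
print(c(alpha = alpha_hat, KS = ks_obs, p = p_value))
```

A large p-value here means the power law cannot be ruled out for these data; a small one means the observed KS distance is larger than what the fitted model typically produces.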
