
My goal is to find the point at which scale-free networks become indistinguishable from random (non-scale-free) networks, using the powerlaw Python package.

As stated in their paper, the goodness of a power-law fit should always be judged by comparing it to the fit of an alternative distribution.

I would have expected something like the binomial distribution to be available for such a comparison, but it is not.

For example, I tried the following code to distinguish between an obviously scale-free network and an obviously non-scale-free network (both with similar numbers of nodes and edges):

import networkx as nx
import powerlaw

non_sf_graph = nx.gnp_random_graph(10000, 0.002)  # Erdos-Renyi G(n, p): binomially distributed degrees
sf_graph = nx.barabasi_albert_graph(10000, 10)    # Barabasi-Albert: power-law distributed degrees

# dict(...) around degree() works with both the networkx 1.x dict and the 2.x DegreeView
fitpl = powerlaw.Fit(list(dict(sf_graph.degree()).values()))
fitnpl = powerlaw.Fit(list(dict(non_sf_graph.degree()).values()))

for dist in fitpl.supported_distributions.keys():
    if dist == 'power_law':
        continue  # comparing the power law with itself is uninformative
    print(dist)
    print(fitpl.distribution_compare('power_law', dist))
    print(fitnpl.distribution_compare('power_law', dist))

The output suggests that none of the implemented distributions provides a way to distinguish between a preferential attachment model and a G(n, p) random graph:

lognormal
(-0.23698971255249646, 0.089194415705275421)
(-20.320811335334504, 3.9097599268295484e-92)
exponential
(511.41420648854108, 7.3934851812182895e-23)
(24.215231521373582, 3.7251410948652104e-08)
truncated_power_law
(3.3213949937049847e-06, 0.99794356568650555)
(3.1510369047360598e-07, 0.99936659460444144)
stretched_exponential
(16.756797270053454, 1.6505119872120265e-05)
(8.7110005915424153, 8.7224098659112012e-05)
lognormal_positive
(30.428201968820289, 1.7275238929002278e-07)
(6.7992592335974233, 5.4945477823229749e-06)

(The sign of the first value indicates whether the first (positive) or the second (negative) distribution is the better fit; the second value is the p-value for the significance of that decision.)
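
In case it helps with interpreting these pairs: if I read the powerlaw paper correctly, distribution_compare also accepts a normalized_ratio keyword that divides R by its standard deviation, which should make the ratios comparable across data sets. A minimal sketch reusing the fit objects from above (assuming the keyword behaves as described in the paper):

# Sketch: normalized log-likelihood ratios, assuming the normalized_ratio
# keyword from the powerlaw paper; R > 0 still favours the power law.
for label, fit in [('BA graph', fitpl), ('GNP graph', fitnpl)]:
    R, p = fit.distribution_compare('power_law', 'exponential',
                                    normalized_ratio=True)
    print(label, R, p)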

Am I going at this problem from the wrong angle, or should I implement the binomial distribution myself?

I am asking because I am no statistics expert and might not see the significance of all the available distributions. But they seem to fail this basic example.
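
In case implementing it myself is the way to go, this is the rough sketch I had in mind: compare the fitted power law against a Poisson, i.e. the large-n limit of the binomial degree distribution of a G(n, p) graph, restricted to the tail k >= xmin that powerlaw selects. The loglikelihood_ratio helper below is my own construction (not part of the package), mu is just the mean degree rather than a proper truncated-Poisson MLE, and there is no variance normalization, so this is only an approximation of the test described by Clauset et al.:

import numpy as np
from scipy import stats, special

def loglikelihood_ratio(fit, degrees):
    """Rough comparison: discrete power law vs. Poisson (binomial limit),
    both restricted to the tail k >= xmin chosen by powerlaw."""
    degrees = np.asarray(degrees)
    xmin = int(fit.power_law.xmin)
    alpha = fit.power_law.alpha
    tail = degrees[degrees >= xmin]

    # Discrete power law: p(k) = k^-alpha / zeta(alpha, xmin)
    ll_pl = np.sum(-alpha * np.log(tail)) - len(tail) * np.log(special.zeta(alpha, xmin))

    # Poisson with mu = mean degree (crude estimate), renormalized to k >= xmin
    mu = degrees.mean()
    log_tail_mass = stats.poisson.logsf(xmin - 1, mu)  # log P(K >= xmin)
    ll_poisson = np.sum(stats.poisson.logpmf(tail, mu)) - len(tail) * log_tail_mass

    return ll_pl - ll_poisson  # positive -> power law fits the tail better

print(loglikelihood_ratio(fitpl, list(dict(sf_graph.degree()).values())))
print(loglikelihood_ratio(fitnpl, list(dict(non_sf_graph.degree()).values())))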

