4

I have a graph and I calculated the degree distribution and degree as follows:

library(igraph) # for these two functions

dd <- degree_distribution(graph) 
d <- degree(graph)

From this, I estimated to Power Law, to see if my distribution follows the "Law of Power":

degree = 1:max(d)
probability = dd[-1]

nonzero.position = which(probability != 0)
probability = probability[nonzero.position]
degree = degree[nonzero.position]

reg = lm(log(probability) ~ log(degree))
cozf = coef(reg)

power.law.fit = function(x) exp(cozf[[1]] + cozf[[2]] * log(x))

From that, I plotted the points and power law using ggplot2. Resulting in the following image:

df <- data.frame(x = degree, y = probability)
  print(
      ggplot(df, aes(x,y,colour="Distribuição"))+
        geom_point(shape = 4) +
        stat_function(fun = power.law.fit, geom = "line", aes(colour="Power Law"))+

        labs(title = "Grafo", subtitle = "Distribuição dos Graus",
             x="K", y="P(k)", colour="Legenda")+
        scale_color_brewer(palette="Dark2")
  )

PowerLaw and Data

As you can see, my distribution does not follow Power Law! I would like to estimate the Poisson distribution and plot on the same graph. Even though I'm not sure that my distribution does not follow (or follow) Poisson, I would like to draw it together with Power Law. I have no idea how to estimate this distribution (Poisson) from the data, and calculate the average degree.

Can anyone help me?

The graph used to calculate the distribution and the degree is very large (700 thousand vertices), so I did not put the data of the graphs. The explanation of the answer can be based on any graph.

Jaap
  • 81,064
  • 34
  • 182
  • 193
Fillipe
  • 143
  • 9
  • 1
    You may wish to reference these answers from Cross Validated on fitting distributions to data in R: https://stats.stackexchange.com/questions/51470/finding-appropriate-distribution-that-fit-to-for-a-frequency-distribution-of-a-v, https://stats.stackexchange.com/questions/132652/how-to-determine-which-distribution-fits-my-data-best – Z.Lin Sep 01 '17 at 03:06
  • Good answer! I'll check out! – Fillipe Sep 01 '17 at 03:15
  • 1
    maybe relevant https://stackoverflow.com/questions/27071855/trying-to-fit-poisson-distribution-in-r-using-fitdistr-to-erdos-reyni-random-gra – user20650 Sep 01 '17 at 08:16

1 Answers1

3

From ?dpois:

The Poisson distribution has density

p(x) = λ^x exp(-λ)/x!

for x = 0, 1, 2, … . The mean and variance are E(X) = Var(X) = λ.

So I'll generate some dummy data with a secret lambda:

mysecret <- ####

x <- data.frame(xes = rpois(50, mysecret))
> x$xes
 [1] 0 2 2 1 1 4 1 1 0 2 2 2 1 0 0 1 2 3 2 4 2 1 0 3 2 1 3 1 2 1 5 0 2 3 2 1 0 1 2 3 0 1 2 2 0 3 2 2 2 3


> mean(x$xes)
[1] 1.66
> var(x$xes)
[1] 1.371837

So two good guesses for my secret lambda are 1.66 and 1.37. Let's try them:

library(ggplot2)
ggplot(x, aes(xes)) + 
  geom_histogram(aes(y = ..density.., color = "Raw data"), 
                 fill = "white", binwidth = 1, center = 0, size = 1.5) +
  stat_summary(fun.y = dpois, aes(x = xes, y = xes, color = "Density based on E(X)"), 
               fun.args = list(lambda = 1.66), geom = "line", size = 1.5) +
  stat_summary(fun.y = dpois, aes(x = xes, y = xes, color = "Density based on Var(X)"), 
               fun.args = list(lambda = 1.37), geom = "line", size = 1.5)

enter image description here

They're both pretty good. You can't really use the built-in stat_function or geom_density for generating these, since Poisson distributions are only defined for integers. The histogram and summary functions work well though, since they're only estimated at the data points themselves, not interpolated.

If you want more detail, you can use the MASS package:

MASS::fitdistr(x$xes, dpois, start = list(lambda = 1))
    lambda  
  1.6601563 
 (0.1822258)

So let's try constructing from that:

library(dplyr)
df <- data_frame(xes = seq.int(max(x$xes)+1)-1,
                 dens.m = dpois(xes, 1.66),
                 dens.u = dpois(xes, 1.66+0.18),
                 dens.l = dpois(xes, 1.66-0.18))
> df
# A tibble: 6 x 4
    xes     dens.m     dens.u     dens.l
  <dbl>      <dbl>      <dbl>      <dbl>
1     0 0.19013898 0.15881743 0.22763769
2     1 0.31563071 0.29222406 0.33690378
3     2 0.26197349 0.26884614 0.24930880
4     3 0.14495866 0.16489230 0.12299234
5     4 0.06015785 0.07585046 0.04550717
6     5 0.01997240 0.02791297 0.01347012
ggplot(x, aes(xes)) + 
  geom_histogram(aes(y = ..density..), color = "black",
                 fill = "white", binwidth = 1, center = 0, size = 1.5) +
  geom_ribbon(data = df, aes(xes, ymin = dens.l, ymax = dens.u), fill = "grey50", alpha = 0.5) +
  geom_line(data = df, aes(xes, dens.m, color = "Based on E(X)\n+/-1 SD of lambda"), size = 1.5)

enter image description here

Based on these two methods and visual interpretation, you should feel comfortable saying λ = 1.66+/-0.18.

For reference, my secret initial value was 1.5.

Brian
  • 7,900
  • 1
  • 27
  • 41
  • 1
    If I'm understanding the question correctly, the original analysis looks at probability as a function of degree, i.e. a regression analysis with degree as a predictor - I don't think this kind of distribution-fitting incorporates predictors so it may not be suited to the same kind of regression analysis. – Marius Sep 01 '17 at 02:24
  • @Marius I agree, but we were given no actual data to work with, so this is how to estimate a Poisson distribution (the stated question) not how to do regression on a Poisson GLM (possibly the desired outcome). – Brian Sep 01 '17 at 02:44
  • This plot is the distribution of degrees. I would like to plot, as well as Barabasi (http://barabasi.com/networksciencebook/chapter/3#degree-distribution - image 3.4 and 3.5) – Fillipe Sep 01 '17 at 02:47
  • @Fillipe Your link leads to "404 page not found". – Brian Sep 01 '17 at 02:48
  • Sorry. First > http://barabasi.com/networksciencebook, Section 3 (random network) and subsection 3.4(Image 3.4 and 3.5) – Fillipe Sep 01 '17 at 02:54
  • I have the data (degree distribution). What I need is plot the poisson distribution, only! – Fillipe Sep 01 '17 at 02:56
  • `degree(graph)`, your vector `d` is equivalent to the vector I made called `x$xes`. – Brian Sep 01 '17 at 02:57
  • But, it's possible estimate the perfect λ ? Because you estimate λ = 1.5. There is any function, in R, that show me the perfect λ, based in my data? – Fillipe Sep 01 '17 at 03:03
  • My initial 1.5 was just to generate some data, since you didn't give us any to work with. The methods I showed are how you estimate lambda from a univariate sample of integers. – Brian Sep 01 '17 at 03:06
  • Ok. I'll check out! – Fillipe Sep 01 '17 at 03:13