Estimating the Poisson distribution

Question

I have a graph and I calculated the degree distribution and degree as follows:

library(igraph) # for these two functions

dd <- degree_distribution(graph) 
d <- degree(graph)

From this, I estimated to Power Law, to see if my distribution follows the "Law of Power":

degree = 1:max(d)
probability = dd[-1]

nonzero.position = which(probability != 0)
probability = probability[nonzero.position]
degree = degree[nonzero.position]

reg = lm(log(probability) ~ log(degree))
cozf = coef(reg)

power.law.fit = function(x) exp(cozf[[1]] + cozf[[2]] * log(x))

From that, I plotted the points and power law using ggplot2. Resulting in the following image:

df <- data.frame(x = degree, y = probability)
  print(
      ggplot(df, aes(x,y,colour="Distribuição"))+
        geom_point(shape = 4) +
        stat_function(fun = power.law.fit, geom = "line", aes(colour="Power Law"))+

        labs(title = "Grafo", subtitle = "Distribuição dos Graus",
             x="K", y="P(k)", colour="Legenda")+
        scale_color_brewer(palette="Dark2")
  )

PowerLaw and Data

As you can see, my distribution does not follow Power Law! I would like to estimate the Poisson distribution and plot on the same graph. Even though I'm not sure that my distribution does not follow (or follow) Poisson, I would like to draw it together with Power Law. I have no idea how to estimate this distribution (Poisson) from the data, and calculate the average degree.

Can anyone help me?

The graph used to calculate the distribution and the degree is very large (700 thousand vertices), so I did not put the data of the graphs. The explanation of the answer can be based on any graph.

You may wish to reference these answers from Cross Validated on fitting distributions to data in R: https://stats.stackexchange.com/questions/51470/finding-appropriate-distribution-that-fit-to-for-a-frequency-distribution-of-a-v, https://stats.stackexchange.com/questions/132652/how-to-determine-which-distribution-fits-my-data-best — Z.Lin, Sep 01 '17 at 03:06
maybe relevant https://stackoverflow.com/questions/27071855/trying-to-fit-poisson-distribution-in-r-using-fitdistr-to-erdos-reyni-random-gra — user20650, Sep 01 '17 at 08:16

score 3 · Accepted Answer · answered Sep 01 '17 at 02:04

From ?dpois:

The Poisson distribution has density

p(x) = λ^x exp(-λ)/x!

for x = 0, 1, 2, … . The mean and variance are E(X) = Var(X) = λ.

So I'll generate some dummy data with a secret lambda:

mysecret <- ####

x <- data.frame(xes = rpois(50, mysecret))

> x$xes
 [1] 0 2 2 1 1 4 1 1 0 2 2 2 1 0 0 1 2 3 2 4 2 1 0 3 2 1 3 1 2 1 5 0 2 3 2 1 0 1 2 3 0 1 2 2 0 3 2 2 2 3


> mean(x$xes)
[1] 1.66
> var(x$xes)
[1] 1.371837

So two good guesses for my secret lambda are 1.66 and 1.37. Let's try them:

library(ggplot2)
ggplot(x, aes(xes)) + 
  geom_histogram(aes(y = ..density.., color = "Raw data"), 
                 fill = "white", binwidth = 1, center = 0, size = 1.5) +
  stat_summary(fun.y = dpois, aes(x = xes, y = xes, color = "Density based on E(X)"), 
               fun.args = list(lambda = 1.66), geom = "line", size = 1.5) +
  stat_summary(fun.y = dpois, aes(x = xes, y = xes, color = "Density based on Var(X)"), 
               fun.args = list(lambda = 1.37), geom = "line", size = 1.5)

They're both pretty good. You can't really use the built-in stat_function or geom_density for generating these, since Poisson distributions are only defined for integers. The histogram and summary functions work well though, since they're only estimated at the data points themselves, not interpolated.

If you want more detail, you can use the MASS package:

MASS::fitdistr(x$xes, dpois, start = list(lambda = 1))

    lambda  
  1.6601563 
 (0.1822258)

So let's try constructing from that:

library(dplyr)
df <- data_frame(xes = seq.int(max(x$xes)+1)-1,
                 dens.m = dpois(xes, 1.66),
                 dens.u = dpois(xes, 1.66+0.18),
                 dens.l = dpois(xes, 1.66-0.18))

> df
# A tibble: 6 x 4
    xes     dens.m     dens.u     dens.l
  <dbl>      <dbl>      <dbl>      <dbl>
1     0 0.19013898 0.15881743 0.22763769
2     1 0.31563071 0.29222406 0.33690378
3     2 0.26197349 0.26884614 0.24930880
4     3 0.14495866 0.16489230 0.12299234
5     4 0.06015785 0.07585046 0.04550717
6     5 0.01997240 0.02791297 0.01347012

ggplot(x, aes(xes)) + 
  geom_histogram(aes(y = ..density..), color = "black",
                 fill = "white", binwidth = 1, center = 0, size = 1.5) +
  geom_ribbon(data = df, aes(xes, ymin = dens.l, ymax = dens.u), fill = "grey50", alpha = 0.5) +
  geom_line(data = df, aes(xes, dens.m, color = "Based on E(X)\n+/-1 SD of lambda"), size = 1.5)

Based on these two methods and visual interpretation, you should feel comfortable saying λ = 1.66+/-0.18.

For reference, my secret initial value was 1.5.

If I'm understanding the question correctly, the original analysis looks at probability as a function of degree, i.e. a regression analysis with degree as a predictor - I don't think this kind of distribution-fitting incorporates predictors so it may not be suited to the same kind of regression analysis. — Marius, Sep 01 '17 at 02:24
@Marius I agree, but we were given no actual data to work with, so this is how to estimate a Poisson distribution (the stated question) not how to do regression on a Poisson GLM (possibly the desired outcome). — Brian, Sep 01 '17 at 02:44
This plot is the distribution of degrees. I would like to plot, as well as Barabasi (http://barabasi.com/networksciencebook/chapter/3#degree-distribution - image 3.4 and 3.5) — Fillipe, Sep 01 '17 at 02:47
Sorry. First > http://barabasi.com/networksciencebook, Section 3 (random network) and subsection 3.4(Image 3.4 and 3.5) — Fillipe, Sep 01 '17 at 02:54
I have the data (degree distribution). What I need is plot the poisson distribution, only! — Fillipe, Sep 01 '17 at 02:56
`degree(graph)`, your vector `d` is equivalent to the vector I made called `x$xes`. — Brian, Sep 01 '17 at 02:57
But, it's possible estimate the perfect λ ? Because you estimate λ = 1.5. There is any function, in R, that show me the perfect λ, based in my data? — Fillipe, Sep 01 '17 at 03:03
My initial 1.5 was just to generate some data, since you didn't give us any to work with. The methods I showed are how you estimate lambda from a univariate sample of integers. — Brian, Sep 01 '17 at 03:06

Estimating the Poisson distribution

1 Answers1