11

As I have been doing some social network analysis, I have stumbled upon the problem of fitting a probability distribution on network degree.

So, I have a probability distribution P(X >= x) which, from visual inspection, follows a power law with an exponential cutoff rather than a pure power law (a straight line).

So, given that the equation for power law distribution with exponential cutoff is:

f(x) = x**alpha * exp(beta*x)

How might I estimate the parameters alpha and beta using Python?

I know scipy.stats.powerlaw package exists and they have a .fit() function but that doesn't seem to do the job as it only returns the location and scale of the plot, which seems to be useful only for normal distribution? There are also not enough tutorials on this package.

P.S. I'm well aware of the implementation of CLauset et al but they don't seem to provide ways to estimate the parameters of alternate distributions.

Chris
  • 44,602
  • 16
  • 137
  • 156
Mike
  • 683
  • 10
  • 22
  • 2
    The Clauset paper is *the* best reference on practical fitting of power law functions. If you genuinely think you have a problem that isn't addressed, consider emailing the authors – Andrew Walker Jan 30 '12 at 12:03
  • I'm not a statistician so I'm not really sure if I understand the whole paper completely. I think the code by Ginsberg is great and very helpful. I just want to know if there are tools to help with estimating the parameters of other probability distributions. – Mike Jan 30 '12 at 13:40
  • http://en.wikipedia.org/wiki/Power_law where's the straight line you talk about? – Dima Tisnek Mar 23 '12 at 12:33
  • qarma, power law is a straight line if the data is plotted on a log-log plot – Mike Apr 12 '12 at 00:33

3 Answers3

4

Powerlaw library can directly be used to estimate the parameters as follows:

  1. Install all the pythons dependencies:

    pip install powerlaw mpmath scipy
    
  2. Run the powerlaw package fit in a python environment:

    import powerlaw
    data = [5, 4, ... ]
    results = powerlaw.Fit(data)
    
  3. get the parameters from the results

    results.truncated_power_law.parameter1 # power law  parameter (alpha)
    results.truncated_power_law.parameter2 # exponential cut-off parameter (beta)
    
atfornes
  • 468
  • 5
  • 21
2

The function scipy.stats.powerlaw.fit may still work for your purposes. It's a bit confusing how the distributions in scipy.stats work (the documentation for each one refers to the optional parameters loc and scale, even though not all of them use these parameters, and each uses them differently). If you look at the docs:

http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.powerlaw.html

there's also a second non-optional parameter "a", which is the "shape parameters". In the case of powerlaw, this contains a single parameter. Don't worry about "loc" and "scale".

Edit: Sorry, forgot that you wanted the beta parameter too. Your best best may be to define the powerlaw function you want yourself, and then use scipy's generic fitting algorithms to learn the parameters. For example: http://www.scipy.org/Cookbook/FittingData#head-5eba0779a34c07f5a596bbcf99dbc7886eac18e5

jamalex
  • 109
  • 6
0

Here is a means of estimating the scaling exponent and exponential rate of power law with exponential cut-off by maximizing likelihood in R:

# Input: Data vector, lower threshold
# Output: List, giving type ("powerexp"), scaling exponent, exponential rate, lower threshold, log-likelihood


powerexp.fit <- function(data,threshold=1,method="constrOptim",initial_rate=-1) {
  x <- data[data>=threshold]
  negloglike <- function(theta) {
    -powerexp.loglike(x,threshold,exponent=theta[1],rate=theta[2])
  }
  # Fit a pure power-law distribution
  pure_powerlaw <- pareto.fit(data,threshold)
  # Use this as a first guess at the exponent
  initial_exponent <- pure_powerlaw$exponent
  if (initial_rate < 0) { initial_rate <- exp.fit(data,threshold)$rate }
  minute_rate <- 1e-6
  theta_0 <- as.vector(c(initial_exponent,initial_rate))
  theta_1 <- as.vector(c(initial_exponent,minute_rate))
  switch(method,
    constrOptim = {
      # Impose the constraint that rate >= 0
      # and that exponent >= -1
      ui <- rbind(c(1,0),c(0,1))
      ci <- c(-1,0)
      # Can't start with values on the boundary of the feasible set so add
      # tiny amounts just in case
      if (theta_0[1] == -1) {theta_0[1] <- theta_0[1] + minute_rate}
      if (theta_0[2] == 0) {theta_0[2] <- theta_0[2] + minute_rate}
      est <- constrOptim(theta=theta_0,f=negloglike,grad=NULL,ui=ui,ci=ci)
      alpha <- est$par[1]
      lambda <- est$par[2]
      loglike <- -est$value},
    optim = {
      est <- optim(par=theta_0,fn=negloglike)
      alpha <- est$par[1]
      lambda <- est$par[2]
      loglike <- -est$value},
    nlm = {
      est.0 <- nlm(f=negloglike,p=theta_0)
      est.1 <- nlm(f=negloglike,p=theta_1)
      est <- est.0
      if (-est.1$minimum > -est.0$minimum) { est <- est.1;cat("NLM had to switch\n") }
      alpha <- est$estimate[1]
      lambda <- est$estimate[2]
      loglike <- -est$minimum},
    {cat("Unknown method",method,"\n"); alpha<-NA; lambda<-NA; loglike<-NA}
  )
  fit <- list(type="powerexp", exponent=alpha, rate=lambda, xmin=threshold,
              loglike=loglike, samples.over.threshold=length(x))
  return(fit)
}

Check out https://github.com/jeffalstott/powerlaw/ for more info

Kelvin
  • 9
  • 1
  • The answer is useful but incomplete, please check my answer for a simiar question using R at https://stackoverflow.com/a/45800141/4928558 for the steps to be able to run the code shared in this answer. – atfornes Aug 21 '17 at 17:04