11

Two commonly used types of Generalized Linear Models are:
1. Log-Linear Regression, also known as Poisson Regression
2. Logistic Regression

How can I implement Poisson Regression in Python for Price Elasticity prediction?

User456898
  • Is this somewhat what you're looking for: http://statsmodels.sourceforge.net/devel/glm.html? Also, way too broad. – Ilja Everilä Jun 21 '16 at 10:29
  • The link you shared covers the "Poisson distribution". I was looking for "Poisson Regression". It is available in R, but how do I implement it in Python? – User456898 Jun 21 '16 at 10:33
  • I am not looking for Logistic Regression. I wanted to know about Log-Linear (Poisson) Regression in Python. – User456898 Jun 21 '16 at 10:37
  • @IljaEverilä sure logistic regression will help a lot in a poisson regression problem. Don't add comments that make no sense. Better stay silent – Altons Jun 21 '16 at 10:46
  • @Altons that's true, removed. – Ilja Everilä Jun 21 '16 at 10:47
  • The scenario is a predictive model has to be built for Price Elasticity prediction. So was keen on knowing about Poisson Regression using Python. – User456898 Jun 21 '16 at 10:53
  • @Laurel wouldn't you use logistic or probit regression for Price elasticity? – Altons Jun 21 '16 at 11:04
  • @Alton, I don't know much about probit regression. I'll try Logistic Regression as I have a fair understanding about it. – User456898 Jun 21 '16 at 11:19
  • But Logistic Regression is the same as a classification problem. – User456898 Jun 21 '16 at 11:22
  • @Laurel Price elasticity modelling builds a model in which the actual response is the individual's acceptance or rejection of a quote or renewal offering, so logistic regression will fit well if you're modelling the acceptance or rejection of a price offer. Don't quote me 100% here, as I haven't done this type of modelling since university (a long time ago). – Altons Jun 21 '16 at 12:22
  • @Altons I wanted to predict the estimated price value using a model. Regarding acceptance/rejection of renewal offering, as you said Logistic Regression works out pretty well. I'll consider doing the acceptance/ rejection concept. – User456898 Jun 21 '16 at 12:40

2 Answers

20

Have a look at the statsmodels package in Python.

Here is an example

Some more detail, to avoid a link-only answer:

Assuming you know Python, here is an extract of the example I mentioned earlier.

import numpy as np
import pandas as pd
from statsmodels.genmod.generalized_estimating_equations import GEE
from statsmodels.genmod.cov_struct import (Exchangeable,
    Independence,Autoregressive)
from statsmodels.genmod.families import Poisson

pandas holds the DataFrame with the data you want to feed to your Poisson model. The statsmodels package contains a large family of statistical models (linear, probit, Poisson, etc.); from it you import the Poisson family (hint: see the last import).

You fit the model as follows (assuming your dependent variable is called y, your independent variables are age, trt and base, and the column subject identifies the group each row belongs to):

fam = Poisson()
ind = Independence()
model1 = GEE.from_formula("y ~ age + trt + base", "subject", data, cov_struct=ind, family=fam)
result1 = model1.fit()
print(result1.summary())
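The snippet above leaves `data` undefined, so here is a minimal self-contained sketch of the same call. The data are synthetic and purely illustrative: the column names match the formula above, but the coefficients, the number of subjects and the number of visits are assumptions made up for this example.

```python
import numpy as np
import pandas as pd
from statsmodels.genmod.generalized_estimating_equations import GEE
from statsmodels.genmod.cov_struct import Independence
from statsmodels.genmod.families import Poisson

# Synthetic grouped data (assumption): 200 subjects with 4 visits each.
rng = np.random.default_rng(2)
n_subjects, n_visits = 200, 4
subject = np.repeat(np.arange(n_subjects), n_visits)
age = np.repeat(rng.integers(20, 60, size=n_subjects), n_visits)
trt = np.repeat(rng.integers(0, 2, size=n_subjects), n_visits)
base = np.repeat(rng.poisson(5, size=n_subjects), n_visits)

# Counts drawn from a Poisson distribution with a log-linear mean.
mu = np.exp(0.5 + 0.01 * age - 0.3 * trt + 0.05 * base)
data = pd.DataFrame({
    "subject": subject,
    "age": age,
    "trt": trt,
    "base": base,
    "y": rng.poisson(mu),
})

# Same call as in the answer: the second argument names the grouping column.
model1 = GEE.from_formula("y ~ age + trt + base", "subject", data,
                          cov_struct=Independence(), family=Poisson())
result1 = model1.fit()
print(result1.summary())
```

With this setup the estimated `trt` coefficient should land near the -0.3 used to simulate the data, which is a quick sanity check that the model is wired up correctly.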

As I am not familiar with the nature of your problem, I would suggest having a look at negative binomial regression if your count data are heavily overdispersed. With high overdispersion, the Poisson assumption that the variance equals the mean may not hold.

There is plenty of material on Poisson regression in R; a web search will turn it up.

Hope now this answer helps.

Hassan Baig
Altons
6

If I am not mistaken, @Altons' answer is for GEEs, which assume some sort of grouped structure. The common Poisson Regression (without the need for a group, such as "subject") is implemented as a Generalized Linear Model (GLM) in statsmodels:

import patsy
import statsmodels.api as sm  # GLM is exposed via statsmodels.api, not the top-level package
from statsmodels.genmod.families import Poisson


fam = Poisson()
f = 'some_count ~ some_numeric_variable + C(some_categorical_variable)'
y, X = patsy.dmatrices(f, data, return_type='matrix')

p_model = sm.GLM(y, X, family=fam)

result = p_model.fit()
print(result.summary())

The variable names used in the formula are just placeholders for columns in your own DataFrame, data.

Ben