4

I am working through a stats workbook in Python and am stuck on a hands-on practice question. It is about Poisson regression, and here is the problem statement:

Perform the following tasks:

  1. Load the R data set Insurance from the MASS package and capture the data as a pandas data frame.
  2. Build a Poisson regression model with the log of the independent variable Holders and the dependent variable Claims.
  3. Fit the model with data.
  4. Find the sum of residuals.

I am stuck on point 4 above. Can anyone help with this step?

Here is what I have done so far:

import statsmodels.api as sm
import statsmodels.formula.api as smf
import numpy as np
df = sm.datasets.get_rdataset('Insurance', package='MASS', cache=False).data
poisson_model = smf.poisson('np.log(Holders) ~ -1 + Claims', df)
poisson_result = poisson_model.fit()
print(poisson_result.summary())

Here is the output so far :-

Now, how do I get the sum of the residuals?

RLave
AlphaBetaGamma
8 Answers

3

np.sum(poisson_result.resid)

works fine.

You have used the wrong variables to build the Poisson model, as Karthikeyan pointed out. Use this instead:

poisson_model = smf.poisson('Claims ~ np.log(Holders)',df)
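A side note on why the printed sum is tiny rather than some sizeable number: when the formula includes an intercept, the Poisson maximum-likelihood fit forces the raw residuals (observed minus fitted) to sum to zero, up to floating-point error. Below is a minimal NumPy sketch of that, assuming `resid` means observed minus fitted. It uses made-up holders/claims numbers rather than the Insurance data, and a hand-rolled Newton iteration rather than statsmodels; the helper name `fit_poisson` is hypothetical.

```python
import numpy as np

# Made-up numbers for illustration only (not the MASS Insurance data).
holders = np.array([10.0, 20.0, 50.0, 100.0, 200.0, 500.0])
claims = np.array([1.0, 3.0, 4.0, 12.0, 18.0, 55.0])


def fit_poisson(X, y, n_iter=50):
    """Hypothetical helper: Newton's method for a log-link Poisson model.

    Assumes the first column of X is the intercept column of ones.
    """
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())  # start from the constant-mean model
    for _ in range(n_iter):
        mu = np.exp(X @ beta)                 # fitted means
        grad = X.T @ (y - mu)                 # score vector
        hess = X.T @ (X * mu[:, None])        # negative Hessian
        beta = beta + np.linalg.solve(hess, grad)
    return beta


# Design matrix for 'Claims ~ np.log(Holders)': intercept plus log(Holders).
X = np.column_stack([np.ones_like(holders), np.log(holders)])
beta = fit_poisson(X, claims)

resid = claims - np.exp(X @ beta)
resid_sum = resid.sum()
print(resid_sum)  # very close to zero, but rarely exactly zero
```

The intercept's score equation is exactly sum(y - mu) = 0, so a value on the order of 1e-13 is just rounding noise around zero.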

  • The answer below is not being accepted by the FP exercise. Output: Optimization terminated successfully. Current function value: 3.468160 Iterations 7 6.679101716144942e-13 – Sam_2207 Jun 13 '20 at 11:46
2

Try the code below for Fresco Play:

import statsmodels.api as sm
import statsmodels.formula.api as smf
import pandas as pd
import numpy as np
df_insurance=sm.datasets.get_rdataset("Insurance","MASS")
df_data=df_insurance.data
insurance_model=smf.poisson('Claims ~ np.log(Holders)', df_data).fit()
print(np.cumsum(insurance_model.resid))
Suman
1

1.a) Load the R data set Insurance from MASS package

1.b) and Capture the data as pandas data frame

2) Build a Poisson regression model with a log of an independent variable, Holders and dependent variable Claims.

3) Fit the model with data.

4) Find the sum of residuals.

import statsmodels.api as sm
import statsmodels.formula.api as smf
import pandas as pd
import numpy as np

# load the R data set Insurance from the MASS package
ins = sm.datasets.get_rdataset('Insurance','MASS').data
# capture the data as pandas data frame
ins_pd = pd.DataFrame(ins)
# build a poisson regressions model with
# a log of an independent variable "Holders" 
# and dependent variable "Claims"
# fit the model with data
result = smf.poisson('Claims ~ np.log(Holders)',data=ins).fit()
# you can also use
# model = smf.poisson('Claims ~ np.log(Holders)',data=ins)
# result = model.fit()

# Find the sum of residuals
print('Sum of the residuals:',np.sum(result.resid))

I'm new to this, so I don't know whether this is the right way to capture the data as a pandas data frame. Let me know.

greetings

1

Fresco Mex

import statsmodels.api as sm
import statsmodels.formula.api as smf
import pandas as pd
import numpy as np

df_data=sm.datasets.get_rdataset("Insurance","MASS").data
df_dataf= pd.DataFrame(df_data)
insurance_model=smf.poisson('Claims ~ np.log(Holders)',df_data)
insurance_model_result=insurance_model.fit()
print(np.sum(insurance_model_result.resid))
  • I am getting output like this: Optimization terminated successfully. Current function value: 3.468160 Iterations 7 6.679101716144942e-13 – Sam_2207 Jun 12 '20 at 09:58
0

In the statement poisson_model = smf.poisson('np.log(Holders) ~ -1 + Claims', df), the dependent variable "Claims" should be on the left-hand side of the formula:

poisson_model = smf.poisson('Claims ~ np.log(Holders)-1 ', df)
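One thing to keep in mind about the "-1" in this formula: it drops the intercept, and without an intercept the fitted residuals are generally no longer forced to sum to zero, so the result for step 4 can differ from the with-intercept model. Here is a rough NumPy sketch with made-up numbers (not the Insurance data), fitting the single slope by Newton's method by hand instead of calling statsmodels:

```python
import numpy as np

# Made-up numbers for illustration only (not the MASS Insurance data).
holders = np.array([10.0, 20.0, 50.0, 100.0, 200.0, 500.0])
claims = np.array([1.0, 3.0, 4.0, 12.0, 18.0, 55.0])
x = np.log(holders)

# No-intercept model: mu = exp(b * x). Start from a rough guess for b.
b = np.log(claims.mean()) / x.mean()
for _ in range(50):
    mu = np.exp(b * x)
    score = np.sum(x * (claims - mu))  # derivative of the log-likelihood
    info = np.sum(x**2 * mu)           # negative second derivative
    b = b + score / info               # Newton update

mu = np.exp(b * x)
resid_sum = np.sum(claims - mu)
weighted_sum = np.sum(x * (claims - mu))

print(resid_sum)     # generally not zero without an intercept
print(weighted_sum)  # close to zero: this is what the fit enforces
```

So before comparing outputs, it is worth checking whether the exercise expects the model with or without the intercept.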

  • Welcome to Stack Overflow! Use formatting tools to make your post more readable. Code block should look like `code block`. Use **bold** *italics* if needed. – Morse Jun 28 '18 at 19:09
0

This qualified in "Fresco", if anyone is looking for the solution:

import statsmodels.api as sm
import statsmodels.formula.api as smf
import numpy as np
# load the Insurance data set and fit Claims on log(Holders)
df_insurance=sm.datasets.get_rdataset("Insurance","MASS")
df_data=df_insurance.data
insurance_model=smf.poisson('Claims ~ np.log(Holders)',df_data)
insurance_model_result=insurance_model.fit()
res=insurance_model_result.resid
print(np.sum(res))
Community
jatin
0

I don't know whether this will work, but I referred to these docs:

https://vincentarelbundock.github.io/Rdatasets/doc/MASS/Insurance.html https://vincentarelbundock.github.io/Rdatasets/datasets.html

So I hope this will work too.

import statsmodels.api as sm
import statsmodels.formula.api as smf
import numpy as np
import pandas as pd

data=pd.DataFrame(sm.datasets.get_rdataset("Insurance","MASS",cache=True).data)
model=smf.poisson('Claims ~ District + Group + Age + np.log(Holders)',data).fit()
print(np.sum(model.resid))
ankita
0

Try np.cumsum(model.resid) for this question.

Ideally, np.sum(model.resid) should be the right answer to the question, since np.cumsum returns the running totals rather than a single number. But if the system does not accept it, try cumsum.
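To make the difference between the two calls concrete, here is a small sketch. The residual values are made up for illustration and do not come from the fitted model.

```python
import numpy as np

# Made-up residuals for illustration only (not from the Insurance model).
resid = np.array([0.5, -1.25, 2.0, -1.25])

running = np.cumsum(resid)  # running totals, same length as resid
total = np.sum(resid)       # a single number

print(running)
print(total)
```

The last element of the cumsum array equals the plain sum, so printing the cumsum shows the same final value along with every intermediate total.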