I am trying to use Python's statsmodels for the analysis of some data. I know that a negative binomial regression fits the model, since the variance and the mean of the data differ (which is why I am not using a Poisson regression model).
In my case I have a binary independent variable, such as male and female (x), and continuous dependent values (y). I am not sure whether the GLM with the NegativeBinomial family in Python's statsmodels works correctly if I simply encode the x values as 0 (false) and 1 (true). Normally the result should show that true cases are more likely to occur than false cases.
import statsmodels.api as sm
import statsmodels.formula.api as smf
x=[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
y=[0.0003, 0.0002, 0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0086, 0.0001,
0.0001, 0.09, 0.1, 0.000265, 0.0272, 0.0241, 0.386,
0.0050, 0.0035, 0.0051, 0.00351]
glm_Nbinomial = sm.GLM(y, x, family=sm.families.NegativeBinomial())
res_Nbinom = glm_Nbinomial.fit()
print(res_Nbinom.summary())
The coef value for this example is -2.7416. How do I interpret this value exactly? Is there a better way to deal with binary cases using statsmodels?
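For context, this is how I am currently trying to read that number. I assume that sm.families.NegativeBinomial() uses a log link by default, so exponentiating the coefficient should put it back on the scale of y, but I am not sure this reading is correct:

import numpy as np

# My assumption: with the default log link, exp(coef) is the multiplicative
# change in the expected y when x goes from 0 to 1.
coef_x = res_Nbinom.params[0]    # the single coefficient of the model above
print(coef_x, np.exp(coef_x))    # exp(-2.7416) is roughly 0.064

If that is right, I would read exp(coef) as the expected y of the true group relative to the false group, but please correct me if I am wrong.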
UPDATE:
I just changed my implementation a bit:
import pandas as pd

data = pd.DataFrame({'x': x, 'y': y})
pd.options.mode.chained_assignment = None
formula = 'y ~ x'
glm_Nbinomial = smf.glm(formula=formula, data=data,
                        family=sm.families.NegativeBinomial())
res_Nbinom = glm_Nbinomial.fit()
Converting the true and false cases to 1 and 0 seems to work, although I am only about 80% sure. If the independent variable instead takes the values 1, 2 and 3, with the same dependent values as in my example, how can I compute a beta coefficient for the value 1 or 3 separately, instead of one overall coefficient? Or is there an alternative library I should use? Something like the sketch below is what I have in mind.
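To make the question more concrete, here is the kind of thing I imagine, with made-up x values 1, 2, 3 and a subset of my y values purely for illustration. My assumption is that wrapping x in C() in the formula treats it as categorical and gives one coefficient per level relative to a reference level, but I do not know if that is the recommended approach:

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical example: a three-level independent variable (1, 2, 3) with a
# subset of the y values from above, only to illustrate the question.
data3 = pd.DataFrame({
    'x': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'y': [0.0003, 0.0002, 0.0001, 0.09, 0.1, 0.0272, 0.0050, 0.0035, 0.0051],
})

# My assumption: C(x) makes the formula interface treat x as categorical, so
# the summary would show a separate coefficient for x=2 and x=3 relative to
# x=1 (the reference level), instead of a single slope for x as a number.
glm_cat = smf.glm('y ~ C(x)', data=data3,
                  family=sm.families.NegativeBinomial())
res_cat = glm_cat.fit()
print(res_cat.summary())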
Kind regards
Tron