
My dependent variable is over-dispersed, so I want to fit a generalized negative binomial regression. In addition, I want to examine the effects of my indicators on both the mean and the dispersion parameter, as in these two papers:

On page 128: Fleming, Lee (2001): Recombinant Uncertainty in Technological Search. In Management Science 47 (1), pp. 117–132. DOI: 10.1287/mnsc.47.1.117.10671.

On page 719: Verhoeven, Dennis; Bakker, Jurriën; Veugelers, Reinhilde (2016): Measuring technological novelty with patent-based indicators. In Research Policy 45 (3), pp. 707–723. DOI: 10.1016/j.respol.2015.11.010.

Both papers ran the regression in Stata, so I cannot rely on their code; I want to do it in Python (or, if that is not possible, in SPSS).

My current Python code runs the regression and reports the coefficients. However, I do not see an option to get the effects on the mean and on the dispersion:

expr = """CIT_REC ~ SCIENCE_NOV  
+ APY + PBY + IPC_A + IPC_B + IPC_C + IPC_D + IPC_E + IPC_F + IPC_G + IPC_H + IPC_Y + NUM_CLAIMS + NUM_ID_CLAIMS + NUM_DP_CLAIMS + COMPL_CLAIMS"""

y_train, X_train = dmatrices(expr, df_train, return_type='dataframe')

X_train = sm.add_constant(X_train)

poisson_training_results = sm.GLM(y_train, X_train, family=sm.families.Poisson()).fit()
#print(poisson_training_results.summary())

import statsmodels.formula.api as smf
df_train['BB_LAMBDA'] = poisson_training_results.mu

df_train['AUX_OLS_DEP'] = df_train.apply(lambda x: ((x['CIT_REC'] - x['BB_LAMBDA'])**2 - x['CIT_REC']) / x['BB_LAMBDA'], axis=1)

ols_expr = """AUX_OLS_DEP ~ BB_LAMBDA - 1"""
aux_olsr_results = smf.ols(ols_expr, df_train).fit()
print(aux_olsr_results.params)

nb2_training_results = sm.GLM(y_train, X_train,family=sm.families.NegativeBinomial(alpha=aux_olsr_results.params[0])).fit()
print(nb2_training_results.summary())

This is the current output:

                 Generalized Linear Model Regression Results                  
==============================================================================
Dep. Variable:                CIT_REC   No. Observations:               120332
Model:                            GLM   Df Residuals:                   120316
Model Family:        NegativeBinomial   Df Model:                           15
Link Function:                    log   Scale:                          1.0000
Method:                          IRLS   Log-Likelihood:            -3.7912e+05
Date:                Thu, 08 Oct 2020   Deviance:                       74180.
Time:                        10:45:42   Pearson chi2:                 2.05e+05
No. Iterations:                    14                                         
Covariance Type:            nonrobust                                         
=================================================================================
                    coef    std err          z      P>|z|      [0.025      0.975]
---------------------------------------------------------------------------------
Intercept       228.8814      3.172     72.148      0.000     222.664     235.099
SCIENCE_NOV       3.3563      0.532      6.309      0.000       2.314       4.399
APY               0.0129      0.008      1.663      0.096      -0.002       0.028
PBY              -0.1385      0.008    -17.227      0.000      -0.154      -0.123
IPC_A            26.0610      0.353     73.732      0.000      25.368      26.754
IPC_B            25.3848      0.352     72.015      0.000      24.694      26.076
IPC_C            24.7705      0.356     69.669      0.000      24.074      25.467
IPC_D            24.6420      0.382     64.585      0.000      23.894      25.390
IPC_E            25.0614      0.357     70.161      0.000      24.361      25.762
IPC_F            25.3837      0.358     70.980      0.000      24.683      26.085
IPC_G            25.6531      0.352     72.802      0.000      24.962      26.344
IPC_H            25.7289      0.354     72.631      0.000      25.035      26.423
IPC_Y            26.1960      0.367     71.351      0.000      25.476      26.916
NUM_CLAIMS       -0.5566      0.178     -3.123      0.002      -0.906      -0.207
NUM_ID_CLAIMS     0.5767      0.178      3.235      0.001       0.227       0.926
NUM_DP_CLAIMS     0.5758      0.178      3.230      0.001       0.226       0.925
COMPL_CLAIMS     -0.0002   2.56e-05     -7.709      0.000      -0.000      -0.000
=================================================================================
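
As far as I can tell, statsmodels can also estimate a single constant alpha directly by maximum likelihood with its discrete NegativeBinomial model; a minimal sketch (reusing y_train and X_train from above) is below, but this still gives only one overall dispersion parameter, not covariate effects on it.

# Sketch: constant alpha estimated by MLE instead of the auxiliary OLS step
nb_mle_results = sm.NegativeBinomial(y_train, X_train, loglike_method='nb2').fit(maxiter=200)
print(nb_mle_results.summary())  # the last row, 'alpha', is the estimated dispersion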

Edit: I asked the authors and got the following reply: "We used the Stata ‘nbreg’ command, and specify ‘lnalpha(vars)’ as an option to model the dispersion." Is there a similar function in Python or SPSS?
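
I have not found a built-in statsmodels model that reproduces nbreg with lnalpha(), so the closest approach I can think of is writing the NB2 log-likelihood myself with GenericLikelihoodModel, letting the mean depend on X and the dispersion depend on a second design matrix Z. This is only a rough sketch under that assumption; the class name NB2Dispersion and the choice of Z (a constant plus SCIENCE_NOV) are mine, not taken from the papers:

import numpy as np
from scipy.special import gammaln
from statsmodels.base.model import GenericLikelihoodModel

class NB2Dispersion(GenericLikelihoodModel):
    """NB2 with mean mu = exp(X @ beta) and dispersion alpha = exp(Z @ gamma)."""

    def __init__(self, endog, exog, exog_disp, **kwds):
        self.exog_disp = np.asarray(exog_disp)
        disp_names = ['lnalpha_%d' % i for i in range(self.exog_disp.shape[1])]
        super().__init__(endog, exog, extra_params_names=disp_names, **kwds)

    def nloglikeobs(self, params):
        k = self.exog.shape[1]
        beta, gamma = params[:k], params[k:]
        mu = np.exp(self.exog @ beta)            # conditional mean
        alpha = np.exp(self.exog_disp @ gamma)   # observation-level dispersion
        size = 1.0 / alpha
        prob = size / (size + mu)
        y = self.endog
        ll = (gammaln(y + size) - gammaln(size) - gammaln(y + 1)
              + size * np.log(prob) + y * np.log1p(-prob))
        return -ll                               # negative log-likelihood per observation

    def fit(self, start_params=None, maxiter=5000, **kwds):
        if start_params is None:
            # crude starting values; warm-starting from the Poisson fit above may help
            start_params = np.zeros(self.exog.shape[1] + self.exog_disp.shape[1])
        return super().fit(start_params=start_params, maxiter=maxiter, **kwds)

# dispersion covariates: here just a constant and SCIENCE_NOV, as an example
Z = sm.add_constant(df_train[['SCIENCE_NOV']])
gnb_results = NB2Dispersion(y_train, X_train, Z).fit()
print(gnb_results.summary())

If I understand the Stata option correctly, the lnalpha_* rows of this summary would play the role of the dispersion-equation coefficients that nbreg with lnalpha(vars) reports, but I am not sure this is the right way to replicate that approach.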

  • Not very clear what you are after. When you do a negbin regression, it estimates one common dispersion for your data. In your example, you have performed a Poisson regression to get the predicted means, and you regress this against the residual, which is really baffling. – StupidWolf Oct 10 '20 at 13:12
  • If you read Fleming, Lee (2001), what they did is to fit a full model, estimate the dispersion, fit a reduced model with one coefficient, get the dispersion and calculate the difference. Is this what you want? – StupidWolf Oct 10 '20 at 13:19
  • Yes. I want the same method as they did but for my data. – Nils_Denter Oct 11 '20 at 15:45

0 Answers