Noob here, trying my first negative binomial regression in IPython on Google Colab. I load the dataset as a pandas DataFrame. The features (and Target) in the formula below all appear as columns in the DataFrame, which I named "dataset".

I also bring in

from patsy import dmatrices
import statsmodels.api as sm

However, when I run

formula = """Target ~ MeanAge   + %White + %HHsNotWater + HHsIneq*10    + %NotSaLang + %male + %Informal + COGTACatG2B09 + %Poor + AGRating  """
data = dataset

response, predictors = dmatrices(formula, data, return_type='dataframe')
nb_results = sm.GLM(response, predictors, family=sm.families.NegativeBinomial(alpha=0.15)).fit()
print(nb_results.summary())

I simply get a bare AssertionError, with an arrow pointing at line four (the one starting with "response"). I have no idea how to remedy this, and cannot find similar problems on this site. Any sage guidance, please?

Nordle
RandomForestRanger

1 Answer

The mistake I made was in the formula line. Patsy's formula parser reads the "%" and "*" in my feature names as formula syntax rather than as part of the column names.

So wrapping each such feature name in Q(), for example changing %HHsNotWater to Q('%HHsNotWater'), made all the difference. @njsmith on the pydata/patsy GitHub issue tracker set me straight.
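For anyone hitting the same error, here is a minimal sketch of the fix. The toy DataFrame is made up for illustration (only the column names come from my question), and it covers just three of the predictors. Q() is patsy's built-in for quoting column names that are not valid Python identifiers.

```python
import pandas as pd
from patsy import dmatrices

# Hypothetical toy data: the values are invented, only the awkward
# column names ("%White", "HHsIneq*10") mirror the ones in the question.
dataset = pd.DataFrame({
    "Target": [3, 0, 5, 2, 7, 1],
    "MeanAge": [31.0, 28.5, 40.2, 35.1, 29.9, 33.3],
    "%White": [0.10, 0.25, 0.05, 0.40, 0.15, 0.30],
    "HHsIneq*10": [4.1, 5.2, 3.9, 6.0, 4.8, 5.5],
})

# Q('...') tells patsy to look the string up as a column name verbatim,
# so "%" and "*" are no longer parsed as formula syntax.
formula = "Target ~ MeanAge + Q('%White') + Q('HHsIneq*10')"

response, predictors = dmatrices(formula, dataset, return_type="dataframe")
print(list(predictors.columns))
```

The resulting response and predictors frames can then be passed to sm.GLM exactly as in the question. Renaming the columns to plain identifiers before building the formula would be an alternative to Q(), if changing the names is acceptable.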

RandomForestRanger