
I am trying to reproduce the example from the help string of statsmodels.api.anova_lm:

import statsmodels.api as sm
from statsmodels.formula.api import ols

moore = sm.datasets.get_rdataset("Moore", "car",
                                 cache=True) # load data
data = moore.data
data = data.rename(columns={"partner.status" :
                            "partner_status"}) # make name pythonic
moore_lm = ols('conformity ~ C(fcategory, Sum)*C(partner_status, Sum)',
                data=data).fit()

table = sm.stats.anova_lm(moore_lm, typ=2) # Type 2 ANOVA DataFrame
print(table)

However, the call moore_lm = ols('conformity ~ C(fcategory, Sum)*C(partner_status, Sum)', data=data) raises the following error:

ValueError: For numerical factors, num_columns must be an int

This is what data looks like:

>>> print data
      partner_status  conformity fcategory  fscore
0                low           8       low      37
1                ...         ...       ...     ...

I ran into the same problem with the data set I am actually interested in; so what causes this error?

On a side note, what does the C(<column>, Sum) do?

Faultier

1 Answer


Update patsy:

 pip install https://github.com/pydata/patsy/archive/master.zip 
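
To confirm which patsy version is installed before and after the update, a check along these lines may help. It is a minimal sketch that uses only the standard library (importlib.metadata, Python 3.8+); the helper name installed_version is made up for illustration and is not part of patsy or statsmodels.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Hypothetical helper: report the versions relevant to this error.
print("patsy:", installed_version("patsy"))
print("statsmodels:", installed_version("statsmodels"))
```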

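In my case, the update went from version 0.4 to '0.4.1+dev'.

As for the side question: C(<column>, Sum) tells patsy to treat the column as a categorical variable, and Sum selects sum-to-zero (deviation) contrast coding instead of the default treatment (dummy) coding.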

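To illustrate what sum-to-zero coding of a categorical column looks like, here is a hand-rolled sketch in plain Python. It is not patsy's implementation; the function name sum_coding and the level names are invented for the example, and it assumes the levels are sorted alphabetically with the last level as the omitted one, which is my understanding of patsy's default.

```python
def sum_coding(levels):
    """Map each level to its sum-to-zero (deviation) coded row.

    With k levels there are k - 1 columns. Level i gets a 1 in column i,
    and the last level gets -1 in every column, so each column sums to zero.
    """
    ordered = sorted(set(levels))
    k = len(ordered)
    coding = {}
    for i, level in enumerate(ordered[:-1]):
        coding[level] = [1 if j == i else 0 for j in range(k - 1)]
    coding[ordered[-1]] = [-1] * (k - 1)
    return coding


# Illustrative level names only, not taken from the Moore data set.
for level, row in sum_coding(["a", "b", "c"]).items():
    print(level, row)
```

With this coding, each coefficient is interpreted as a deviation from the overall mean across levels rather than from a single reference level.
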
Diego