I am trying to reproduce the example from the help string of statsmodels.api.stats.anova_lm:
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Load the Moore dataset from R's "car" package
moore = sm.datasets.get_rdataset("Moore", "car", cache=True)
data = moore.data
# Make the column name Pythonic
data = data.rename(columns={"partner.status": "partner_status"})
moore_lm = ols('conformity ~ C(fcategory, Sum)*C(partner_status, Sum)',
               data=data).fit()
table = sm.stats.anova_lm(moore_lm, typ=2)  # Type 2 ANOVA DataFrame
print(table)
However, I get the following error message from moore_lm = ols('conformity ~ C(fcategory, Sum)*C(partner_status, Sum)', data=data):
ValueError: For numerical factors, num_columns must be an int
This is what data looks like:
>>> print data
  partner_status  conformity fcategory  fscore
0            low           8       low      37
1            ...         ...       ...     ...
I ran into the same problem with the data set I am actually interested in, so what causes this error?
On a side note, what does C(<column>, Sum) do?
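For context on the side question, here is my current understanding, which may well be incomplete: C(...) marks a column as categorical, and Sum asks for sum-to-zero (deviation) coding instead of the default treatment coding. Below is a minimal pure-Python sketch of what I take that coding to mean. The helper name sum_contrast is hypothetical (it is not part of patsy or statsmodels), and the level names and their order are assumed for illustration.

```python
def sum_contrast(levels):
    """Return {level: coded row} using sum-to-zero (deviation) coding.

    Hypothetical helper for illustration only; not a patsy or statsmodels API.
    With k levels there are k - 1 columns. Each of the first k - 1 levels
    gets a 1 in its own column, and the last level gets -1 in every column.
    """
    k = len(levels)
    coding = {}
    for i, level in enumerate(levels):
        if i < k - 1:
            # One indicator column per non-final level
            row = [1 if j == i else 0 for j in range(k - 1)]
        else:
            # The final level is coded as -1 across all columns
            row = [-1] * (k - 1)
        coding[level] = row
    return coding


# Assumed level names and order, chosen only for this example
coding = sum_contrast(["high", "low", "medium"])
for level, row in coding.items():
    print(level, row)
```

If this reading is right, every column sums to zero across the levels, so each coefficient would be a deviation from the overall mean and not a difference from a reference level.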