
I had the same error as in this question.

Strangely, it works (with the answer provided) in an IPython shell, but not in an IPython notebook. It seems to be related to the C() operator, because the formula works without it (though then Region is not wrapped in the operator).

The same happens with this example:

import statsmodels.formula.api as smf
import numpy as np
import pandas


url = "http://vincentarelbundock.github.com/Rdatasets/csv/HistData/Guerry.csv"
df = pandas.read_csv(url)
df = df[['Lottery', 'Literacy', 'Wealth', 'Region']].dropna()
df.head()
mod = smf.ols(formula='Lottery ~ Literacy + Wealth + Region', data=df)
res = mod.fit()
print(res.summary())

This works well, both in the IPython notebook and in the shell, and patsy treats Region as a categorical variable because it is composed of strings.

But if I try this (as in the tutorial):

res = smf.ols(formula='Lottery ~ Literacy + Wealth + C(Region)', data=df).fit()

I get an error in the IPython notebook:

TypeError: 'Series' object is not callable

Note that statsmodels and patsy are the same versions in both the notebook and the shell (0.5.0 and 0.3.0 respectively).

Do you get the same error?

jrjc
  • I don't get any error, neither with patsy 0.2.1 nor with patsy 0.3. I'm using statsmodels development version, but I don't think there are any statsmodels related changes. The pandas versions I used are 0.12 and 0.14.1dev. – Josef Oct 06 '14 at 15:10
  • @user333700 : You don't get the error in the ipython notebook ? Do you have any idea on how to fix this from my side if it's not a bug ? – jrjc Oct 06 '14 at 16:45
  • No error in the notebook. (In the notebook I only have pandas 0.12, but both patsy versions.) I don't have much idea about how to fix it since I have no suspect about what might be wrong. Try to look at the details of the traceback and check where there might be a `C` series defined. Do you have many automatic imports that might interfere? – Josef Oct 06 '14 at 23:49
  • @user333700 : So actually it was because I had a variable in my namespace called `C`, but not in the DataFrame. Thanks for your help – jrjc Oct 07 '14 at 12:28

1 Answer


I eventually found the problem.

It happened because there was a variable called C that I had defined much earlier in the notebook. What is surprising, though, is that it was not a column of the df I used.

Anyway, the basic solution is to run:

del C

before running the regression.
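
To see why deleting the variable helps, here is a toy sketch in plain Python (it is not patsy's actual code). It assumes that names in a formula are looked up in the data first, then in the caller's namespace, and only last among the formula built-ins, which is consistent with the behavior described here. The names `categorical`, `FORMULA_BUILTINS`, and `evaluate_term` are hypothetical and exist only for this illustration.

```python
from collections import ChainMap


def categorical(values):
    """Hypothetical stand-in for a formula built-in such as C()."""
    return ("categorical", list(values))


# Hypothetical table of built-ins that the formula language provides.
FORMULA_BUILTINS = {"C": categorical}


def evaluate_term(term, data, caller_namespace):
    # Assumed lookup order: data columns first, then the caller's variables,
    # and the formula built-ins only as a last resort.
    scope = ChainMap(data, caller_namespace, FORMULA_BUILTINS)
    return eval(term, {}, scope)


data = {"Region": ["E", "N", "E"]}

# A leftover variable named C shadows the built-in C().
notebook_vars = {"C": [1, 2, 3]}
try:
    evaluate_term("C(Region)", data, notebook_vars)
    error = None
except TypeError as exc:
    error = str(exc)
print(error)  # a "... object is not callable" message

# The equivalent of `del C`: the built-in becomes visible again.
del notebook_vars["C"]
result = evaluate_term("C(Region)", data, notebook_vars)
print(result)
```

In the sketch, the stray `C` is found before the built-in, so the "call" lands on a plain list instead of a function. Removing it lets the lookup fall through to the built-in.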

Hope this will help people facing the same problem.

But I'm still not sure whether this is expected behavior in patsy.

jrjc
  • This is expected behavior. The evaluation environment of patsy resolves to the caller's namespace. We hard-code this in statsmodels. See the `eval_env` variable in the [patsy documentation](http://patsy.readthedocs.org/en/latest/API-reference.html). I am debating exposing this to users so it's easier to avoid situations like this. – jseabold Oct 07 '14 at 13:47
  • @jseabold : Thanks for your comment. I think it would be great to just mention this in the tutorial (like [this one, section Categorical Variable](http://statsmodels.sourceforge.net/devel/example_formulas.html)), with some kind of warning box. Because I think this doc is a bit too technical for such "small" problem. If I had seen it, not sure it would have helped me. – jrjc Oct 07 '14 at 14:00
  • I added a note about it and the ability for users to control which namespace environment the formula is evaluated in. https://github.com/statsmodels/statsmodels/pull/2031 – jseabold Oct 08 '14 at 18:54