(patsy v0.4.1, Python 3.5.0)
I would like to use patsy (ideally through statsmodels) to build a design matrix for regression.
The patsy-style formula that I would like to fit is
response ~ 0 + category
where category is a two-level categorical variable. The 0 + ... prefix is meant to suppress the implicit intercept term.
The design matrix that I expect has a single column of zeros and ones indicating whether category
is at its base level (0) or at the other level (1).
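To make the expectation concrete, here is a small sketch that builds that single column by hand with pandas. Treating 'A' as the base level is my assumption, and the column label category[T.B] is only borrowed from patsy's naming for illustration.

```python
import pandas as pd

df = pd.DataFrame({'category': ['A', 'B'] * 3})

# Assumption: 'A' is the base level (coded 0), 'B' is the other level (coded 1).
# The column label is illustrative only.
expected = (df['category'] == 'B').astype(int).to_frame('category[T.B]')
print(expected)
```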
The following code:
import pandas as pd
import patsy
df = pd.DataFrame({'category': ['A', 'B'] * 3})
patsy.dmatrix('0 + category', data=df)
Outputs:
DesignMatrix with shape (6, 2)
  category[A]  category[B]
            1            0
            0            1
            1            0
            0            1
            1            0
            0            1
  Terms:
    'category' (columns 0:2)
which is singular and not what I want.
When I instead run
import pandas as pd
import patsy
df = pd.DataFrame({'category': ['A', 'B'] * 3})
patsy.dmatrix('category', data=df)
the output is
DesignMatrix with shape (6, 2)
  Intercept  category[T.B]
          1              0
          1              1
          1              0
          1              1
          1              0
          1              1
  Terms:
    'Intercept' (column 0)
    'category' (column 1)
which is correct for a model that includes an intercept, but still not what I want.
Is the output without an intercept the intended behavior? If so, why? Am I just confused about how this design matrix is supposed to work with standard coding?
I know that I can edit the design matrix to make my regression work the way I intend, but if this is a bug I'd like to see it fixed in patsy.
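For reference, the workaround I have in mind is roughly the sketch below: build the matrix with the intercept, then drop the Intercept column. It assumes patsy's return_type='dataframe' option so that the column can be dropped by name; the variable names full and single are just mine.

```python
import pandas as pd
import patsy

df = pd.DataFrame({'category': ['A', 'B'] * 3})

# Build the usual treatment-coded matrix (Intercept + category[T.B]) as a DataFrame.
full = patsy.dmatrix('category', data=df, return_type='dataframe')

# Drop the intercept by name, leaving only the 0/1 indicator for level 'B'.
single = full.drop(columns='Intercept')
print(single)
```

This gives the single indicator column, but it is a manual step after the fact rather than something expressed in the formula itself.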