I'm using patsy to fit regressions with statsmodels through its formula API.
My problem is that the design matrix comes out singular, because patsy creates (locally?) redundant interaction columns for the categorical variables.
```python
import patsy
import pandas as pd

data = [('y', [2, 5, 6]),
        ('c1', ['a', 'a', 'b']),
        ('c2', ['g', 'f', 'g'])]
# dict keeps insertion order, so the columns stay y, c1, c2
df = pd.DataFrame(dict(data))

formula = "y ~ C(c1):C(c2) - 1"
y, X = patsy.dmatrices(formula, df, return_type='dataframe')
print(X)
```
This prints:

```
   C(c1)[a]:C(c2)[f]  C(c1)[b]:C(c2)[f]  C(c1)[a]:C(c2)[g]  C(c1)[b]:C(c2)[g]
0                0.0                0.0                1.0                0.0
1                1.0                0.0                0.0                0.0
2                0.0                0.0                0.0                1.0
```
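A quick way to confirm where the singularity comes from is to look for all-zero columns and compare the rank of X with its column count. The sketch below rebuilds the printed matrix by hand (assuming the column order shown) so it runs with only NumPy and pandas; in practice X would be the value returned by `patsy.dmatrices`. With only three rows, four columns could never be full rank anyway, but the empty `C(c1)[b]:C(c2)[f]` column is exactly the one that pushes the rank below the column count.

```python
import numpy as np
import pandas as pd

# The design matrix printed above, rebuilt by hand so this runs without patsy;
# in practice X is the second value returned by patsy.dmatrices.
X = pd.DataFrame(
    [[0.0, 0.0, 1.0, 0.0],
     [1.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]],
    columns=["C(c1)[a]:C(c2)[f]", "C(c1)[b]:C(c2)[f]",
             "C(c1)[a]:C(c2)[g]", "C(c1)[b]:C(c2)[g]"],
)

# Columns that are zero in every row correspond to c1/c2 combinations
# that never occur in the data.
zero_cols = [c for c in X.columns if (X[c] == 0).all()]

# A rank below the column count means X'X cannot be inverted.
rank = np.linalg.matrix_rank(X.to_numpy())

print(zero_cols)         # ['C(c1)[b]:C(c2)[f]']
print(rank, X.shape[1])  # 3 4
```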
I would like to exclude the second column, `C(c1)[b]:C(c2)[f]`, since `c1` never has the value `b` when `c2` has the value `f`.
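One possible workaround, sketched here under the assumption that dropping empty cells is acceptable for the model: build the design matrix as usual, then keep only the columns that are non-zero in at least one row. The helper name `drop_empty_columns` is hypothetical, and the hand-built X stands in for the output of `patsy.dmatrices` so the snippet runs without patsy. This only removes combinations that never occur; other kinds of collinearity would still need separate handling.

```python
import numpy as np
import pandas as pd


def drop_empty_columns(X: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep only columns with at least one non-zero entry."""
    return X.loc[:, (X != 0).any(axis=0)]


# Stand-in for:
#   y, X = patsy.dmatrices("y ~ C(c1):C(c2) - 1", df, return_type="dataframe")
X = pd.DataFrame(
    [[0.0, 0.0, 1.0, 0.0],
     [1.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]],
    columns=["C(c1)[a]:C(c2)[f]", "C(c1)[b]:C(c2)[f]",
             "C(c1)[a]:C(c2)[g]", "C(c1)[b]:C(c2)[g]"],
)

X_reduced = drop_empty_columns(X)

print(list(X_reduced.columns))
# ['C(c1)[a]:C(c2)[f]', 'C(c1)[a]:C(c2)[g]', 'C(c1)[b]:C(c2)[g]']

# After dropping the empty column the matrix has full column rank.
print(np.linalg.matrix_rank(X_reduced.to_numpy()) == X_reduced.shape[1])  # True
```

Since this filters the matrix after patsy has built it, the reduced X would then be passed to the array interface (for example `statsmodels.api.OLS(y, X_reduced).fit()`) rather than to the formula API.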