from patsy import dmatrix
from pandas import DataFrame
dta = DataFrame([["lo", 1], ["hi", 2.4], ["lo", 1.2], ["lo", 1.4], ["very_high", 1.8]], columns=["carbs", "score"])
dmatrix("carbs + score", dta)
DesignMatrix with shape (5, 4)
  Intercept  carbs[T.lo]  carbs[T.very_high]  score
          1            1                   0    1.0
          1            0                   0    2.4
          1            1                   0    1.2
          1            1                   0    1.4
          1            0                   1    1.8
  Terms:
    'Intercept' (column 0), 'carbs' (columns 1:3), 'score' (column 3)
Question: instead of specifying the column names by hand via DesignInfo (which makes my code less reusable), can I read the names given by this DesignMatrix so that I can feed them into a DataFrame later, without needing to know beforehand what the reference level (control group) was?
For example, when I do dmatrix("C(carbs, Treatment(reference='lo')) + score", dta), the reference level changes, and the column names change with it.
"""
# How can I get something like this from dmatrix's output without hardcoding?
names = <obtained from dmatrix's output above>
This should give names = ['Intercept', 'carbs[T.lo]', 'carbs[T.very_high]', 'score']
"""
g = DataFrame(dmatrix("carbs + score", dta), columns=names)
   Intercept  carbs[T.lo]  carbs[T.very_high]  score
           0            1                   2      3
0          1            1                   0    1.0
1          1            0                   0    2.4
2          1            1                   0    1.2
3          1            1                   0    1.4
4          1            0                   1    1.8
type(g)=<class 'pandas.core.frame.DataFrame'>
So g would be the transformed DataFrame that I can do logistic modelling on, without needing to keep note of (or hard-code) the column names and their reference levels.
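For what it's worth, here is a minimal sketch of one way this might be done, assuming an installed patsy version that exposes `design_info.column_names` on the returned DesignMatrix and accepts `return_type="dataframe"` in `dmatrix`. The variable names `dm`, `g_direct`, and `dm_lo` are just illustrative, and the `np.asarray` call is a defensive choice of mine rather than something patsy requires.

```python
import numpy as np
from pandas import DataFrame
from patsy import dmatrix

dta = DataFrame(
    [["lo", 1], ["hi", 2.4], ["lo", 1.2], ["lo", 1.4], ["very_high", 1.8]],
    columns=["carbs", "score"],
)

# Build the design matrix, then read the column names back from it
# instead of hard-coding them.
dm = dmatrix("carbs + score", dta)
names = dm.design_info.column_names

# Wrap the plain array in a DataFrame labelled with those names.
g = DataFrame(np.asarray(dm), columns=names)

# Alternative: ask patsy for a DataFrame directly (assumes return_type
# is supported by the installed version).
g_direct = dmatrix("carbs + score", dta, return_type="dataframe")

# The same lookup works when the reference level is set explicitly;
# the names then reflect whatever coding the formula requested.
dm_lo = dmatrix("C(carbs, Treatment(reference='lo')) + score", dta)
names_lo = dm_lo.design_info.column_names

print(names)
print(names_lo)
```

Either way the names come from the matrix itself, so changing the formula or the reference level should not require touching the code that builds `g`.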