I am using patsy to prepare categorical data for regression and want to map from a column name to its index in the DesignMatrix. I have tried using the column_name_indexes attribute of the DesignInfo object, but the column names have been modified to reflect the encoding.
Example using data from the docs:
>>> from patsy import demo_data, dmatrix
>>> data = demo_data("a", nlevels=3)
>>> data
{'a': ['a1', 'a2', 'a3', 'a1', 'a2', 'a3']}
>>> x = dmatrix("a", data)
>>> x
DesignMatrix with shape (6, 3)
  Intercept  a[T.a2]  a[T.a3]
          1        0        0
          1        1        0
          1        0        1
          1        0        0
          1        1        0
          1        0        1
  Terms:
    'Intercept' (column 0)
    'a' (columns 1:3)
>>> x.design_info.column_name_indexes
OrderedDict([('Intercept', 0), ('a[T.a2]', 1), ('a[T.a3]', 2)])
I would like to be able to access the column index of e.g. 'a2' by calling:
x.design_info.column_name_indexes['a2']
But of course that raises KeyError: 'a2'. So instead I have to construct the modified key myself in order to obtain the desired column index, 1:
x.design_info.column_name_indexes['a[T.a2]']
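To make the workaround concrete, here is a minimal sketch of how I could wrap the key construction in a helper. It is not part of patsy: level_indexes is a hypothetical name, and it assumes the column names follow the 'factor[T.level]' or 'factor[level]' pattern shown above. It works on a plain mapping, so the dictionary from the output above is pasted in rather than taken from x.design_info.

```python
import re
from collections import OrderedDict

# Matches patsy-style categorical column names such as 'a[T.a2]' or 'a[a2]'.
# Assumption: names follow the 'factor[T.level]' / 'factor[level]' pattern.
_LEVEL_PATTERN = re.compile(
    r"^(?P<factor>[^\[\]:]+)\[(?:T\.)?(?P<level>[^\[\]]+)\]$"
)


def level_indexes(column_name_indexes):
    """Map (factor, level) pairs to column indexes (hypothetical helper)."""
    lookup = {}
    for name, index in column_name_indexes.items():
        match = _LEVEL_PATTERN.match(name)
        if match:  # skips 'Intercept', numeric columns and interaction terms
            lookup[(match.group("factor"), match.group("level"))] = index
    return lookup


# The mapping from the output above; with patsy installed this would be
# x.design_info.column_name_indexes.
column_name_indexes = OrderedDict(
    [("Intercept", 0), ("a[T.a2]", 1), ("a[T.a3]", 2)]
)

lookup = level_indexes(column_name_indexes)
print(lookup[("a", "a2")])  # 1
```

The lookup is keyed by (factor, level) rather than by level alone, because two factors could share a level name. The reference level 'a1' has no entry, since it does not get a column of its own under this encoding. This still parses the modified names, which is what I would like to avoid.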
Is there a way to access the column index by referring to the unmodified feature/column name, i.e. 'a2', rather than having to construct the modified key, i.e. 'a[T.a2]'?