Preserve column order while one-hot encoding using pandas.get_dummies

Question

What is the best/most Pythonic way to one-hot encode categorical features in a Pandas data frame while preserving the original order of the columns from which the categories (new column names) are extracted?

For example, if I have three columns in my data frame (df0): ["Col_continuous", "Col_categorical", "Labels"], and I use

df1hot = pd.get_dummies(df0, columns = ["Col_categorical"])

the new data frame has the newly created columns appearing after the "Labels" column. I want the new columns in between "Col_continuous" and "Labels".
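A small, made-up frame reproduces the behavior described above. `pd.get_dummies(df, columns=[...])` keeps the non-encoded columns in their original order and appends the dummy columns after them. The data values here are illustrative assumptions, not from the question:

```python
import pandas as pd

# Hypothetical example frame with the three columns named in the question
df0 = pd.DataFrame({
    "Col_continuous": [1.5, 2.0, 3.2],
    "Col_categorical": ["a", "b", "a"],
    "Labels": [0, 1, 0],
})

df1hot = pd.get_dummies(df0, columns=["Col_categorical"])
# The dummy columns are appended after "Labels"
print(list(df1hot.columns))
# ['Col_continuous', 'Labels', 'Col_categorical_a', 'Col_categorical_b']
```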

For robustness, I want the order preserved when dealing with data frames with categorical columns arbitrarily ordered among the rest of the columns. For example, for ["Cont1", "Cat1", "Cont2", "Cont3", "Cat2", "Labels"], I want the new columns resulting from "Cat1" to be in between "Cont1" and "Cont2". Assume that I already have a variable, say categoricalCols, which is a list of names of categorical features.

Edit 1: changed df1hot = pd.get_dummies(df0, columns = ["Col_continuous"]) to df1hot = pd.get_dummies(df0, columns = ["Col_categorical"]) thanks to Juan C's comment.

Edit 2: added paragraph starting with "For robustness,..."

I guess you meant `df1hot = pd.get_dummies(df0, columns = ["Col_categorical"])` instead of `df1hot = pd.get_dummies(df0, columns = ["Col_continuous"])`, right? — Juan C, Apr 04 '19 at 15:53

score 2 · Answer 1 · Juan C

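If I understand correctly, I would go with something like this:

df1hot = df1hot[['Col_continuous', *[c for c in df1hot.columns if c.startswith('Col_categorical_')], 'Labels']]

Indexing with this list reorders the columns, putting every column created by get_dummies (all named Col_categorical_<value>) between Col_continuous and Labels.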

answered Apr 04 '19 at 15:57 by Juan C, edited Apr 04 '19 at 16:02

Thanks for your answer. I added a paragraph to the original question to address a more general problem that I have. Please see if you have a Pythonic solution for it. – strangeloop Apr 04 '19 at 16:16
I get you. I was able to do it, but very un-Pythonically. I don't know if that would work for you – Juan C Apr 04 '19 at 17:39
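The column-selection idea from this answer can be generalized to the arbitrary-order case in the question's second paragraph. This is a sketch, assuming column names are unique and that no categorical column name plus an underscore is a prefix of another column's dummies (e.g. `Cat1` and `Cat1_x` both being categorical would collide). `one_hot_preserve_order` is a hypothetical helper name, not a pandas function, and the example data is made up. The sketch walks the original columns and, at each categorical one, splices in the dummy columns that share its `<name>_` prefix:

```python
import pandas as pd

def one_hot_preserve_order(df: pd.DataFrame, categorical_cols: list) -> pd.DataFrame:
    """Hypothetical helper: one-hot encode categorical_cols and place each
    group of dummy columns where its source column used to be."""
    encoded = pd.get_dummies(df, columns=categorical_cols)
    # Columns added by get_dummies = encoded columns absent from the original frame
    new_cols = [c for c in encoded.columns if c not in df.columns]
    ordered = []
    for col in df.columns:
        if col in categorical_cols:
            # Assumes dummies are named "<col>_<value>" (the default prefix/prefix_sep)
            ordered.extend(c for c in new_cols if c.startswith(f"{col}_"))
        else:
            ordered.append(col)
    # Selecting with the list reorders the columns
    return encoded[ordered]

# Usage with the column layout from the question (values are illustrative)
df0 = pd.DataFrame({
    "Cont1": [1.0, 2.0], "Cat1": ["x", "y"], "Cont2": [3, 4],
    "Cont3": [5, 6], "Cat2": ["p", "p"], "Labels": [0, 1],
})
categoricalCols = ["Cat1", "Cat2"]
df1hot = one_hot_preserve_order(df0, categoricalCols)
print(list(df1hot.columns))
# ['Cont1', 'Cat1_x', 'Cat1_y', 'Cont2', 'Cont3', 'Cat2_p', 'Labels']
```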

score 0 · Answer 2 · answered Apr 25 '20 at 12:17

I don't know if it is Pythonic enough, but the following code is the only way I found to address the more general problem:

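import pandas as pd

df0['Col_categorical'] = pd.Categorical(df0['Col_categorical'])
# One-hot encode; the prefix mimics get_dummies' default column naming
dfDummies = pd.get_dummies(df0['Col_categorical'], prefix='Col_categorical')
# Split the original frame around the categorical column's position
column_position = df0.columns.get_loc('Col_categorical')
df1 = df0.iloc[:, :column_position]
df2 = df0.iloc[:, column_position+1:]
# Put the dummy columns where the categorical column used to be
df1hot = pd.concat([df1, dfDummies, df2], axis=1)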

I get the column position of the categorical column, then I split the original dataframe into two dataframes, and insert the one-hot-encoded columns between them.
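The same split-and-concat idea extends to several categorical columns: encode each one separately and concatenate the pieces in the original column order. This is a minimal sketch, assuming unique column names. `one_hot_in_place` is a hypothetical helper name, the example data is made up, and `prefix=col` reproduces the default `<column>_<value>` naming that `pd.get_dummies(df, columns=...)` uses:

```python
import pandas as pd

def one_hot_in_place(df: pd.DataFrame, categorical_cols: list) -> pd.DataFrame:
    """Hypothetical helper: replace each categorical column with its
    dummy columns at the same position."""
    pieces = []
    for col in df.columns:
        if col in categorical_cols:
            # Dummies for this column, named "<col>_<value>"
            pieces.append(pd.get_dummies(df[col], prefix=col))
        else:
            # Keep non-categorical columns as one-column frames
            pieces.append(df[[col]])
    # All pieces share df's index, so they line up row by row
    return pd.concat(pieces, axis=1)

# Usage with the column layout from the question (values are illustrative)
df0 = pd.DataFrame({
    "Cont1": [1.0, 2.0], "Cat1": ["x", "y"], "Cont2": [3, 4],
    "Cont3": [5, 6], "Cat2": ["p", "p"], "Labels": [0, 1],
})
categoricalCols = ["Cat1", "Cat2"]
df1hot = one_hot_in_place(df0, categoricalCols)
print(list(df1hot.columns))
# ['Cont1', 'Cat1_x', 'Cat1_y', 'Cont2', 'Cont3', 'Cat2_p', 'Labels']
```

Compared with splitting into two frames, collecting a list of pieces avoids recomputing positions after each insertion, so it works for any number of categorical columns in any order.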
