4

What is the best/most Pythonic way to one-hot encode categorical features in a Pandas data frame while preserving the original order of the columns from which the categories (new column names) are extracted?

For example, if I have three columns in my data frame (df0): ["Col_continuous", "Col_categorical", "Labels"], and I use

df1hot = pd.get_dummies(df0, columns = ["Col_categorical"])

the new data frame has the newly created columns appearing after the "Labels" column. I want the new columns in between "Col_continuous" and "Labels".

For robustness, I want the order preserved when dealing with data frames whose categorical columns are arbitrarily ordered among the rest of the columns. For example, for ["Cont1", "Cat1", "Cont2", "Cont3", "Cat2", "Labels"], I want the new columns resulting from "Cat1" to be in between "Cont1" and "Cont2". Assume that I already have a variable, say categoricalCols, which is a list of names of categorical features.

Edit 1: changed df1hot = pd.get_dummies(df0, columns = ["Col_continuous"]) to df1hot = pd.get_dummies(df0, columns = ["Col_categorical"]) thanks to Juan C's comment.

Edit 2: added paragraph starting with "For robustness,..."

strangeloop

2 Answers

2

IIUC I would go with something like this:

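df = df[['Col_continuous', *[i for i in df.columns if 'Col_categorical' in i], 'Labels']]

This selects the columns in the desired order, so every column created by get_dummies ends up between 'Col_continuous' and 'Labels'. (Assigning the list to df.columns instead would only relabel the columns without moving the data.)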
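
A minimal sketch of this reordering on a toy frame may help; the data values are made up for illustration. The sketch reorders by indexing with a list of names (`df[[...]]`), since assigning a list to `df.columns` would only relabel the columns, and it matches on the `Col_categorical_` prefix that get_dummies adds by default.

```python
import pandas as pd

# Toy data (illustrative values only)
df0 = pd.DataFrame({
    "Col_continuous": [0.5, 1.5, 2.5],
    "Col_categorical": ["a", "b", "a"],
    "Labels": [0, 1, 0],
})

# get_dummies drops the encoded column and appends the dummies at the end
df = pd.get_dummies(df0, columns=["Col_categorical"])

# Collect the dummy columns by their default "<column>_" prefix
dummy_cols = [c for c in df.columns if c.startswith("Col_categorical_")]

# Reorder by selecting the columns in the desired order
df = df[["Col_continuous", *dummy_cols, "Labels"]]
print(list(df.columns))
```

Prefix matching assumes no other column name starts with `Col_categorical_`.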

Juan C
  • Thanks for your answer. I added a paragraph to the original question to address a more general problem that I have. Please see if you have a Pythonic solution for it. – strangeloop Apr 04 '19 at 16:16
  • I get you. I was able to do it, but very un-Pythonically. I don't know if that would work for you – Juan C Apr 04 '19 at 17:39
0

I don't know if it is Pythonic enough, but the following code is the only way I found to address a more general problem:

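import pandas as pd

# Treat the column as categorical, then build its one-hot columns
df0['Col_categorical'] = pd.Categorical(df0['Col_categorical'])
dfDummies = pd.get_dummies(df0['Col_categorical'])
# Split df0 around the categorical column and put the dummies in its place
column_position = df0.columns.get_loc('Col_categorical')
df1 = df0.iloc[:, :column_position]
df2 = df0.iloc[:, column_position+1:]
df1hot = pd.concat([df1, dfDummies, df2], axis=1)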

I get the column position of the categorical column, then I split the original dataframe into two dataframes, and insert the one-hot-encoded columns between them.
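
The same split-and-insert idea can be extended to several categorical columns by walking the columns in order and swapping each categorical one for its dummies. This is only a sketch under assumptions: the function name `one_hot_preserving_order` and the toy data are hypothetical, and `prefix=col` is used so that the dummy names of different columns cannot collide.

```python
import pandas as pd


def one_hot_preserving_order(df, categorical_cols):
    """Return a copy of df in which each column listed in categorical_cols
    is replaced, at its original position, by its one-hot columns."""
    pieces = []
    for col in df.columns:
        if col in categorical_cols:
            # prefix=col yields names such as "Cat1_a", "Cat1_b"
            pieces.append(pd.get_dummies(df[col], prefix=col))
        else:
            # Keep non-categorical columns as single-column frames
            pieces.append(df[[col]])
    return pd.concat(pieces, axis=1)


# Toy data (illustrative values only)
df0 = pd.DataFrame({
    "Cont1": [1.0, 2.0, 3.0],
    "Cat1": ["a", "b", "a"],
    "Cont2": [0.1, 0.2, 0.3],
    "Cat2": ["x", "y", "y"],
    "Labels": [0, 1, 0],
})
categoricalCols = ["Cat1", "Cat2"]

df1hot = one_hot_preserving_order(df0, categoricalCols)
print(list(df1hot.columns))
```

Because the pieces are concatenated in the original column order, no column positions need to be tracked explicitly.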

Sami Belkacem