Keep other variables when executing get_dummies in Pandas

Question

I have a DataFrame with an ID variable and another categorical variable. I want to create dummy variables out of the categorical variable with get_dummies.

dum = pd.get_dummies(df)

However, this makes the ID variable disappear, and I need it later on to merge with other data sets.

Is there a way to keep the other variables? I could not find anything in the documentation of get_dummies. Thanks!

Can you add a sample of your `df` to the question? My first attempt would be to exclude `ID` when calling `get_dummies` and then add the column back afterwards. – Michael Hoff, Jul 23 '16 at 12:13
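The suggestion in the comment above could look roughly like the sketch below. The sample data and the column names `ID` and `color` are assumptions, since the question does not show the actual `df`.

```python
import pandas as pd

# Hypothetical sample data; the real df is not shown in the question.
df = pd.DataFrame({"ID": ["a1", "b2", "c3"], "color": ["red", "blue", "red"]})

# Encode everything except the ID column ...
dum = pd.get_dummies(df.drop(columns="ID"))

# ... then put the ID back as the first column.
# The row index is unchanged, so the values line up with the dummy rows.
dum.insert(0, "ID", df["ID"])

print(dum.columns.tolist())
```

This only works as long as the rows are not re-ordered or filtered between the two steps.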

score 9 · Answer 1 · answered Nov 12 '18 at 00:23


You can also copy the original column into a new one before executing get_dummies. E.g.,

df['dum_orig'] = df['dum']
df = pd.get_dummies(df, columns=['dum'])
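As a side note to the snippet above, the `columns` argument on its own already leaves every unlisted column in place, which may be all that is needed to keep the ID. A minimal sketch, assuming a string-typed `ID` column and a categorical column called `dum` (both the data and the variable names `encoded_all` and `encoded_some` are made up for illustration):

```python
import pandas as pd

# Hypothetical data: a string ID plus one categorical column named "dum".
df = pd.DataFrame({"ID": ["a1", "b2", "c3"], "dum": ["x", "y", "x"]})

# Without `columns`, every object-typed column is encoded, including the ID.
encoded_all = pd.get_dummies(df)

# With `columns`, only "dum" is encoded; "ID" passes through untouched.
encoded_some = pd.get_dummies(df, columns=["dum"])

print(encoded_all.columns.tolist())
print(encoded_some.columns.tolist())
```

Copying the column first, as in the answer, is only necessary if the original categorical column itself should also survive the encoding.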


Tom


score 5 · Accepted Answer · answered Jul 23 '16 at 12:16


I found the answer. You can concatenate the dummies data set to the original data set as shown below, as long as you don't re-order the data in the meantime.

df = pd.concat([df, dum], axis=1)
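For completeness, here is a self-contained sketch of that concatenation. The data, the column name `color`, and the result name `combined` are assumptions; the dummies are built from the categorical column only so that the ID is not encoded.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"ID": ["a1", "b2", "c3"], "color": ["red", "blue", "red"]})

# Dummies for the categorical column only; the result shares df's index.
dum = pd.get_dummies(df["color"], prefix="color")

# axis=1 places the dummy columns beside the original ones,
# matching rows by index label rather than by position.
combined = pd.concat([df, dum], axis=1)

print(combined.columns.tolist())
```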


Bert Carremans



That's correct, but if your df has a custom index you may run into problems, as the _concat_ method aligns on the index while _get_dummies_ resets it. In that case, I'd recommend using the [set_index](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.set_index.html) method: `df = pd.concat([df, dum.set_index(df.index)], axis=1)` – Mike Apr 22 '19 at 14:19
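The alignment issue described in this comment can be reproduced with a small sketch. As far as I can tell, recent pandas versions keep the input's index in `get_dummies`, so the mismatch here is produced on purpose with `reset_index`; that step, the data, and the names `misaligned` and `aligned` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data with a non-default index.
df = pd.DataFrame(
    {"ID": ["a1", "b2", "c3"], "color": ["red", "blue", "red"]},
    index=[10, 20, 30],
)

# Assumption: the dummies ended up with a fresh 0..n-1 index.
dum = pd.get_dummies(df["color"], prefix="color").reset_index(drop=True)

# concat aligns on index labels, so mismatched labels give extra rows with NaN.
misaligned = pd.concat([df, dum], axis=1)

# Re-using df's index on the dummies restores a row-by-row match.
aligned = pd.concat([df, dum.set_index(df.index)], axis=1)

print(len(misaligned), len(aligned))
```

Note that `set_index(df.index)` pairs rows purely by position, so it is only safe when both frames are still in the same row order.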

Is there still no argument implemented in get_dummies that lets you do this easily? Seems kind of like a common problem... – user21398 Jan 31 '21 at 22:55
