7

I have a DataFrame with an ID variable and another categorical variable. I want to create dummy variables out of the categorical variable with get_dummies.

dum = pd.get_dummies(df)

However, this makes the ID variable disappear, and I need it later on to merge with other data sets.

Is there a way to keep the other variables? I could not find anything in the documentation of get_dummies. Thanks!

Bert Carremans
  • Can you add a sample for your `df` to the question? My first attempt would be to exclude `ID` when calling `get_dummies` and then, later on adding the column again. – Michael Hoff Jul 23 '16 at 12:13

2 Answers

9

You can also copy the original column into a new one before executing get_dummies. E.g.,

df['dum_orig'] = df['dum']
df = pd.get_dummies(df, columns=['dum'])
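
For illustration, here is a minimal sketch of the approach above on made-up data. The sample values and the `ID` column are assumptions, not taken from the question. It relies on the `columns` argument of `get_dummies`, which restricts encoding to the listed columns, so a string-typed `ID` column should pass through unchanged.

```python
import pandas as pd

# Hypothetical sample data; the values and the "ID" column are assumptions.
df = pd.DataFrame({"ID": ["a1", "b2", "c3"], "dum": ["x", "y", "x"]})

# Copy the categorical column so the original values survive the encoding.
df["dum_orig"] = df["dum"]

# Encode only "dum"; columns not listed in `columns` are left untouched.
encoded = pd.get_dummies(df, columns=["dum"])

print(encoded.columns.tolist())
```

Without `columns=`, `get_dummies` encodes every object or categorical column, which is likely why a string ID column seems to vanish.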
Tom
5

I found the answer. You can concatenate the dummies data set to the original data set as shown below, as long as you don't re-order the data in the meantime.

df = pd.concat([df, dum], axis=1) 
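
A rough end-to-end sketch of this, assuming a frame with a numeric `ID` column and a categorical column named `cat` (both names and all values are made up for the example):

```python
import pandas as pd

# Hypothetical sample data; column names and values are assumptions.
df = pd.DataFrame({"ID": [101, 102, 103], "cat": ["x", "y", "x"]})

# Build the dummies from the categorical column only.
dum = pd.get_dummies(df["cat"], prefix="cat")

# Attach the dummies next to the original columns; rows are matched by index.
combined = pd.concat([df, dum], axis=1)

print(combined.columns.tolist())
```

Because `concat` with `axis=1` matches rows by index label rather than by position, this works as long as `df` and `dum` still share the same index.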
Bert Carremans
  • 4
    That's correct, but if your df has some index you may face problems as _concat_ method merges based on index while _get_dummies_ resets it. In this case, I'd recommend using [set_index](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.set_index.html) method: `df = pd.concat([df, dum.set_index(df.index)], axis=1)` – Mike Apr 22 '19 at 14:19
  • 5
    Is there still no argument implemented in get_dummies that lets you do this easily? Seems kind of like a common problem... – user21398 Jan 31 '21 at 22:55
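
To make the index caveat from the comments concrete, here is a small sketch. The data is made up, and the index mismatch is produced deliberately with `reset_index(drop=True)`, so this only illustrates how `concat` behaves when the two indexes differ, plus the `set_index(df.index)` workaround suggested above.

```python
import pandas as pd

# Hypothetical data with a non-default index (values are assumptions).
df = pd.DataFrame(
    {"ID": [101, 102, 103], "cat": ["x", "y", "x"]},
    index=[10, 20, 30],
)

# Force an index mismatch for the sake of the example.
dum = pd.get_dummies(df["cat"], prefix="cat").reset_index(drop=True)

# concat aligns on index labels, so mismatched labels produce extra rows with NaN.
misaligned = pd.concat([df, dum], axis=1)

# Re-using df's index on the dummies restores row-by-row alignment.
aligned = pd.concat([df, dum.set_index(df.index)], axis=1)

print(len(misaligned), len(aligned))
```

Note that `set_index(df.index)` assigns labels purely by position, so it is only safe when `dum` has the same row order as `df`.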