
I have two DataFrames with the same column headers, and I want to one-hot encode both of them. I cannot encode them one by one. Instead, I want to append the two DataFrames together, perform the one-hot encoding on the combined frame, and then split the result back into two DataFrames, each with its own headers.

The code below performs the one-hot encoding on each DataFrame separately instead of merging them first and then encoding.

train = pd.get_dummies(train, columns= ['is_discount', 'gender', 'city'])
test = pd.get_dummies(test, columns= ['is_discount', 'gender', 'city'])
Bharath M Shetty
Mervyn Lee

1 Answer


Use pd.concat with keys, call get_dummies once on the combined frame, then split it back by key, i.e.

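import pandas as pd

# Example DataFrames
train = pd.DataFrame({'x': [1, 2, 3, 4]})
test = pd.DataFrame({'x': [4, 2, 5, 0]})

# Concat with keys, so each row is tagged with its source (0 = train, 1 = test).
# dtype=int keeps the 0/1 output shown below; pandas 2.0+ defaults to booleans.
temp = pd.get_dummies(pd.concat([train, test], keys=[0, 1]), columns=['x'], dtype=int)

# Select each part back out of the MultiIndex
train, test = temp.xs(0), temp.xs(1)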

Output:

# Train
   x_0  x_1  x_2  x_3  x_4  x_5
0    0    1    0    0    0    0
1    0    0    1    0    0    0
2    0    0    0    1    0    0
3    0    0    0    0    1    0

#Test
   x_0  x_1  x_2  x_3  x_4  x_5
0    0    0    0    0    1    0
1    0    0    1    0    0    0
2    0    0    0    0    0    1
3    1    0    0    0    0    0
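
The same concat-with-keys approach can be sketched for the columns named in the question. This is only an illustration: the sample values, the `"train"`/`"test"` key labels, and the `train_enc`/`test_enc` names are made up here; only the column names `is_discount`, `gender`, and `city` come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
train = pd.DataFrame({
    "is_discount": ["yes", "no", "yes"],
    "gender": ["M", "F", "F"],
    "city": ["Paris", "Tokyo", "Paris"],
})
test = pd.DataFrame({
    "is_discount": ["no", "no"],
    "gender": ["F", "M"],
    "city": ["Lima", "Tokyo"],
})

cols = ["is_discount", "gender", "city"]

# Stack both frames; the keys become the outer level of a MultiIndex.
combined = pd.concat([train, test], keys=["train", "test"])

# Encode once, so both halves share every category seen in either frame.
# dtype=int yields 0/1 columns instead of booleans.
encoded = pd.get_dummies(combined, columns=cols, dtype=int)

# Split back by key; each half keeps its original row index.
train_enc = encoded.xs("train")
test_enc = encoded.xs("test")
```

Because the encoding runs once over the combined frame, a category that appears in only one half (here `city == "Lima"`, present only in the hypothetical test data) still gets a column in both results, filled with zeros where it never occurs.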
Bharath M Shetty