I have this table in Excel:
id class
0 2 3
1 1 3
2 3 5
Now, I want to do a 'special' one-hot encoding in Python. For each id in the first table, there are two numbers. Each number corresponds to a class (class1, class2, etc.). The second table is created based off of the first such that for each id, each number in its row shows up in its corresponding class column and the other columns just get zeros. For example, the numbers for id 0 are 2 and 3. The 2 is placed at class2 and the 3 is placed at class3. Classes 1, 4, and 5 get the default of 0. The result should be like:
id class1 class2 class3 class4 class5
0 0 2 3 0 0
1 1 0 3 0 0
2 0 0 3 0 5
My previous solution,
foo = lambda x: pd.Series([i for i in x.split()])
result=onehot['hotel'].apply(foo)
result.columns=['class1','class2']
pd.get_dummies(result, prefix='class', columns=['class1','class2'])
results in:
class_1 class_2 class_3 class_3 class_5
0 0.0 1.0 0.0 1.0 0.0
1 1.0 0.0 0.0 1.0 0.0
2 0.0 0.0 1.0 0.0 1.0
(class_3 appears twice). What can I do to fix this? (After this step, I can transform it to the final format I want.)