Trying to turn multiple rows of categorical variables into single row with dummy variable columns

Question

I have a dataset:

ID   Action      Converted
567  Email         True
567  Text          True
567  Phone call    True
432  Phone call    False
432  Social Media  False
432  Text          False

and i want the final table to look like

ID  Email Text PhoneCall SocialMedia Converted
567 1     1    1         0             True
432 0     1    1         1             False

this is a small sample. I have 300+ unique actions so manually typing them out is not an option. Basically want to convert these categorical variables in multiple rows into a single row with the actions showing up as dummy variables.

Using onehotencoder and then getting column names by "get_feature_names" would be a way too. Then grouping by ID and summing. — Towsif Ahamed Labib, Mar 13 '21 at 19:13

score 1 · Answer 1 · answered Mar 13 '21 at 19:08

1

pd.get_dummies converts categorical variable into dummy/indicator variables. Then groupby will concat the dataframe based on similar IDs and sum will be adding 1 to 0s in the Action columns

y = pd.get_dummies(df.Action, prefix='Action')
y['ID']=df.ID
y.groupby(['ID']).sum()

answered Mar 13 '21 at 19:08

Towsif Ahamed Labib

365
3
14

Thank you for your response. 2 questions, is it possible to have a count instead of it being a dummy? I noticed as i dug deeper that those actions can occur more than once. Secondly this code deletes other columns (or doesnt carry them over). Is there a way to retain other original columns? One of the columns is a target variable. – Abdul Mar 14 '21 at 04:42
For the first one, you can use different aggregate function like count() And the second problem occurred because you have only action and id feature in y dataframe – Towsif Ahamed Labib Mar 14 '21 at 05:05

score 1 · Answer 2 · answered Mar 13 '21 at 19:12

1

df.groupby(['ID', 'Action']).size().unstack(fill_value=0)

Output:

ID	Email	Phone call	Social Media	Text
432	0	1	1	1
567	1	1	0	1

answered Mar 13 '21 at 19:12

RJ Adriaansen

9,131
2
12
26

Please provide some explanation to your answer. – Towsif Ahamed Labib Mar 13 '21 at 19:21
Thank you both - this worked. What if instead of dummy i wanted the count of occurrences? Example if Phonecall were to be 5 for example – Abdul Mar 14 '21 at 03:29
Also is there a way to retain other columns. Using this code it removes all other columns (in my case it removes the target variable column) – Abdul Mar 14 '21 at 04:37
This code does count the occurences with `size()`. It should also work on your larger dataset. And what is the target variable column? There is none in your sample df. – RJ Adriaansen Mar 14 '21 at 07:25
Hey! I have updated what the original data set looks like with a target variable. Thanks again for your help – Abdul Mar 14 '21 at 15:05
Is 567 always True, and 432 always False? If so, it can easily be added, if not, there is no way to represent this column, as the output is grouped by `ID` – RJ Adriaansen Mar 14 '21 at 18:31
Yes the target variable's value is fixed to the ID, it will not be different. So 567 always true and 432 always False – Abdul Mar 15 '21 at 02:54

Trying to turn multiple rows of categorical variables into single row with dummy variable columns

2 Answers2