1

I have a dataset:

ID   Action      Converted
567  Email         True
567  Text          True
567  Phone call    True
432  Phone call    False
432  Social Media  False
432  Text          False

and i want the final table to look like

ID  Email Text PhoneCall SocialMedia Converted
567 1     1    1         0             True
432 0     1    1         1             False

this is a small sample. I have 300+ unique actions so manually typing them out is not an option. Basically want to convert these categorical variables in multiple rows into a single row with the actions showing up as dummy variables.

Abdul
  • 19
  • 2

2 Answers2

1

pd.get_dummies converts categorical variable into dummy/indicator variables. Then groupby will concat the dataframe based on similar IDs and sum will be adding 1 to 0s in the Action columns

y = pd.get_dummies(df.Action, prefix='Action')
y['ID']=df.ID
y.groupby(['ID']).sum() 
  • Thank you for your response. 2 questions, is it possible to have a count instead of it being a dummy? I noticed as i dug deeper that those actions can occur more than once. Secondly this code deletes other columns (or doesnt carry them over). Is there a way to retain other original columns? One of the columns is a target variable. – Abdul Mar 14 '21 at 04:42
  • For the first one, you can use different aggregate function like count() And the second problem occurred because you have only action and id feature in y dataframe – Towsif Ahamed Labib Mar 14 '21 at 05:05
1
df.groupby(['ID', 'Action']).size().unstack(fill_value=0)

Output:

ID Email Phone call Social Media Text
432 0 1 1 1
567 1 1 0 1
RJ Adriaansen
  • 9,131
  • 2
  • 12
  • 26
  • Please provide some explanation to your answer. – Towsif Ahamed Labib Mar 13 '21 at 19:21
  • Thank you both - this worked. What if instead of dummy i wanted the count of occurrences? Example if Phonecall were to be 5 for example – Abdul Mar 14 '21 at 03:29
  • Also is there a way to retain other columns. Using this code it removes all other columns (in my case it removes the target variable column) – Abdul Mar 14 '21 at 04:37
  • This code does count the occurences with `size()`. It should also work on your larger dataset. And what is the target variable column? There is none in your sample df. – RJ Adriaansen Mar 14 '21 at 07:25
  • Hey! I have updated what the original data set looks like with a target variable. Thanks again for your help – Abdul Mar 14 '21 at 15:05
  • Is 567 always True, and 432 always False? If so, it can easily be added, if not, there is no way to represent this column, as the output is grouped by `ID` – RJ Adriaansen Mar 14 '21 at 18:31
  • Yes the target variable's value is fixed to the ID, it will not be different. So 567 always true and 432 always False – Abdul Mar 15 '21 at 02:54