Here's the data

PlayerID, Characters, Win or Lose

I can make it look like this

8PYPY0LLQ,valkyrie5 ,  chr_witch4 ,  hog_rider5 ,  zapMachine1 ,  mega_minion3 ,  baby_dragon2 ,  bomber7 ,  skeleton_horde1, 0

Or like this

2GRG822L9,"barbarians8, valkyrie5, chr_balloon3, fire_spirits8, minion8, firespirit_hut6, rage4, skeleton_horde3,",1

The second column is a combination of 8 characters chosen from a pool of 70+ possible characters.

I need to dummy-encode the second column so that each character gets its own column. Is there a way to do this in Python or R? I'm assuming the second column has to stay a single quoted string, rather than writing a CSV file that looks like this:

2GRG822L9,barbarians8, valkyrie5, chr_balloon3, fire_spirits8, minion8, firespirit_hut6, rage4, skeleton_horde3,1
8PYPY0LLQ,valkyrie5 ,  chr_witch4 ,  hog_rider5 ,  zapMachine1 ,  mega_minion3 ,  baby_dragon2 ,  bomber7 ,  skeleton_horde1,0

It should probably look like this before dummy encoding:

2GRG822L9,"barbarians8, valkyrie5, chr_balloon3, fire_spirits8, minion8, firespirit_hut6, rage4, skeleton_horde3,",1
8PYPY0LLQ,"valkyrie5 ,  chr_witch4 ,  hog_rider5 ,  zapMachine1 ,  mega_minion3 ,  baby_dragon2 ,  bomber7 ,  skeleton_horde1,",0
Tyler L

1 Answer

I don't know if this is the best way to do it, but I suggest first splitting the Characters string into 8 columns using the following code:

df['Characters'].str.strip('", ').str.split(',', expand=True).apply(lambda col: col.str.strip())

Then use the following code for each of these 8 columns to create dummies:

pd.get_dummies(df['your columns'])

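Because the same character can appear in different positions, this can create duplicate dummy columns for a single character, but you can easily merge them by taking the maximum across columns that share a name.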
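
The steps above can be put together roughly as follows. This is only a sketch, assuming the file is in the quoted layout from the question and is read with pandas; the column names PlayerID, Characters and Win are my own labels, not something taken from the original file. It also shows Series.str.get_dummies as a shorter alternative that avoids the duplicate columns altogether.

```python
import io

import pandas as pd

# Two sample rows copied from the question, in the quoted layout.
raw = '''2GRG822L9,"barbarians8, valkyrie5, chr_balloon3, fire_spirits8, minion8, firespirit_hut6, rage4, skeleton_horde3,",1
8PYPY0LLQ,"valkyrie5 ,  chr_witch4 ,  hog_rider5 ,  zapMachine1 ,  mega_minion3 ,  baby_dragon2 ,  bomber7 ,  skeleton_horde1,",0
'''

# Column names are assumed labels for illustration.
df = pd.read_csv(io.StringIO(raw), header=None,
                 names=["PlayerID", "Characters", "Win"])

# Step 1: drop stray commas/spaces at the ends, split into 8 columns,
# then trim the whitespace around every character name.
parts = df["Characters"].str.strip(", ").str.split(",", expand=True)
parts = parts.apply(lambda col: col.str.strip())

# Step 2: dummy-encode all 8 position columns at once. With an empty
# prefix, the new columns are named after the characters themselves,
# so a character seen in two positions yields two same-named columns.
dummies = pd.get_dummies(parts, prefix="", prefix_sep="", dtype=int)

# Step 3: merge same-named columns by taking the maximum.
dummies = dummies.T.groupby(level=0).max().T

# Attach the encoded characters to the remaining columns.
encoded = pd.concat([df[["PlayerID", "Win"]], dummies], axis=1)

# Alternative: normalise the separators, then let pandas split and
# encode in one call.
cleaned = (df["Characters"]
           .str.replace(r"\s*,\s*", ",", regex=True)
           .str.strip(","))
alt = cleaned.str.get_dummies(sep=",")

print(encoded)
```

Trimming the whitespace matters here: without it, " valkyrie5" and "valkyrie5 " would be treated as two different characters.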

Amir H