Suppose I have a data frame df built like this:
import pandas as pd
d = {'Title': ['Elden Ring', 'Starcraft 2', 'Terraforming Mars'], 'Genre': ['Fantasy;Videogame', 'Videogame', 'Fantasy;Boardgame']}
df = pd.DataFrame(data=d, index=None)
so that it looks like this:
Title              Genre
Elden Ring         Fantasy;Videogame
Starcraft 2        Videogame
Terraforming Mars  Fantasy;Boardgame
My goal is to end up with a dataframe that looks like this:
Title              Genre                 Fantasy  Videogame  Boardgame
Elden Ring         [Fantasy, Videogame]  1        1          0
Starcraft 2        [Videogame]           0        1          0
Terraforming Mars  [Fantasy, Boardgame]  1        0          1
What is the best way to go about this? I tried:
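from sklearn.preprocessing import MultiLabelBinarizer
df = pd.DataFrame(data=d, index=None)
# Split each ';'-separated string into a list of genre labels
df.Genre = df.Genre.str.split(';')
# Encode the label lists as a 0/1 indicator matrix, one column per genre
binar = MultiLabelBinarizer()
genre_labels = binar.fit_transform(df.Genre)
# Attach one indicator column per class found by the binarizer
df[binar.classes_] = genre_labels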
This gives me a dataframe:
Title              Genre                 Boardgame  Fantasy  Videogame
Elden Ring         [Fantasy, Videogame]  0          1        1
Starcraft 2        [Videogame]           0          0        1
Terraforming Mars  [Fantasy, Boardgame]  1          1        0
This gives me what I want, but it feels convoluted. Is there a cleaner way to do this?
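For comparison, one shorter route that might fit is pandas' own `Series.str.get_dummies`, which splits on a separator and builds the indicator columns in one call. This is only a sketch, assuming the labels are always separated by exactly `;` with no surrounding whitespace; the names `dummies` and `result` are introduced here for illustration:

```python
import pandas as pd

d = {
    'Title': ['Elden Ring', 'Starcraft 2', 'Terraforming Mars'],
    'Genre': ['Fantasy;Videogame', 'Videogame', 'Fantasy;Boardgame'],
}
df = pd.DataFrame(data=d)

# Split each string on ';' and build one 0/1 column per distinct label
dummies = df['Genre'].str.get_dummies(sep=';')

# Keep the list-valued Genre column, as in the desired output
df['Genre'] = df['Genre'].str.split(';')

# Join on the shared default index to attach the indicator columns
result = df.join(dummies)
print(result)
```

As with MultiLabelBinarizer, the indicator columns come out in sorted order, so they would still need reordering (for example `dummies[['Fantasy', 'Videogame', 'Boardgame']]`) if the exact column order matters.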