
I read a CSV file into a pandas DataFrame, create dummy variables for some columns, and concatenate them. For example, I have a column named "Genre" that contains values such as "comedy, drama" and "action, comedy". When I create the dummies and concatenate them, I get one column per whole string, but I want the values parsed. That is, I want the columns 'Genre.comedy', 'Genre.drama', 'Genre.action' instead of 'Genre.comedy, drama' and 'Genre.action, comedy'. Here is my code:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import csv
from sklearn import preprocessing
trainset = pd.read_csv("/Users/yada/Downloads/IMDBMovieData.csv", encoding='latin-1')
X = trainset.drop(['Description', 'Runtime'], axis=1)
features = ['Genre','Actors']
for f in features:
    X_dummy = pd.get_dummies(X[f], prefix = f)
    X = X.drop([f], axis = 1)
    X = pd.concat((X, X_dummy), axis = 1)

And here are some rows of my CSV file (image: csv)

yasi

1 Answer


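I think you need str.get_dummies, which splits each string on the given separator before creating the indicator columns, together with add_prefix: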

features = ['Genre','Actors']
for f in features:
    X_dummy = X[f].str.get_dummies(', ').add_prefix(f + '.')
    X = X.drop([f], axis = 1)
    X = pd.concat((X, X_dummy), axis = 1)

Or, using pop and a list comprehension (shown here with sample data):

trainset = pd.DataFrame({'Description':list('abc'),
                   'Genre':['comedy, drama','action, comedy','action'],
                   'Actors':['a, b','a, c','d, a'],
                   'Runtime':[1,3,5],
                   'E':[5,3,6],
                   'F':list('aaa')})

print (trainset)
  Description           Genre Actors  Runtime  E  F
0           a   comedy, drama   a, b        1  5  a
1           b  action, comedy   a, c        3  3  a
2           c          action   d, a        5  6  a

X = trainset.drop(['Description', 'Runtime'], axis=1)
features = ['Genre','Actors']
X_dummy_list = [X.pop(f).str.get_dummies(', ').add_prefix(f + '.') for f in features]
X = pd.concat([X] + X_dummy_list , axis = 1)
print (X)

   E  F  Genre.action  Genre.comedy  Genre.drama  Actors.a  Actors.b  \
0  5  a             0             1            1         1         1   
1  3  a             1             1            0         1         0   
2  6  a             1             0            0         1         0   

   Actors.c  Actors.d  
0         0         0  
1         1         0  
2         0         1  
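
To make the difference from the code in the question explicit, here is a minimal sketch on the same sample Genre values. It only contrasts pd.get_dummies with Series.str.get_dummies; the variable names whole and split are made up for illustration.

```python
import pandas as pd

# Same sample values as in the example above
genre = pd.Series(['comedy, drama', 'action, comedy', 'action'], name='Genre')

# pd.get_dummies treats every whole string as a single category,
# so 'comedy, drama' becomes one column
whole = pd.get_dummies(genre, prefix='Genre', prefix_sep='.')
print(list(whole.columns))

# Series.str.get_dummies first splits each string on the separator,
# so every individual genre gets its own column
split = genre.str.get_dummies(sep=', ').add_prefix('Genre.')
print(list(split.columns))
```

The separator has to match the data exactly (here a comma followed by one space), otherwise the resulting column names keep stray whitespace.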
jezrael
  • Yes! I knew I had to do something in the dummies call to recognize the "," and split at that point, but I couldn't find it by searching. Thanks a lot, Jezrael!! – yasi Jul 25 '18 at 07:32
  • And do you know how I can print or get one column of that DataFrame? For example, I want the "Actors.Keanu Reeves" column and I use this: print (X['Actors.Keanu Reeves']) But it didn't work :/ – yasi Jul 25 '18 at 07:44
  • No, your answer works very well, I was just asking another question :) – yasi Jul 25 '18 at 07:47
  • @yasi - It should work fine. Is there perhaps a double whitespace or something similar in the column name? – jezrael Jul 25 '18 at 07:49
  • Yes, you are right, the problem was a space. Thanks a lot, dear jezrael! – yasi Jul 25 '18 at 08:22
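
The whitespace problem from the comments can be avoided by normalizing the strings before creating the dummies. The following is only a sketch under assumptions: the actor values are hypothetical sample data, not rows from the original CSV, and the names raw, clean and dummies are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data with inconsistent spacing around the commas
actors = pd.Series(['Keanu Reeves,  Laurence Fishburne',
                    ' Keanu Reeves , Carrie-Anne Moss'], name='Actors')

# Splitting on ', ' directly leaves stray spaces inside the column names
raw = actors.str.get_dummies(', ').add_prefix('Actors.')
print([repr(c) for c in raw.columns])  # repr makes hidden whitespace visible

# Normalize first: collapse whitespace runs, trim the ends,
# and remove any spaces around the commas
clean = (actors.str.replace(r'\s+', ' ', regex=True)
               .str.strip()
               .str.replace(r'\s*,\s*', ',', regex=True))

dummies = clean.str.get_dummies(',').add_prefix('Actors.')
print(dummies['Actors.Keanu Reeves'])
```

Printing the column names with repr, as in the first print call, is a quick way to check whether a failing lookup such as X['Actors.Keanu Reeves'] is caused by extra spaces.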