OneHotEncoder expects a 2D array, but you passed a 1D one (a Series, the result of labelencoder.fit_transform). This is easy to fix: use df[['info.venue']] instead of df['info.venue'] (note the double square brackets), as follows:
df['info.venue']=labelencoder.fit_transform(df['info.venue'])
R = onehotencoder.fit_transform(df[['info.venue']])
where R is a sparse 2D matrix:
In [155]: R
Out[155]:
<13x11 sparse matrix of type '<class 'numpy.float64'>'
with 13 stored elements in Compressed Sparse Row format>
In [156]: R.A
Out[156]:
array([[ 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])
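To make the 1D versus 2D difference concrete, here is a minimal sketch. It assumes a small four-row frame built from a few of the venue names in the question (not the full data set) and a reasonably recent scikit-learn; the names sample and dense are only for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Illustrative subset of the venues, not the asker's full DataFrame.
sample = pd.DataFrame(
    {"info.venue": ["Adelaide Oval", "Kingsmead", "Adelaide Oval", "Stadium Australia"]}
)

# LabelEncoder works on a 1D Series and returns integer codes.
sample["info.venue"] = LabelEncoder().fit_transform(sample["info.venue"])

print(sample["info.venue"].shape)    # single brackets: 1D Series
print(sample[["info.venue"]].shape)  # double brackets: 2D DataFrame with one column

# OneHotEncoder needs the 2D form; by default it returns a sparse matrix.
R = OneHotEncoder().fit_transform(sample[["info.venue"]])
dense = R.toarray()  # dense NumPy array, one column per distinct venue
print(dense)
```

Single brackets select a column as a Series, while double brackets select a list of columns and therefore keep the result two-dimensional, which is what the encoder checks for.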
Alternatively, you can use LabelBinarizer to get one-hot encoded values directly from the strings:
Source DF:
In [121]: df
Out[121]:
info.venue
0 Adelaide Oval
1 Brabourne Stadium
2 Kensington Oval, Bridgetown
3 Kingsmead
4 Melbourne Cricket Ground
.. ...
8 R Premadasa Stadium
9 Saurashtra Cricket Association Stadium
10 Shere Bangla National Stadium
11 Stadium Australia
12 Sydney Cricket Ground
[13 rows x 1 columns]
Solution:
In [122]: from sklearn.preprocessing import LabelBinarizer
In [123]: lb = LabelBinarizer()
In [124]: r = pd.SparseDataFrame(lb.fit_transform(df['info.venue']),
...: df.index,
...: lb.classes_,
...: default_fill_value=0)
...:
In [125]: r
Out[125]:
Adelaide Oval Brabourne Stadium Kensington Oval, Bridgetown Kingsmead Melbourne Cricket Ground \
0 1 0 0 0 0
1 0 1 0 0 0
2 0 0 1 0 0
3 0 0 0 1 0
4 0 0 0 0 1
.. ... ... ... ... ...
8 0 0 0 0 0
9 0 0 0 0 0
10 0 0 0 0 0
11 0 0 0 0 0
12 0 0 0 0 0
Punjab Cricket Association IS Bindra Stadium, Mohali R Premadasa Stadium Saurashtra Cricket Association Stadium \
0 0 0 0
1 0 0 0
2 0 0 0
3 0 0 0
4 0 0 0
.. ... ... ...
8 0 1 0
9 0 0 1
10 0 0 0
11 0 0 0
12 0 0 0
Shere Bangla National Stadium Stadium Australia Sydney Cricket Ground
0 0 0 0
1 0 0 0
2 0 0 0
3 0 0 0
4 0 0 0
.. ... ... ...
8 0 0 0
9 0 0 0
10 1 0 0
11 0 1 0
12 0 0 1
[13 rows x 11 columns]
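Note that pd.SparseDataFrame was removed in pandas 1.0, so the session above only runs on older versions. A rough equivalent for current pandas, assuming the same kind of single-column frame (here a four-row illustrative subset, with the name sample chosen for this sketch), is to ask LabelBinarizer for sparse output and wrap it with DataFrame.sparse.from_spmatrix:

```python
import pandas as pd
from sklearn.preprocessing import LabelBinarizer

# Illustrative subset of the venues, not the asker's full DataFrame.
sample = pd.DataFrame(
    {"info.venue": ["Adelaide Oval", "Kingsmead", "Adelaide Oval", "Stadium Australia"]}
)

# sparse_output=True makes fit_transform return a SciPy sparse matrix.
lb = LabelBinarizer(sparse_output=True)
encoded = lb.fit_transform(sample["info.venue"])

# Build a DataFrame with sparse columns, labelled by the learned classes.
r = pd.DataFrame.sparse.from_spmatrix(
    encoded, index=sample.index, columns=lb.classes_
)
print(r)
```

One caveat: with exactly two distinct values, LabelBinarizer produces a single column rather than two, so passing lb.classes_ as column labels only lines up when there are three or more classes.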