
I have label-encoded my info.venue column as follows, but when I try to one-hot encode it I get this error: ValueError: Expected 2D array, got 1D array instead.

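from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder = LabelEncoder()
df['info.venue'] = labelencoder.fit_transform(df['info.venue'])
onehotencoder = OneHotEncoder()
# this line raises: ValueError: Expected 2D array, got 1D array instead
var1 = onehotencoder.fit_transform(df['info.venue'])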

My column looks like this:

info.venue
Adelaide Oval
Brabourne Stadium
Kensington Oval, Bridgetown
Kingsmead
Melbourne Cricket Ground
Melbourne Cricket Ground
Melbourne Cricket Ground
Punjab Cricket Association IS Bindra Stadium, Mohali
R Premadasa Stadium
Saurashtra Cricket Association Stadium
Shere Bangla National Stadium
Stadium Australia
Sydney Cricket Ground

I want to one-hot encode these stadium names, but I get the ValueError above.
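A minimal reproducible example of the problem, as a sketch: the venue list is copied from the column above, while building it into a fresh DataFrame is an assumption about how the real data was loaded. It shows that df['info.venue'] is a one-dimensional Series, which is what OneHotEncoder rejects.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Rebuild the column from the question (13 rows, 11 distinct venues);
# in the real code df presumably comes from elsewhere
venues = [
    "Adelaide Oval",
    "Brabourne Stadium",
    "Kensington Oval, Bridgetown",
    "Kingsmead",
    "Melbourne Cricket Ground",
    "Melbourne Cricket Ground",
    "Melbourne Cricket Ground",
    "Punjab Cricket Association IS Bindra Stadium, Mohali",
    "R Premadasa Stadium",
    "Saurashtra Cricket Association Stadium",
    "Shere Bangla National Stadium",
    "Stadium Australia",
    "Sydney Cricket Ground",
]
df = pd.DataFrame({"info.venue": venues})

labelencoder = LabelEncoder()
df["info.venue"] = labelencoder.fit_transform(df["info.venue"])

# Selecting with single brackets gives a Series, i.e. a 1D object
print(df["info.venue"].ndim)  # 1

# Passing that 1D Series to OneHotEncoder triggers the ValueError
try:
    OneHotEncoder().fit_transform(df["info.venue"])
except ValueError as err:
    print("ValueError:", str(err).splitlines()[0])
```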

Sociopath
Mayur Mahajan

1 Answer


OneHotEncoder expects a 2D array, but you passed a 1D one (the df['info.venue'] Series, which holds the result of labelencoder.fit_transform). This is easy to fix: use df[['info.venue']] instead of df['info.venue'] (note the double square brackets), which selects a one-column DataFrame instead of a Series:

df['info.venue']=labelencoder.fit_transform(df['info.venue'])
R = onehotencoder.fit_transform(df[['info.venue']])

where R is a sparse 2D matrix:

In [155]: R
Out[155]:
<13x11 sparse matrix of type '<class 'numpy.float64'>'
        with 13 stored elements in Compressed Sparse Row format>

In [156]: R.A
Out[156]:
array([[ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.]])
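A sketch of two related points, assuming scikit-learn 0.20 or newer: in those versions OneHotEncoder accepts string categories directly, so the LabelEncoder step can be dropped, and a plain 1D NumPy array can be made 2D with reshape(-1, 1) instead of double brackets. The four-row DataFrame is made-up sample data; .toarray() is the explicit form of the .A shorthand used above.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Small illustrative sample (not the full column from the question)
df = pd.DataFrame({"info.venue": [
    "Adelaide Oval",
    "Melbourne Cricket Ground",
    "Melbourne Cricket Ground",
    "Sydney Cricket Ground",
]})

enc = OneHotEncoder()
# Double brackets -> one-column DataFrame (2D); strings are encoded as-is
R = enc.fit_transform(df[["info.venue"]])
print(R.shape)          # (4, 3): 4 rows, 3 distinct venues
print(enc.categories_)  # one array of sorted category names per input column

# Equivalent fix for a plain 1D NumPy array: reshape to (n_samples, 1)
arr = df["info.venue"].to_numpy()
R2 = OneHotEncoder().fit_transform(arr.reshape(-1, 1))
print(np.array_equal(R.toarray(), R2.toarray()))  # True
```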

Alternatively, you can use LabelBinarizer to get one-hot encoded values directly from the strings:

Source DF:

In [121]: df
Out[121]:
                                info.venue
0                            Adelaide Oval
1                        Brabourne Stadium
2              Kensington Oval, Bridgetown
3                                Kingsmead
4                 Melbourne Cricket Ground
..                                     ...
8                      R Premadasa Stadium
9   Saurashtra Cricket Association Stadium
10           Shere Bangla National Stadium
11                       Stadium Australia
12                   Sydney Cricket Ground

[13 rows x 1 columns]

Solution:

In [122]: from sklearn.preprocessing import LabelBinarizer

In [123]: lb = LabelBinarizer()

In [124]: r = pd.SparseDataFrame(lb.fit_transform(df['info.venue']),
     ...:                        df.index,
     ...:                        lb.classes_,
     ...:                        default_fill_value=0)
     ...:

In [125]: r
Out[125]:
    Adelaide Oval  Brabourne Stadium  Kensington Oval, Bridgetown  Kingsmead  Melbourne Cricket Ground  \
0               1                  0                            0          0                         0
1               0                  1                            0          0                         0
2               0                  0                            1          0                         0
3               0                  0                            0          1                         0
4               0                  0                            0          0                         1
..            ...                ...                          ...        ...                       ...
8               0                  0                            0          0                         0
9               0                  0                            0          0                         0
10              0                  0                            0          0                         0
11              0                  0                            0          0                         0
12              0                  0                            0          0                         0

    Punjab Cricket Association IS Bindra Stadium, Mohali  R Premadasa Stadium  Saurashtra Cricket Association Stadium  \
0                                                   0                       0                                       0
1                                                   0                       0                                       0
2                                                   0                       0                                       0
3                                                   0                       0                                       0
4                                                   0                       0                                       0
..                                                ...                     ...                                     ...
8                                                   0                       1                                       0
9                                                   0                       0                                       1
10                                                  0                       0                                       0
11                                                  0                       0                                       0
12                                                  0                       0                                       0

    Shere Bangla National Stadium  Stadium Australia  Sydney Cricket Ground
0                               0                  0                      0
1                               0                  0                      0
2                               0                  0                      0
3                               0                  0                      0
4                               0                  0                      0
..                            ...                ...                    ...
8                               0                  0                      0
9                               0                  0                      0
10                              1                  0                      0
11                              0                  1                      0
12                              0                  0                      1

[13 rows x 11 columns]
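pd.SparseDataFrame, used in the solution above, was deprecated in pandas 0.25 and removed in pandas 1.0. A sketch of the same LabelBinarizer approach for current pandas, either as a regular DataFrame or as a sparse-backed one via DataFrame.sparse.from_spmatrix; the five-row sample is made up for illustration. Note that LabelBinarizer returns a single column when there are exactly two classes, so this layout assumes three or more venues.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelBinarizer

# Illustrative sample with 4 distinct venues
df = pd.DataFrame({"info.venue": [
    "Adelaide Oval",
    "Kingsmead",
    "Melbourne Cricket Ground",
    "Melbourne Cricket Ground",
    "Sydney Cricket Ground",
]})

lb = LabelBinarizer()
# Dense 0/1 matrix, one column per class (for 3+ classes)
r = pd.DataFrame(lb.fit_transform(df["info.venue"]),
                 index=df.index,
                 columns=lb.classes_)
print(r)

# Sparse-backed variant: sparse_output=True yields a SciPy CSR matrix
r_sparse = pd.DataFrame.sparse.from_spmatrix(
    LabelBinarizer(sparse_output=True).fit_transform(df["info.venue"]),
    index=df.index,
    columns=lb.classes_,
)
print(np.array_equal(r_sparse.sparse.to_dense().to_numpy(), r.to_numpy()))  # True
```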
MaxU - stand with Ukraine