I am trying to convert some categorical features into one-hot encodings for use in Keras. However, when I try to map these features, I get an error saying the shapes are incompatible. Here is my code:

import numpy
import pandas
from keras.models import Sequential
from keras.layers import Dense
from sklearn.preprocessing import LabelEncoder
from keras.utils import np_utils

# load dataset
dataframe = pandas.read_csv("data/development.csv")
dataset = dataframe.values

X = dataset[:,0:7].astype(int)

encoder = LabelEncoder()

for i in [3,4,5,6]:
    col = X[i]
    encoder.fit(col)
    encoded_col = encoder.transform(col)
    X[i] = np_utils.to_categorical(encoded_col) # Error is here

Y = dataset[:,7].astype(int)

And here is the error I'm receiving:

ValueError: could not broadcast input array from shape (7,5) into shape (7)

Is there anything that I should be doing differently here? I am using Python 3.6, with Keras 2.2.2.

marked-down
  • broadcasting can change a (7,) to (1,7), but not (7,1). – hpaulj Sep 06 '18 at 10:39
  • This is due to a shape mismatch. Your `X[i]` has shape `(7)`, but `np_utils.to_categorical` returns a (7,5) matrix. That is, `encoded_col` is a vector of 7 elements, probably with values ranging from 0 to 4, so once converted to categorical it becomes a matrix of shape (7,5). You might want to use a new variable to hold this new value with the correct shape. – thushv89 Sep 06 '18 at 11:14
  • If I am reading your code right, I think `col = X[i]` selects row i; did you want to select column i? – from keras import michael Sep 06 '18 at 20:49
  • @fromkerasimportmichael Yup; only some of my columns are categorical, and I need to transform them via one-hot. – marked-down Sep 06 '18 at 21:11
  • @thushv89 If this is the case, how can I rebuild a numpy ndarray object that I can pass to Keras? I've tried instantiating a new list and then passing it to `numpy.array` without success. – marked-down Sep 06 '18 at 21:12
  • @ReactingToAngularVues Does `col = X[:, i]` give you what you want? – from keras import michael Sep 06 '18 at 21:14
  • @ReactingToAngularVues I've added an answer to what I think you want. Let me know if this is what you want, otherwise I'll edit the answer as you give inputs. – thushv89 Sep 06 '18 at 23:07
  • thanks for the responses from both of you; when I'm home tonight I'll give them both a go. – marked-down Sep 06 '18 at 23:08
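
The shape mismatch described in the comments can be reproduced with NumPy alone. The following is only a minimal sketch: the 7x7 matrix and the all-zero `one_hot` array are made-up stand-ins for the asker's `X` and for the output of `np_utils.to_categorical`, not the real data. It also shows that `X[3]` picks a row while `X[:, 3]` picks a column.

```python
import numpy as np

# Hypothetical 7x7 integer matrix standing in for the asker's X.
X = np.arange(49).reshape(7, 7)

row = X[3]      # row 3: values 21..27, shape (7,)
col = X[:, 3]   # column 3: values 3, 10, 17, ..., shape (7,)

# Stand-in for a one-hot matrix with 7 samples and 5 classes.
one_hot = np.zeros((7, 5), dtype=int)

try:
    # A (7, 5) block cannot be written into a slot of shape (7,).
    X[3] = one_hot
    message = ""
except ValueError as exc:
    message = str(exc)

print(row.shape, col.shape)
print(message)
```

Because a one-hot block is wider than the single row or column it replaces, it has to be stored somewhere other than the original slot in `X`.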

1 Answer

Based on your question in the comment above, I've come up with a working solution. However, I'm still not sure what you are trying to achieve, because I'm missing the context in which you are using this operation. But let me take a stab anyway.

Each row of X selected in the loop gets its own label matrix from the to_categorical operation. Therefore, I've made Y (which I think is what you want) a list. During iteration, I assign each newly created matrix to the corresponding element of Y.

import numpy as np
import pandas
from sklearn.preprocessing import LabelEncoder
from keras.utils import np_utils

# load dataset
dataframe = pandas.read_csv("data/development.csv")
dataset = dataframe.values

X = dataset[:,0:7].astype(int)

# Y is a list of matrices, one matrix for each row of X iterated below
Y = [None for _ in range(X.shape[0])]
encoder = LabelEncoder()

for i in [3,4,5,6]:
    # Note: X[i] selects row i; use X[:, i] to select column i instead
    col = X[i]

    encoder.fit(col)
    encoded_col = encoder.transform(col)
    # Number of distinct classes the encoder found in this vector
    num_classes = len(encoder.classes_)

    # Set Y[i] to the new one-hot-encoded matrix
    Y[i] = np_utils.to_categorical(encoded_col, num_classes=num_classes)

print([y.shape for y in Y if y is not None])
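
If the goal is a single 2-D array to pass to Keras, with the categorical columns (not rows) expanded in place, something like the sketch below might work. It is not the code above: to stay runnable without Keras or scikit-learn it swaps in `np.unique(..., return_inverse=True)` for `LabelEncoder` and an identity-matrix lookup for `np_utils.to_categorical`. The helper name `one_hot_columns` and the sample data are made up for illustration.

```python
import numpy as np

def one_hot_columns(X, categorical_cols):
    """Return a 2-D array in which each listed column is replaced by its one-hot block."""
    blocks = []
    for j in range(X.shape[1]):
        col = X[:, j]  # column j, shape (n_samples,)
        if j in categorical_cols:
            # Map each distinct value to a code 0..k-1 (the role LabelEncoder plays).
            classes, codes = np.unique(col, return_inverse=True)
            # Row c of the k x k identity matrix is the one-hot vector for code c.
            blocks.append(np.eye(len(classes), dtype=X.dtype)[codes])
        else:
            # Keep non-categorical columns as a single (n_samples, 1) block.
            blocks.append(col.reshape(-1, 1))
    # Concatenate all blocks side by side into one 2-D array.
    return np.hstack(blocks)

# Made-up data: 4 samples; columns 1 and 2 are treated as categorical.
X = np.array([[10, 0, 5],
              [20, 2, 5],
              [30, 1, 7],
              [40, 2, 7]])

X_encoded = one_hot_columns(X, categorical_cols=[1, 2])
print(X_encoded.shape)
print(X_encoded)
```

Collecting the blocks in a list and concatenating once with `np.hstack` avoids writing a wide matrix into a narrow slot, which is what triggered the original error.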
thushv89