I am presently using Keras' functional API to build a neural net that takes a mixture of numerical and categorical features. The quirk here is that every training sample may have multiple instances of a categorical variable present.

Therefore, a sample of the dataframe may look like this:

        sessions_sum      sessions_duration      cat_var_list     score
0          -0.554354                    100            [0, 1]       1.0
1          -0.553925                    200         [0, 2, 4]       1.0
2          -0.548787                    100            [3, 4]       0.0
3          -0.554354                    100               [5]       0.0
4          -0.553069                    100            [2, 5]       1.0

The cat_var_list column contains a list of the label-encoded categorical values present for that training sample. I would like to create an embedding layer that takes the list of categorical indices, embeds each index individually, and averages the embeddings before concatenating the result with the output of a Dense layer.
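To make the intended computation concrete, here is a minimal pure-Python sketch of "embed each index, then average". The 2-dimensional embedding_table and its values are made up for illustration (Keras would learn them), and average_embedding is a hypothetical helper, not a Keras API.

```python
# Hypothetical embedding table: one 2-d vector per category index 0..5.
# The numbers are invented for illustration; a real layer learns them.
embedding_table = [
    [0.0, 1.0],
    [2.0, 3.0],
    [4.0, 5.0],
    [6.0, 7.0],
    [8.0, 9.0],
    [10.0, 11.0],
]


def average_embedding(indices, table):
    """Look up each index in the table and return the element-wise mean."""
    vectors = [table[i] for i in indices]
    dim = len(table[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


# The cat_var_list column from the sample dataframe above.
cat_var_list = [[0, 1], [0, 2, 4], [3, 4], [5], [2, 5]]

# One fixed-size vector per sample, regardless of how many categories it has.
averaged = [average_embedding(row, embedding_table) for row in cat_var_list]
```

Each sample ends up with a vector of the same size, which is what makes the concatenation with the Dense output possible.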

Here is the work-in-progress code that converts the data into numpy arrays and feeds them into the model.

import keras

# Prep data (modelDf is the dataframe shown above)
x_train_numerics = modelDf[['sessions_sum', 'sessions_duration']].values
x_train_cats = modelDf['cat_var_list'].values  # object array of variable-length lists
y_train = modelDf['score'].values

# Begin model construction
input_size = x_train_numerics.shape[1]  # number of numeric feature columns
numerics = keras.layers.Input(shape=[input_size])
layer_1 = keras.layers.Dense(64, activation='relu', name='layer1')(numerics)

cat_list = keras.layers.Input(shape=(None,), name="subjectgroup_indices", dtype='int32')
# input_dim must be larger than the largest index; the sample above uses 0..5
embeddings = keras.layers.Embedding(input_dim=6, output_dim=10, input_length=None)(cat_list)
# Average the per-index embeddings over the list axis
embeddings_avg = keras.layers.Lambda(lambda x: keras.backend.mean(x, axis=1))(embeddings)

hybrid_layer = keras.layers.Concatenate()([layer_1, embeddings_avg])
output_layer = keras.layers.Dense(1, kernel_initializer='lecun_uniform',
                                  name='output_layer')(hybrid_layer)
model = keras.models.Model(inputs=[numerics, cat_list], outputs=output_layer)
model.compile('adam', 'mean_absolute_error')
model.fit([x_train_numerics, x_train_cats], y_train, epochs=6, batch_size=200, validation_split=0.2)

Running the fit method gives the following error:

Traceback (most recent call last):
  File "/anaconda3/envs/recommendations/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3319, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-63-377ace5b4cf7>", line 1, in <module>
    model.fit([x_train_numerics, x_train_sgs], y_train, epochs=6, batch_size=200, validation_split=0.2)
  File "/anaconda3/envs/recommendations/lib/python3.7/site-packages/keras/engine/training.py", line 1239, in fit
    validation_freq=validation_freq)
  File "/anaconda3/envs/recommendations/lib/python3.7/site-packages/keras/engine/training_arrays.py", line 196, in fit_loop
    outs = fit_function(ins_batch)
  File "/anaconda3/envs/recommendations/lib/python3.7/site-packages/tensorflow/python/keras/backend.py", line 3277, in __call__
    dtype=tensor_type.as_numpy_dtype))
  File "/anaconda3/envs/recommendations/lib/python3.7/site-packages/numpy/core/numeric.py", line 538, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: setting an array element with a sequence.

I have tried setting the input shape of the categorical list to (None,), as suggested by the first answer to this question, but to no avail. Any assistance would be appreciated. Thanks!
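For what it is worth, the ValueError appears to come from NumPy rather than from the model itself: modelDf['cat_var_list'].values is an object array whose elements are lists of different lengths, and such an array cannot be converted to a rectangular int32 array. One possible workaround, sketched here under assumptions, is to pad every list to a common length first. The reserved padding index 0, the +1 shift of the real indices, and the helper name pad_category_lists are all assumptions for illustration, not part of the original code.

```python
PAD = 0  # assumption: index 0 is reserved for padding, real categories start at 1


def pad_category_lists(rows, pad_value=PAD):
    """Shift every index by +1 and right-pad each list to the longest length."""
    max_len = max(len(row) for row in rows)
    return [
        [index + 1 for index in row] + [pad_value] * (max_len - len(row))
        for row in rows
    ]


# The cat_var_list column from the sample dataframe above.
cat_var_list = [[0, 1], [0, 2, 4], [3, 4], [5], [2, 5]]
padded = pad_category_lists(cat_var_list)

# Every row now has the same length, so converting `padded` to an int32
# NumPy array would give a regular 2-d array instead of an object array.
row_lengths = {len(row) for row in padded}
```

With this scheme the Embedding layer would need input_dim of at least 7 for the sample above (six categories plus the padding index). A plain mean over axis 1 would also average the padding positions, so they would need to be masked, for example with mask_zero=True on the Embedding layer followed by a mask-aware pooling layer such as GlobalAveragePooling1D.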

Michael