I am currently using the Keras functional API to build a neural net that takes a mixture of numerical and categorical features. The quirk is that each training sample may contain several values of the same categorical variable.
For example, the first few rows of the dataframe look like this:
   sessions_sum  sessions_duration cat_var_list  score
0     -0.554354                100       [0, 1]    1.0
1     -0.553925                200    [0, 2, 4]    1.0
2     -0.548787                100       [3, 4]    0.0
3     -0.554354                100          [5]    0.0
4     -0.553069                100       [2, 5]    1.0
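To see what the model actually receives, here is a small sketch that rebuilds the five example rows from the table above in pandas (values copied from the table; the real modelDf is of course larger). It shows that calling .values on a column of Python lists gives a 1-D array of objects rather than a 2-D integer matrix.

```python
import pandas as pd

# Rebuild the five example rows shown in the table (values copied verbatim).
modelDf = pd.DataFrame({
    "sessions_sum": [-0.554354, -0.553925, -0.548787, -0.554354, -0.553069],
    "sessions_duration": [100, 200, 100, 100, 100],
    "cat_var_list": [[0, 1], [0, 2, 4], [3, 4], [5], [2, 5]],
    "score": [1.0, 1.0, 0.0, 0.0, 1.0],
})

x_train_cats = modelDf["cat_var_list"].values

# Each element is a Python list of a different length, so NumPy can only
# store the column as a 1-D array of objects, not as a 2-D int matrix.
print(x_train_cats.dtype)   # object
print(x_train_cats.shape)   # (5,)
print(x_train_cats[1])      # [0, 2, 4]
```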
The cat_var_list column contains a list of the label-encoded categorical values present in that training sample. I would like to create an embedding layer that takes this list of category indices, embeds each index individually, and averages the resulting embeddings; the average is then concatenated with the output of a Dense layer.
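The "embed each index, then average" step can be sketched in plain NumPy, independent of Keras. This is only an illustration under a hypothetical convention: the lists have already been padded to a common length with 0, and every real category c is stored as c + 1 so that 0 can mean "padding" (the same idea Keras uses with Embedding(mask_zero=True)). The embedding table here is random and the helper name masked_mean_embedding is made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

n_categories = 6    # label-encoded values 0..5 in the example data
embedding_dim = 10  # matches output_dim in the question's Embedding layer

# Hypothetical convention: row 0 of the table is reserved for padding, so
# category c is looked up at row c + 1.
embedding_table = rng.normal(size=(n_categories + 1, embedding_dim))

# The example lists, shifted by +1 and right-padded with 0 to length 3.
padded = np.array([
    [1, 2, 0],   # [0, 1]
    [1, 3, 5],   # [0, 2, 4]
    [4, 5, 0],   # [3, 4]
    [6, 0, 0],   # [5]
    [3, 6, 0],   # [2, 5]
])

def masked_mean_embedding(indices, table):
    """Embed every index, then average only over the non-padding positions."""
    vectors = table[indices]                   # (batch, max_len, dim)
    mask = (indices != 0)[..., None]           # (batch, max_len, 1)
    summed = (vectors * mask).sum(axis=1)      # padding positions contribute 0
    counts = np.maximum(mask.sum(axis=1), 1)   # avoid division by zero
    return summed / counts                     # (batch, dim)

averaged = masked_mean_embedding(padded, embedding_table)
print(averaged.shape)  # (5, 10)
```

A plain mean over axis 1 would also average in the padding vectors, which is why the sketch divides by the number of real categories per row instead.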
Here is the work-in-progress code that converts the data into numpy arrays and feeds them into the model.
import keras

# Prep data
x_train_numerics = modelDf[['sessions_sum', 'sessions_duration']].values
x_train_cats = modelDf['cat_var_list'].values  # object array: one Python list per row
y_train = modelDf['score'].values

# Begin model construction
# Numeric branch: one Dense layer over the numeric features
input_size = x_train_numerics.shape[1]
numerics = keras.layers.Input(shape=(input_size,))
layer_1 = keras.layers.Dense(64, activation='relu', name='layer1')(numerics)

# Categorical branch: a variable-length list of category indices per sample.
# Embedding needs input_dim > largest index (the sample data uses 0..5).
n_categories = max(max(cats) for cats in x_train_cats) + 1
cat_list = keras.layers.Input(shape=(None,), name="subjectgroup_indices", dtype='int32')
embeddings = keras.layers.Embedding(input_dim=n_categories, output_dim=10, input_length=None)(cat_list)
# Average the embeddings over the list dimension
embeddings_avg = keras.layers.Lambda(lambda x: keras.backend.mean(x, axis=1))(embeddings)

# Merge both branches and predict the score
hybrid_layer = keras.layers.Concatenate()([layer_1, embeddings_avg])
output_layer = keras.layers.Dense(1, kernel_initializer='lecun_uniform',
                                  name='output_layer')(hybrid_layer)

model = keras.models.Model(inputs=[numerics, cat_list], outputs=output_layer)
model.compile('adam', 'mean_absolute_error')
model.fit([x_train_numerics, x_train_cats], y_train, epochs=6, batch_size=200, validation_split=0.2)
Running the fit method gives me the following error:
Traceback (most recent call last):
  File "/anaconda3/envs/recommendations/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3319, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-63-377ace5b4cf7>", line 1, in <module>
    model.fit([x_train_numerics, x_train_sgs], y_train, epochs=6, batch_size=200, validation_split=0.2)
  File "/anaconda3/envs/recommendations/lib/python3.7/site-packages/keras/engine/training.py", line 1239, in fit
    validation_freq=validation_freq)
  File "/anaconda3/envs/recommendations/lib/python3.7/site-packages/keras/engine/training_arrays.py", line 196, in fit_loop
    outs = fit_function(ins_batch)
  File "/anaconda3/envs/recommendations/lib/python3.7/site-packages/tensorflow/python/keras/backend.py", line 3277, in __call__
    dtype=tensor_type.as_numpy_dtype))
  File "/anaconda3/envs/recommendations/lib/python3.7/site-packages/numpy/core/numeric.py", line 538, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: setting an array element with a sequence.
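The last frames of the traceback show the backend converting each input batch with np.asarray and a fixed dtype. The sketch below reproduces that conversion in isolation; to_int_tensor is a hypothetical stand-in that only approximates what the backend does, not the actual Keras code. It fails on an object array of ragged lists but succeeds once every row has the same length.

```python
import numpy as np

# Mimic modelDf['cat_var_list'].values: a 1-D object array of Python lists
# with different lengths (the five example rows).
x_train_cats = np.array([[0, 1], [0, 2, 4], [3, 4], [5], [2, 5]], dtype=object)

def to_int_tensor(arr):
    """Hypothetical stand-in for the backend's np.asarray(..., dtype) call."""
    return np.asarray(arr, dtype="int32")

try:
    to_int_tensor(x_train_cats)
except ValueError as err:
    print("ValueError:", err)

# Rows of equal length convert without complaint, yielding a 2-D int matrix.
rectangular = np.array([[0, 1, 0], [0, 2, 4]], dtype=object)
print(to_int_tensor(rectangular).shape)  # (2, 3)
```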
I have tried setting the input shape of the categorical list to (None,), as suggested by the first answer to this question, but to no avail. Any assistance would be appreciated. Thanks!
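An Input with shape=(None,) lets the sequence length vary between calls, but each batch handed to fit still has to be a rectangular integer array, so one possible approach is to pad the lists first. Below is a minimal NumPy sketch under stated assumptions: pad_category_lists is a hypothetical helper, and it shifts every index by +1 so that 0 can serve as the padding value (the convention Keras' Embedding(mask_zero=True) expects). Keras 2.x also ships keras.preprocessing.sequence.pad_sequences for the padding step itself.

```python
import numpy as np

def pad_category_lists(lists, max_len=None):
    """Hypothetical helper: turn ragged lists of label-encoded categories into a
    rectangular int32 matrix. Every index is shifted by +1 so that 0 can mean
    'padding'; rows longer than max_len are truncated."""
    if max_len is None:
        max_len = max(len(seq) for seq in lists)
    padded = np.zeros((len(lists), max_len), dtype="int32")
    for row, seq in enumerate(lists):
        seq = list(seq)[:max_len]
        padded[row, :len(seq)] = np.asarray(seq, dtype="int32") + 1
    return padded

cat_lists = [[0, 1], [0, 2, 4], [3, 4], [5], [2, 5]]
x_train_cats = pad_category_lists(cat_lists)
print(x_train_cats)
# [[1 2 0]
#  [1 3 5]
#  [4 5 0]
#  [6 0 0]
#  [3 6 0]]

# With the +1 shift, the embedding table needs one row per category plus the
# padding row: largest label (5) + 1 for zero-based labels, + 1 for padding.
input_dim = max(max(seq) for seq in cat_lists) + 2
print(input_dim)  # 7
```

If the padded array is fed to the model, note that a plain keras.backend.mean over axis 1 averages every position, padding included, so the padding entries would have to be excluded for the result to equal the mean of each sample's real category embeddings.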