I am a beginner with RNNs and would like to build a gated recurrent unit (GRU) model that predicts a user's action on an e-commerce website, the Google Merchandise Store, which sells Google-branded merchandise.
We have 5 different actions:
Add to cart
Quickview click
Product click
Remove from cart
Onsite click
My data_y, which is the target, is one-hot encoded over these actions and looks like this:
array([[0, 0, 0, 1, 0],
       [0, 0, 0, 1, 0],
       [0, 0, 1, 0, 0],
       ...,
       [0, 0, 0, 1, 0],
       [0, 0, 1, 0, 0],
       [1, 0, 0, 0, 0]], dtype=uint8)
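To make the encoding concrete, here is a minimal sketch that turns a few one-hot rows like the ones above back into integer class ids with `argmax`. The array name `data_y_sample` is made up for illustration, and the post does not say which column corresponds to which action, so the sketch reports column positions only.

```python
import numpy as np

# Three rows copied from the target shown above (hypothetical variable name).
data_y_sample = np.array([[0, 0, 0, 1, 0],
                          [0, 0, 1, 0, 0],
                          [1, 0, 0, 0, 0]], dtype=np.uint8)

# Each row has exactly one 1; its column position is the class id.
class_ids = data_y_sample.argmax(axis=1)
print(class_ids)  # [3 2 0]
```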
By using only the URL (the page path) the user has accessed, I have achieved 68% prediction accuracy, but I am still trying to improve it by adding other inputs to the model.
My data_X looks like this:
pagePath
[googleredesign, bags]
[googleredesign, bags]
[googleredesign, electronics]
...
...
[googleredesign, bags, backpacks, home]
[googleredesign, bags, backpacks, googlealpine...
53087 rows × 2 columns
After getting the vocabulary length and the maximum sequence length, I tokenized it:
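from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Map each path token to an integer id, then pad every row to the same length.
tokenizer = Tokenizer(num_words=vocab_length)
tokenizer.fit_on_texts(data_X['pagePath'])
sequences = tokenizer.texts_to_sequences(data_X['pagePath'])
word_index = tokenizer.word_index
model_inputs = pad_sequences(sequences, maxlen=max_seq_length)
data_X = model_inputs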
This is how it looks after tokenization:
array([[ 0,  0,  0,  1,  3],
       [ 0,  0,  0,  1,  3],
       [ 0,  0,  0,  1,  3],
       ...,
       [ 0,  1,  3, 12,  9],
       [ 0,  1,  3, 12,  9],
       [ 0,  1,  3, 12, 81]], dtype=int32)
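The leading zeros come from `pad_sequences`, which by default pads (and truncates) at the front of each sequence. The pure-Python helper below is only a sketch of that default behavior; `pad_left` is a made-up name, not a Keras function.

```python
def pad_left(seq, maxlen, value=0):
    """Mimic the pad_sequences defaults: padding='pre', truncating='pre'."""
    seq = list(seq)[-maxlen:]                       # keep the last maxlen ids
    return [value] * (maxlen - len(seq)) + seq      # fill the front with zeros

print(pad_left([1, 3], 5))          # [0, 0, 0, 1, 3]
print(pad_left([1, 3, 12, 9], 5))   # [0, 1, 3, 12, 9]
```

Because 0 is used for padding, the tokenizer's real ids start at 1.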
After that I split the data and trained the model:
from sklearn.model_selection import train_test_split

# Hold out 30% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    data_X, data_y, test_size=0.3, random_state=2
)
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)
(37160, 5)
(15927, 5)
(37160, 5)
(15927, 5)
import tensorflow as tf

embedding_dim = 64

# Padded token ids -> embedding -> GRU -> one output unit per action.
inputs = tf.keras.Input(shape=(max_seq_length,))
embedding = tf.keras.layers.Embedding(
    input_dim=vocab_length,
    output_dim=embedding_dim,
    input_length=max_seq_length
)(inputs)
gru = tf.keras.layers.GRU(units=embedding_dim)(embedding)
outputs = tf.keras.layers.Dense(5, activation='sigmoid')(gru)

model = tf.keras.Model(inputs, outputs)
model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=[
        'accuracy',
        tf.keras.metrics.AUC(name='auc')
    ]
)
batch_size = 32
epochs = 3

history = model.fit(
    X_train,
    y_train,
    validation_split=0.2,
    batch_size=batch_size,
    epochs=epochs,
    callbacks=[
        tf.keras.callbacks.ReduceLROnPlateau(),
        tf.keras.callbacks.ModelCheckpoint('model.h5', save_best_only=True)
    ]
)
So my question is: how do I add another input to the model? For example, I want to add a column that represents the total time the user spent on the website. How can I combine it with the embedding layer, given that it is not tokenized and is unrelated to the tokenized pagePath column?
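For reference, one common pattern for this kind of setup is a two-input model built with the Keras functional API: the numeric feature gets its own `Input`, skips the embedding entirely, and is concatenated with the GRU output before the final `Dense` layer. The sketch below is only an illustration under assumptions: `vocab_length = 100` is a placeholder, the names `path_input`, `time_input`, `X_path`, and `X_time` are made up, the data is random, and the time feature is assumed to be already scaled to a small range. The output layer and loss are copied unchanged from the model above.

```python
import numpy as np
import tensorflow as tf

# Placeholder values; in the question these come from the real data.
vocab_length = 100     # assumed vocabulary size
max_seq_length = 5     # matches the padded sequence width shown above
embedding_dim = 64

# Input 1: the tokenized, padded pagePath sequence.
path_input = tf.keras.Input(shape=(max_seq_length,), name='page_path')
embedding = tf.keras.layers.Embedding(
    input_dim=vocab_length,
    output_dim=embedding_dim
)(path_input)
gru = tf.keras.layers.GRU(units=embedding_dim)(embedding)

# Input 2: one numeric value per row (hypothetical name); no embedding needed.
time_input = tf.keras.Input(shape=(1,), name='total_time')

# Join the GRU summary of the sequence with the numeric feature.
merged = tf.keras.layers.Concatenate()([gru, time_input])
outputs = tf.keras.layers.Dense(5, activation='sigmoid')(merged)

model = tf.keras.Model(inputs=[path_input, time_input], outputs=outputs)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Random stand-in data, only to show the shapes that fit() and predict() expect.
X_path = np.random.randint(0, vocab_length, size=(8, max_seq_length))
X_time = np.random.rand(8, 1).astype('float32')
y = np.eye(5, dtype='float32')[np.random.randint(0, 5, size=8)]
model.fit([X_path, X_time], y, epochs=1, verbose=0)
```

With real data, both input arrays would need to be split with the same row indices (for example by passing both to a single `train_test_split` call) so that each sequence stays paired with its time value.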