I am trying to model a translation between two numerical (floating-point) datasets and thought of using sequence-to-sequence learning with teacher forcing. I am able to train the model down to a reasonably low MSE, but when it comes to the inference model, the outputs are far off from the target data, or maybe I am running inference incorrectly. My question is: how do you run inference on floating-point values? On the internet I can find several tutorials that one-hot encode integer data, draw the prediction as a one-hot encoded vector, and then decode it back to the predicted integer. But how can I do the same with my data?
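For reference, the one-hot tutorials decode each step by taking the argmax of the prediction and re-encoding it as the next decoder input, whereas with my data the prediction is already a plain float. A minimal sketch of the difference (the arrays here are only illustrative):

import numpy as np

# One-hot case from the tutorials: the Dense layer is a softmax over tokens,
# so the next decoder input is rebuilt from the argmax of the prediction.
output_tokens = np.array([[[0.1, 0.7, 0.2]]])             # shape (1, 1, num_tokens)
sampled_token_index = np.argmax(output_tokens[0, -1, :])
next_input = np.zeros((1, 1, 3))
next_input[0, 0, sampled_token_index] = 1.0

# My case: the Dense layer is a single linear unit, so the prediction is
# already a float and, as far as I understand, can be fed back directly.
output_value = np.array([[[0.0493]]])                     # shape (1, 1, 1)
next_input = output_value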
Both of my datasets are numeric with floating-point values. Encoder input data =
array([[0.        ],
       [0.00075804],
       [0.00024911],
       ...,
       [0.        ],
       [0.        ],
       [0.        ]])
I am using a masking layer with 0 as the start/stop character, because my encoder dataset consists of 4096 time steps per sample.
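For reference, Masking(mask_value=0) flags any time step whose features all equal the mask value, so the padding zeros are skipped by the encoder LSTM. A minimal sketch of that behaviour with an illustrative 5-step sample:

import numpy as np
from tensorflow.keras.layers import Masking

# One sample with 5 time steps, the last two being zero padding.
x = np.array([[[0.1], [0.2], [0.3], [0.0], [0.0]]], dtype=np.float32)

masked = Masking(mask_value=0)(x)
# The computed mask marks the all-zero steps as False, so downstream layers
# such as the encoder LSTM ignore them.
print(masked._keras_mask.numpy())   # [[ True  True  True False False]]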
My decoder output data =
array([[0.04930792],
       [0.0509621 ],
       [0.05045872],
       ...,
       [0.02535375],
       [0.02148524],
       [0.02867743]], dtype=float32)
The decoder output data consists of 8192 time steps per sample.
My decoder input data =
array([[0.        ],
       [0.04930792],
       [0.0509621 ],
       ...,
       [0.01980789],
       [0.02535375],
       [0.02148524]], dtype=float32)
The decoder input data also consists of 8192 time steps per sample.
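As the samples above show, the decoder input is just the decoder output shifted right by one step with the 0 start character prepended (teacher forcing). A minimal sketch of how that shift can be built, assuming the full target array has shape (samples, 8192, 1); the helper name is only illustrative:

import numpy as np

def make_decoder_input(decoder_output_data):
    # Teacher forcing: at step t the decoder sees the true output of step t-1,
    # with 0 as the start value at the first step.
    decoder_input_data = np.zeros_like(decoder_output_data)
    decoder_input_data[:, 1:, :] = decoder_output_data[:, :-1, :]
    return decoder_input_data

# decoder_input_data = make_decoder_input(decoder_output_data)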
My training model architecture:
import numpy as np
from tensorflow.keras.layers import Input, Masking, LSTM, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# Encoder: a masked LSTM whose final states serve as the context vector.
encoder_inputs = Input(shape=(max_input_sequence, input_dimension), name='encoder_inputs')
masking = Masking(mask_value=0)
encoder_inputs_masked = masking(encoder_inputs)
encoder_lstm = LSTM(LSTMoutputDimension, activation='elu', return_state=True, name='encoder_lstm')
LSTM_outputs, state_h, state_c = encoder_lstm(encoder_inputs_masked)
encoder_states = [state_h, state_c]

# Decoder: uses the context vector as its initial state, followed by a
# linear Dense layer that maps each step back to one float.
decoder_inputs = Input(shape=(None, input_dimension), name='decoder_inputs')
decoder_lstm = LSTM(LSTMoutputDimension, activation='elu', return_sequences=True, return_state=True, name='decoder_lstm')
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = Dense(input_dimension, name='decoder_dense')
decoder_outputs = decoder_dense(decoder_outputs)

# Put together
model_encoder_training = Model([encoder_inputs, decoder_inputs], decoder_outputs, name='model_encoder_training')
opt = Adam(lr=0.007, clipnorm=1)
model_encoder_training.compile(optimizer=opt, loss='mean_squared_error', metrics=['mse'])
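For completeness, this is roughly how I fit the training model on the three arrays (batch size, epochs and validation split here are only illustrative):

model_encoder_training.fit(
    [encoder_input_data, decoder_input_data],
    decoder_output_data,
    batch_size=16,
    epochs=50,
    validation_split=0.2)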
My inference model architecture:
# Encoder inference model: maps an input sequence to the context vector.
encoder_model = Model(encoder_inputs, encoder_states)

# Decoder inference model: runs one step at a time, fed with the states
# from the previous step.
decoder_state_input_h = Input(shape=(LSTMoutputDimension,))
decoder_state_input_c = Input(shape=(LSTMoutputDimension,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs, state_h, state_c = decoder_lstm(decoder_inputs, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs] + decoder_states)
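As a quick sanity check (a minimal sketch), predicting with the encoder model on a single sample should return the two LSTM states that seed the decoder:

states_value = encoder_model.predict(
    encoder_input_data[0].reshape(1, max_input_sequence, input_dimension))
# Two arrays, each of shape (1, LSTMoutputDimension): [state_h, state_c]
print(states_value[0].shape, states_value[1].shape)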
def decode_sequence(input_seq):
    # Encode the input as state vectors (the context vector).
    states_value = encoder_model.predict(input_seq)

    # Generate an empty target sequence of length 1 and populate its first
    # entry with the start character (0).
    target_seq = np.zeros((1, 1, input_dimension))
    target_seq[0, 0, 0] = 0

    # Sampling loop for a batch of sequences
    # (to simplify, here we assume a batch of size 1).
    stop_condition = False
    decoded_seq = list()
    while not stop_condition:
        # Decode one step: predict the next output plus the states needed
        # for the next step's context.
        output_tokens, h, c = decoder_model.predict([target_seq] + states_value)

        # With one-hot data the tutorials would sample a token here, e.g.
        # sampled_token_index = np.argmax(output_tokens[0, -1, :]);
        # with floating-point data I keep the raw prediction instead.
        decoded_seq.append(output_tokens)

        # Exit condition: hit the maximum length.
        if len(decoded_seq) == max_input_sequence:
            stop_condition = True

        # Update the length-1 target sequence with the predicted output
        # (the one-hot tutorials would rebuild a one-hot vector here).
        target_seq = output_tokens

        # Update the input states (context vector) with the output states.
        states_value = [h, c]

    # When the loop exits, return the output sequence.
    return decoded_seq
sampleNo = 1
for sample in range(0, sampleNo):
    predicted = decode_sequence(
        encoder_input_data[sample].reshape(1, max_input_sequence, input_dimension))
    # store.append(predicted)
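To compare the prediction against the target, I flatten the list of one-step outputs back into a single sequence (a minimal sketch; the name decoder_output_data for the full 3-D target array is an assumption):

import numpy as np

# Each element of `predicted` has shape (1, 1, input_dimension);
# concatenate along the time axis and drop the batch dimension.
predicted_seq = np.concatenate(predicted, axis=1)[0]   # (steps, input_dimension)

# Compare with the true decoder output for the same sample,
# truncated to the number of decoded steps.
target = decoder_output_data[sample][:predicted_seq.shape[0]]
print(np.mean((predicted_seq - target) ** 2))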
So far I have tried playing around with different activation functions for the Dense output layer, but to no avail; nothing seems to work the way I expect. Any suggestions or help would be greatly appreciated!