I am trying to model a translation between two numerical (floating-point) datasets and thought of using sequence-to-sequence learning with teacher forcing. I am able to train the model down to a reasonably low MSE, but when it comes to the inference model, the outputs are far off from the target data, or maybe I am running inference incorrectly. My question is: how do you run inference on floating-point values? On the internet I can find several tutorials that one-hot encode integer data, draw the prediction as a one-hot encoded vector, and then decode it back to the predicted integer. But how can I do the same with my data?
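For reference, the one-hot tutorials decode each step by taking the argmax of the prediction and re-encoding it as the next decoder input, whereas with my data the prediction is already a plain float. A minimal sketch of the difference (the arrays here are only illustrative):

import numpy as np

# One-hot case from the tutorials: the Dense layer is a softmax over tokens,
# so the next decoder input is rebuilt from the argmax of the prediction.
output_tokens = np.array([[[0.1, 0.7, 0.2]]])             # shape (1, 1, num_tokens)
sampled_token_index = np.argmax(output_tokens[0, -1, :])
next_input = np.zeros((1, 1, 3))
next_input[0, 0, sampled_token_index] = 1.0

# My case: the Dense layer is a single linear unit, so the prediction is
# already a float and, as far as I understand, can be fed back directly.
output_value = np.array([[[0.0493]]])                     # shape (1, 1, 1)
next_input = output_value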
Both of my datasets are numeric with floating-point values. Encoder input data =
array([[0.        ],
       [0.00075804],
       [0.00024911],
       ...,
       [0.        ],
       [0.        ],
       [0.        ]])
I am using a masking layer with 0 as the start/stop character, because my encoder dataset consists of 4096 time steps per sample.
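For reference, Masking(mask_value=0) flags any time step whose features all equal the mask value, so the padding zeros are skipped by the encoder LSTM. A minimal sketch of that behaviour with an illustrative 5-step sample:

import numpy as np
from tensorflow.keras.layers import Masking

# One sample with 5 time steps, the last two being zero padding.
x = np.array([[[0.1], [0.2], [0.3], [0.0], [0.0]]], dtype=np.float32)

masked = Masking(mask_value=0)(x)
# The computed mask marks the all-zero steps as False, so downstream layers
# such as the encoder LSTM ignore them.
print(masked._keras_mask.numpy())   # [[ True  True  True False False]]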
My decoder output data =
array([[0.04930792],
       [0.0509621 ],
       [0.05045872],
       ...,
       [0.02535375],
       [0.02148524],
       [0.02867743]], dtype=float32)
The decoder output data consists of 8192 time steps per sample.
My decoder input data =
array([[0.        ],
       [0.04930792],
       [0.0509621 ],
       ...,
       [0.01980789],
       [0.02535375],
       [0.02148524]], dtype=float32)
The decoder input data also consists of 8192 time steps per sample.
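As the samples above show, the decoder input is just the decoder output shifted right by one step with the 0 start character prepended (teacher forcing). A minimal sketch of how that shift can be built, assuming the full target array has shape (samples, 8192, 1); the helper name is only illustrative:

import numpy as np

def make_decoder_input(decoder_output_data):
    # Teacher forcing: at step t the decoder sees the true output of step t-1,
    # with 0 as the start value at the first step.
    decoder_input_data = np.zeros_like(decoder_output_data)
    decoder_input_data[:, 1:, :] = decoder_output_data[:, :-1, :]
    return decoder_input_data

# decoder_input_data = make_decoder_input(decoder_output_data)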
My training model architecture:
import numpy as np
from tensorflow.keras.layers import Input, Masking, LSTM, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# Encoder: a masked LSTM whose final states serve as the context vector.
encoder_inputs = Input(shape=(max_input_sequence, input_dimension), name='encoder_inputs')
masking = Masking(mask_value=0)
encoder_inputs_masked = masking(encoder_inputs)
encoder_lstm = LSTM(LSTMoutputDimension, activation='elu', return_state=True, name='encoder_lstm')
LSTM_outputs, state_h, state_c = encoder_lstm(encoder_inputs_masked)
encoder_states = [state_h, state_c]

# Decoder: uses the context vector as its initial state, followed by a
# linear Dense layer that maps each step back to one float.
decoder_inputs = Input(shape=(None, input_dimension), name='decoder_inputs')
decoder_lstm = LSTM(LSTMoutputDimension, activation='elu', return_sequences=True, return_state=True, name='decoder_lstm')
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = Dense(input_dimension, name='decoder_dense')
decoder_outputs = decoder_dense(decoder_outputs)

# Put together
model_encoder_training = Model([encoder_inputs, decoder_inputs], decoder_outputs, name='model_encoder_training')
opt = Adam(lr=0.007, clipnorm=1)
model_encoder_training.compile(optimizer=opt, loss='mean_squared_error', metrics=['mse'])
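For completeness, this is roughly how I fit the training model on the three arrays (batch size, epochs and validation split here are only illustrative):

model_encoder_training.fit(
    [encoder_input_data, decoder_input_data],
    decoder_output_data,
    batch_size=16,
    epochs=50,
    validation_split=0.2)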
My inference model architecture:
# Encoder inference model: maps an input sequence to the context vector.
encoder_model = Model(encoder_inputs, encoder_states)

# Decoder inference model: runs one step at a time, fed with the states
# from the previous step.
decoder_state_input_h = Input(shape=(LSTMoutputDimension,))
decoder_state_input_c = Input(shape=(LSTMoutputDimension,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs, state_h, state_c = decoder_lstm(decoder_inputs, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs] + decoder_states)
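As a quick sanity check (a minimal sketch), predicting with the encoder model on a single sample should return the two LSTM states that seed the decoder:

states_value = encoder_model.predict(
    encoder_input_data[0].reshape(1, max_input_sequence, input_dimension))
# Two arrays, each of shape (1, LSTMoutputDimension): [state_h, state_c]
print(states_value[0].shape, states_value[1].shape)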
def decode_sequence(input_seq):
    # Encode the input as state vectors (the context vector).
    states_value = encoder_model.predict(input_seq)

    # Generate an empty target sequence of length 1 and populate its first
    # entry with the start character (0).
    target_seq = np.zeros((1, 1, input_dimension))
    target_seq[0, 0, 0] = 0

    # Sampling loop for a batch of sequences
    # (to simplify, here we assume a batch of size 1).
    stop_condition = False
    decoded_seq = list()
    while not stop_condition:
        # Decode one step: predict the next output plus the states needed
        # for the next step's context.
        output_tokens, h, c = decoder_model.predict([target_seq] + states_value)

        # With one-hot data the tutorials would sample a token here, e.g.
        # sampled_token_index = np.argmax(output_tokens[0, -1, :]);
        # with floating-point data I keep the raw prediction instead.
        decoded_seq.append(output_tokens)

        # Exit condition: hit the maximum length.
        if len(decoded_seq) == max_input_sequence:
            stop_condition = True

        # Update the length-1 target sequence with the predicted output
        # (the one-hot tutorials would rebuild a one-hot vector here).
        target_seq = output_tokens

        # Update the input states (context vector) with the output states.
        states_value = [h, c]

    # When the loop exits, return the output sequence.
    return decoded_seq
sampleNo = 1
for sample in range(0, sampleNo):
    predicted = decode_sequence(
        encoder_input_data[sample].reshape(1, max_input_sequence, input_dimension))
    # store.append(predicted)
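To compare the prediction against the target, I flatten the list of one-step outputs back into a single sequence (a minimal sketch; the name decoder_output_data for the full 3-D target array is an assumption):

import numpy as np

# Each element of `predicted` has shape (1, 1, input_dimension);
# concatenate along the time axis and drop the batch dimension.
predicted_seq = np.concatenate(predicted, axis=1)[0]   # (steps, input_dimension)

# Compare with the true decoder output for the same sample,
# truncated to the number of decoded steps.
target = decoder_output_data[sample][:predicted_seq.shape[0]]
print(np.mean((predicted_seq - target) ** 2))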
So far I have tried playing around with different activation functions for the Dense output layer, but to no avail; nothing seems to work the way I expect. Any suggestions or help would be greatly appreciated!