
Versions: Python 3.6.9, TensorFlow 2.0.0, CUDA 10.0, cuDNN 7.6.1, NVIDIA driver 410.78.

I'm trying to port an LSTM-based Seq2Seq tf.keras model to TensorFlow 2.0.

Right now I'm facing the following error when I call predict on the decoder model (see below for the actual inference setup code):

It is as if the model were expecting a single word as an argument, but I need it to decode a full sentence (my sentences are right-padded sequences of word indices, of length 24).

P.S.: This code used to work exactly as it is on TF 1.15

InvalidArgumentError:  [_Derived_]  Inputs to operation while/body/_1/Select_2 of type Select must have the same size and shape.
Input 0: [1,100] != input 1: [24,100]
     [[{{node while/body/_1/Select_2}}]]
     [[lstm_1_3/StatefulPartitionedCall]] [Op:__inference_keras_scratch_graph_45160]

Function call stack:
keras_scratch_graph -> keras_scratch_graph -> keras_scratch_graph

FULL MODEL

(image: full model diagram)

ENCODER inference model

(image: encoder inference model diagram)

DECODER inference model

(image: decoder inference model diagram)

Inference Setup (line where error actually happens)

Important information: sequences are right-padded to 24 elements, and 100 is the dimensionality of each word embedding. This is why the error message (and the prints below) show input shapes of (24, 100).
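The actual `keyword_to_padded_sequence_single` helper isn't shown here; as a rough sketch of what "right-padded to 24 elements" means (the word indices below are made up for illustration):

```python
import numpy as np

def right_pad(indices, maxlen=24, pad_value=0):
    """Truncate to maxlen, then pad on the right with pad_value."""
    seq = list(indices)[:maxlen]
    return np.array(seq + [pad_value] * (maxlen - len(seq)))

padded = right_pad([12, 7, 431])  # hypothetical word indices
print(padded.shape)  # (24,)
```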

Note that this code runs on a CPU; running it on a GPU leads to another error, detailed here.

# original_keyword is a sample text string

with tf.device("/device:CPU:0"):

    # this method turns the raw string into a right-padded sequence
    query_sequence = keyword_to_padded_sequence_single(original_keyword)

    # no problems here
    initial_state = encoder_model.predict(query_sequence)

    print(initial_state[0].shape) # prints (24, 100)
    print(initial_state[1].shape) # (24, 100)

    empty_target_sequence = np.zeros((1,1))

    empty_target_sequence[0,0] = word_dict_titles["sos"]

    # ERROR HAPPENS HERE:
    # InvalidArgumentError:  [_Derived_]  Inputs to operation while/body/_1/Select_2 of type Select 
    # must have the same size and shape.  Input 0: [1,100] != input 1: [24,100]
    decoder_outputs, h, c = decoder_model.predict([empty_target_sequence] + initial_state)
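For what it's worth, the batch-dimension bookkeeping behind the mismatch can be illustrated with plain NumPy (no model involved; the shapes match the prints above). This only illustrates the shape arithmetic, not a fix:

```python
import numpy as np

query = np.arange(24)           # shape (24,): read by Keras as 24 scalar samples
batched = query[np.newaxis, :]  # shape (1, 24): one sample of length 24

# With a (24,)-shaped input, per-sample encoder states come out as (24, 100),
# while the single-token decoder input has batch size 1, i.e. (1, 100).
print(query.shape)    # (24,)
print(batched.shape)  # (1, 24)
```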

Things I have tried

  • disabling eager mode (this just made training much slower and the error during inference stayed the same)

  • reshaping the input prior to feeding it to the predict function

  • manually computing (embedding_layer.compute_mask(inputs)) and setting masks when calling the LSTM layers

Felipe
  • How do you build your layers? Did you set return_sequences=True in your LSTM layer? – emirc Nov 21 '19 at 01:14
  • @emirc for the encoder it's `False`. For the decoder, it's `True`. Here's the full code: https://gist.github.com/queirozfcom/20d76e3113c649660df8dc1e59455680 – Felipe Nov 21 '19 at 01:22
  • Hi, can you try to change the `decoder_inputs` shape to `decoder_inputs = tf.keras.layers.Input(shape=(None,),name="decoder_input")`. The error is coming because `empty_target_sequence` has a shape `(1,1)` while your decoder expects an input of shape `(?,24)`. – Siddhant Tandon Nov 26 '19 at 13:50

1 Answer


From what I can see from your model architecture, the initial_state is an array of tensors with shapes: [(?, 100), (?, 100), (?, 100)]. In your case the unknown dimension is fixed to 24.

Then, you build a NumPy array/TF tensor of shape (1, 1), wrap it inside a list, and append your initial_state. Hence you get a list of tensors with shapes: [(1, 1), (?, 100), (?, 100), (?, 100)].

You try to pass it as an input to your decoder model, which expects 3 inputs (a list of inputs) with shapes [(?, 24), (?, 100), (?, 100)].

Starting from that, it already seems something is wrong...

However, TF complains about the inputs of the operation while/body/_1/Select_2. Input 1 should come from one of your initial_state tensors (which we know have shape (24, 100)). Input 2 seems to come from your empty_target_sequence, which has shape (1, 1) and can be broadcast to (1, 100). By the way, it is strange that it is not broadcast to (24, 100), since both of its dimensions are of size 1...
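For reference, standard NumPy broadcasting rules (which TF ops generally follow) would indeed take (1, 1) against (24, 100) to (24, 100):

```python
import numpy as np

a = np.zeros((1, 1))    # stand-in for empty_target_sequence
b = np.ones((24, 100))  # stand-in for an initial_state tensor

# Both size-1 dimensions of `a` stretch to match `b`:
print(np.broadcast_shapes(a.shape, b.shape))  # (24, 100)
```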

I would recommend checking your graph in TensorBoard. You should be able to find the offending operation and track its input tensors.

AlexisBRENON