
I am trying to apply sequence-to-sequence modelling to EEG data. The encoding works just fine, but getting the decoding to work is proving problematic. The input data has the shape None-by-3000-by-31, where the second dimension is the sequence length.

The encoder looks like this:

initial_state = lstm_sequence_encoder.zero_state(batchsize, dtype=self.model_precision)

encoder_output, state = dynamic_rnn(
     cell=LSTMCell(32),
     inputs=lstm_input, # shape=(None,3000,32)
     initial_state=initial_state, # zeroes
     dtype=lstm_input.dtype # tf.float32
)
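
For context, the snippets in this question assume roughly the imports and definitions below; the concrete sizes and any name not shown in my snippets are illustrative assumptions.

import tensorflow as tf                                # TF 1.x
from tensorflow.contrib.rnn import LSTMCell
from tensorflow.contrib.seq2seq import (
     TrainingHelper, InferenceHelper, BasicDecoder, dynamic_decode)

dynamic_rnn = tf.nn.dynamic_rnn                        # used bare in the snippets

# Placeholders for a batch of EEG sequences and the decoder targets
lstm_input = tf.placeholder(tf.float32, [None, 3000, 32])
target_input = tf.placeholder(tf.float32, [None, 3000, 32])
batchsize = 200                                        # class field; value assumed
batchsize_placeholder = tf.placeholder(tf.int32, [])   # the actual batch size varies

lstm_sequence_encoder = LSTMCell(32)                   # encoder cell
lstm_sequence_decoder = LSTMCell(32)                   # decoder cell; size assumed
# thought_vector below is the final encoder state, as described next.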

I use the final state of the RNN as the initial state of the decoder. For training, I use the TrainingHelper:

training_helper = TrainingHelper(target_input, [self.sequence_length])
training_decoder = BasicDecoder(
     cell=lstm_sequence_decoder,
     helper=training_helper,
     initial_state=thought_vector
)
output, _, _ = dynamic_decode(
     decoder=training_decoder,
     maximum_iterations=3000
)
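
For completeness, the training output is consumed roughly like this; the mean-squared-error loss and the optimizer are only illustrative assumptions, since the data is continuous rather than tokenized.

# output.rnn_output holds the decoder's continuous predictions, shaped
# (batch, decoded_length, 32); with the TrainingHelper the decoded length
# matches the target length.
reconstruction = output.rnn_output
loss = tf.losses.mean_squared_error(
     labels=target_input, predictions=reconstruction)
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)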

My troubles start when I try to implement inference. Since I am using non-sentence data, I do not need to tokenize or embed anything; the data is essentially already embedded. The InferenceHelper class seemed the best way to achieve my goal, so that is what I use. I'll give my code first and then explain my problem.

def _sample_fn(decoder_outputs):
     return decoder_outputs
def _end_fn(_):
     return tf.tile([False], [self.lstm_layersize]) # Batch-size is sequence-length because of time major
inference_helper = InferenceHelper(
     sample_fn=_sample_fn,
     sample_shape=[32],
     sample_dtype=target_input.dtype,
     start_inputs=tf.zeros(batchsize_placeholder, 32), # the batchsize varies
     end_fn=_end_fn
)
inference_decoder = BasicDecoder(
     cell=lstm_sequence_decoder,
     helper=inference_helper,
     initial_state=thought_vector
)
output, _, _ = dynamic_decode(
     decoder=inference_decoder,
     maximum_iterations=3000
)

The Problem

I don't know what the shape of the inputs should be. I know the start inputs should be zeros because it is the first time-step, but this throws errors: it expects the input to be (1, 32).

I also thought I should pass the output of each time-step unchanged to the next. However, this raises problems at run-time: the batch size varies, so its shape is only partially defined. The library throws an exception when it tries to convert start_inputs to a tensor:

...
self._start_inputs = ops.convert_to_tensor(
      start_inputs, name='start_inputs')

Any ideas?


1 Answer


This is a lesson in poor documentation.

I fixed my problem, but failed to address the variable batch-size problem.

The _end_fn was causing problems I was unaware of. I also managed to work out the appropriate fields for the InferenceHelper. I've named the fields in case anyone needs guidance in the future:

 def _end_fn(_):
      # Never signal "finished" per sequence; decoding is bounded by
      # maximum_iterations in dynamic_decode instead.
      return tf.tile([False], [batchsize])
 inference_helper = InferenceHelper(
      sample_fn=_sample_fn,                # passes each output through unchanged
      sample_shape=[lstm_number_of_units], # in my case, 32
      sample_dtype=tf.float32,             # depends on the data
      start_inputs=tf.zeros((batchsize, lstm_number_of_units)),
      end_fn=_end_fn
 )
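
With the helper fixed, the inference decoder is built exactly as in the question; running it looks roughly like this. The session boilerplate and the eeg_batch name are assumptions.

 with tf.Session() as sess:
      sess.run(tf.global_variables_initializer())
      # eeg_batch: a NumPy array of shape (batchsize, 3000, 32); name assumed
      decoded = sess.run(output.rnn_output, feed_dict={lstm_input: eeg_batch})
      # decoded has shape (batchsize, 3000, 32): decoding runs for the full
      # maximum_iterations because the end_fn never signals "finished".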

As for the batch-size problem, there are two things I'm considering:

  1. Changing the internal state of my model object. My TensorFlow computation graph is built inside a class, and a class field records the batch size; changing that field during training may work. Or:

  2. Padding the batches so that they are all 200 sequences long. This wastes time.

Preferably I'd like a way to manage the batch size dynamically.

EDIT: I found a way. It involves simply substituting square brackets for parentheses in the shape passed to tf.zeros:

 inference_helper = InferenceHelper(
      sample_fn=_sample_fn,
      sample_shape=[self.lstm_layersize],
      sample_dtype=target_input.dtype,
      start_inputs=tf.zeros([batchsize, self.lstm_layersize]),
      end_fn=_end_fn
 )
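
For reference, another way to keep the batch size dynamic is to derive it from the input tensor itself, so that nothing in the graph hard-codes it. A minimal sketch, assuming the encoder input tensor is lstm_input; the dynamic_batchsize name is made up:

 # tf.shape gives the batch size at run-time, so the graph never needs a
 # fixed batch size, a class field, or padding to 200 sequences.
 dynamic_batchsize = tf.shape(lstm_input)[0]

 initial_state = lstm_sequence_encoder.zero_state(dynamic_batchsize, dtype=tf.float32)
 start_inputs = tf.zeros([dynamic_batchsize, lstm_number_of_units])

 def _end_fn(_):
      return tf.tile([False], [dynamic_batchsize])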