I'm trying to train a sequence-to-sequence model for machine translation using Keras on a Google Colab TPU. I have a dataset that fits in memory, but I have to preprocess it before feeding it to the model. In particular, I need to convert the target words to one-hot vectors, and with many examples the full one-hot conversion doesn't fit in memory, so I need to generate batches of data.
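
As a rough illustration of why the full one-hot conversion can't be held in memory (the sizes below are made-up placeholders, since I haven't listed my actual dataset and vocabulary sizes):

num_examples = 1_000_000   # hypothetical number of sentence pairs
seq_len = 50               # hypothetical padded target sequence length
vocab_size_fr = 30_000     # hypothetical French vocabulary size

# a dense float32 one-hot target tensor would need roughly:
bytes_needed = num_examples * seq_len * vocab_size_fr * 4
print(f"{bytes_needed / 1e12:.1f} TB")  # ~6.0 TB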

I'm using this function as a batch generator:

from tensorflow.keras.utils import to_categorical

def generate_batch_bert(X_ids, X_masks, y, batch_size=1024):
    '''Generate batches of data indefinitely.'''
    while True:
        for j in range(0, len(X_ids), batch_size):
            # batch of encoder data
            encoder_input_data_ids = X_ids[j:j + batch_size]
            encoder_input_data_masks = X_masks[j:j + batch_size]
            y_decoder = y[j:j + batch_size]

            # decoder input and target, shifted by one step for teacher forcing
            decoder_input_data = y_decoder[:, :-1]
            decoder_target_seq = y_decoder[:, 1:]

            # one-hot encode only the current batch of decoder targets
            decoder_target_data = to_categorical(decoder_target_seq, vocab_size_fr)

            # only yield full batches, since the TPU needs a fixed batch size
            if encoder_input_data_ids.shape[0] == batch_size:
                yield ([encoder_input_data_ids, encoder_input_data_masks, decoder_input_data],
                       decoder_target_data)
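
To sanity-check the generator, one batch can be pulled and its shapes inspected (the exact shapes depend on the data; seq_len here is just the padded target length):

gen = generate_batch_bert(X_train_ids, X_train_masks, y_train, batch_size=1024)
(enc_ids, enc_masks, dec_in), dec_target = next(gen)
# e.g. (1024, seq_len), (1024, seq_len), (1024, seq_len - 1), (1024, seq_len - 1, vocab_size_fr)
print(enc_ids.shape, enc_masks.shape, dec_in.shape, dec_target.shape)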

The problem is that whenever I try to run the fit function as follows:

model.fit(x=generate_batch_bert(X_train_ids, X_train_masks, y_train, batch_size=batch_size),
          steps_per_epoch=train_samples // batch_size,
          epochs=epochs,
          callbacks=callbacks,
          validation_data=generate_batch_bert(X_val_ids, X_val_masks, y_val, batch_size=batch_size),
          validation_steps=val_samples // batch_size)

I get the following error:

/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/tensor_util.py:445 make_tensor_proto
    raise ValueError("None values not supported.")

ValueError: None values not supported.

I'm not sure what's wrong or how to solve this problem.

EDIT

I tried loading a smaller amount of data into memory, so that the one-hot conversion of the target words doesn't crash the kernel, and training actually works. So there is evidently something wrong with how I generate batches.

Marco Ripamonti

1 Answer


It's hard to tell what's wrong since you provide neither your model definition nor any sample data. However, I'm fairly certain that you're running into the same TensorFlow bug that I was recently bitten by.

The workaround is to use the tf.data API, which works much better with TPUs. Like this:

import tensorflow as tf
from tensorflow.data import Dataset

def map_fn(X_id, X_mask, y):
    # shift the target sequence by one step and one-hot encode it on the fly,
    # per example, instead of materializing the whole tensor in memory
    decoder_target_data = tf.one_hot(y[1:], vocab_size_fr)
    return (X_id, X_mask, y[:-1]), decoder_target_data
...
X_ids = Dataset.from_tensor_slices(X_ids)
X_masks = Dataset.from_tensor_slices(X_masks)
y = Dataset.from_tensor_slices(y)
# drop_remainder=True keeps every batch the same size, which the TPU requires
ds = Dataset.zip((X_ids, X_masks, y)).map(map_fn).batch(1024, drop_remainder=True)
model.fit(x=ds, ...)
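
The same construction applies to the validation split, so both generators from the question can be replaced. A minimal sketch, assuming X_val_ids, X_val_masks and y_val as in the question:

val_ids = Dataset.from_tensor_slices(X_val_ids)
val_masks = Dataset.from_tensor_slices(X_val_masks)
val_y = Dataset.from_tensor_slices(y_val)
val_ds = Dataset.zip((val_ids, val_masks, val_y)).map(map_fn).batch(1024, drop_remainder=True)

# with finite datasets, Keras infers the number of steps, so steps_per_epoch
# and validation_steps can be omitted
model.fit(x=ds, epochs=epochs, callbacks=callbacks, validation_data=val_ds)
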
Björn Lindqvist