I am struggling with the following: I am creating a tf.data.Dataset using the from_generator method, and I build it on the CPU because I don't want to overload my GPU memory.
The dataset consists of tuples that contain a 1-D tf.bool mask (tf.Tensor) with fixed length and a 2-D tf.float32 matrix (tf.Tensor) with variable size. The loss function is decorated as follows, so I don't think the variable size is the problem:
@tf.function(experimental_relax_shapes=True)
Ideally, the dataset is kept on the CPU, but then prefetched onto the GPU.
def gen():
    for i, j in zip(mask_list, wmat_list):
        yield i, j

dataset = tf.data.Dataset.from_generator(gen, output_types=(tf.bool, tf.float32))
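For reference, here is a self-contained stand-in for this pipeline. The data below is dummy data: the mask length of 8 and matrix width of 4 are made up, and passing output_shapes with a None dimension is optional and only documents the variable axis.

import numpy as np
import tensorflow as tf

# Dummy placeholders for my real data: fixed-length bool masks and
# variable-height float32 matrices (the sizes are invented for this sketch).
mask_list = [np.random.rand(8) > 0.5 for _ in range(100)]
wmat_list = [np.random.rand(np.random.randint(5, 50), 4).astype(np.float32)
             for _ in range(100)]

def gen():
    for i, j in zip(mask_list, wmat_list):
        yield i, j

dataset = tf.data.Dataset.from_generator(
    gen,
    output_types=(tf.bool, tf.float32),
    # output_shapes is optional; None marks the variable-sized dimension.
    output_shapes=(tf.TensorShape([8]), tf.TensorShape([None, 4])),
)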
The main training loop currently relies on tf.identity to move the data to the GPU, which is inefficient, as shown in the TensorBoard screenshot below: roughly 70% of the time is spent loading the data and moving it to the GPU.
for b, (mask, wmat) in enumerate(dataset):
    with tf.GradientTape() as tape:
        mask = tf.identity(mask)
        wmat = tf.identity(wmat)
        mean_error, loss = self.model.loss(mask, wmat)
    epoch_loss += loss.numpy()
    epoch_mean_error += mean_error.numpy()
I have tried the prefetch_to_device function, but it did not move the data onto the GPU, as verified by printing e.g. mask.device in the training loop.
gpu_transform = tf.data.experimental.prefetch_to_device('/gpu')
dataset.apply(gpu_transform)
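For completeness, this is roughly how I checked the placement (an illustrative snippet, not my exact code; in my run both tensors still reported a CPU device):

# Illustrative check: pull one element and inspect where its tensors live.
for mask, wmat in dataset.take(1):
    print(mask.device)  # in my run this still shows .../device:CPU:0
    print(wmat.device)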
To me this resembles this bug: https://github.com/tensorflow/tensorflow/issues/30929 . However, that issue is marked as solved and is over a year old.
I am running TF 2.3 using the official Docker image.