
I have simple code, which DOES work, for training a Keras model in TensorFlow using numpy arrays as features and labels. If I then wrap these numpy arrays using tf.data.Dataset.from_tensor_slices in order to train the same Keras model with a TensorFlow dataset, I get an error. I haven't been able to figure out why (it may be a TensorFlow or Keras bug, but I may also be missing something). I'm on Python 3, TensorFlow 1.10.0, numpy 1.14.5, with no GPU involved.

OBS1: The possibility of using tf.data.Dataset as a Keras input is shown in https://www.tensorflow.org/guide/keras, under "Input tf.data datasets".

OBS2: In the code below, the block under "#Train with numpy arrays" is the one being executed. If it is commented out and the block under "#Train with tf.data datasets" is used instead, the error is reproduced.

OBS3: Line 13 of the code below is commented out and starts with "###WORKAROUND 1###". If the comment is removed and that line is used together with the tf.data.Dataset input, the error changes, even though I can't completely understand why.

The complete code is:

import tensorflow as tf
import numpy as np

np.random.seed(1)
tf.set_random_seed(1)

print(tf.__version__)
print(np.__version__)

#Import mnist dataset as numpy arrays
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()#Import
x_train, x_test = x_train / 255.0, x_test / 255.0 #normalizing
###WORKAROUND 1###y_train, y_test = (y_train.astype(dtype='float32'), y_test.astype(dtype='float32'))

x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1]*x_train.shape[2])) #reshaping 28 x 28 images to 1D vectors, similar to Flatten layer in Keras

batch_size = 32
#Create a tf.data.Dataset object equivalent to this data
tfdata_dataset_train = tf.data.Dataset.from_tensor_slices((x_train, y_train))
tfdata_dataset_train = tfdata_dataset_train.batch(batch_size).repeat()

#Creates model
keras_model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(512, activation=tf.nn.relu),
    tf.keras.layers.Dropout(0.2, seed=1),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])

#Compile the model
keras_model.compile(optimizer='adam',
                    loss=tf.keras.losses.sparse_categorical_crossentropy,
                    metrics=['accuracy'])

#Train with numpy arrays
keras_training_history = keras_model.fit(x_train,
                y_train,
                initial_epoch=0,
                epochs=1,
                batch_size=batch_size
                )

#Train with tf.data datasets
#keras_training_history = keras_model.fit(tfdata_dataset_train,
#                initial_epoch=0,
#                epochs=1,
#                steps_per_epoch=60000//batch_size
#                )

print(keras_training_history.history)

The error observed when using tf.data.Dataset as input is:

(...)
ValueError: Tensor conversion requested dtype uint8 for Tensor with dtype float32: 'Tensor("metrics/acc/Cast:0", shape=(?,), dtype=float32)'

During handling of the above exception, another exception occurred:

(...)
TypeError: Input 'y' of 'Equal' Op has type float32 that does not match type uint8 of argument 'x'.

The error when uncommenting line 13 of the code, as described in OBS3, is:

(...)
tensorflow.python.framework.errors_impl.InvalidArgumentError: In[0] is not a matrix
     [[Node: dense/MatMul = MatMul[T=DT_FLOAT, _class=["loc:@training/Adam/gradients/dense/MatMul_grad/MatMul_1"], transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_sequential_input_0_0, dense/MatMul/ReadVariableOp)]]
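
For context, a quick check (my own addition, not part of the script above) shows the dtypes going into from_tensor_slices, which seems related to the uint8 vs float32 mismatch in the traceback:

print(y_train.dtype)  # uint8: labels come straight from mnist.load_data()
print(x_train.dtype)  # float64: result of dividing by 255.0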

Any help would be appreciated, including comments that you were able to reproduce the errors, so I can report the bug if it is the case.

Elisio Quintino

3 Answers


I just upgraded to TensorFlow 1.10 to execute this code. I think that is the answer, which is also discussed in the other Stack Overflow thread.

This code executes, but only if I remove the normalization, as that line seems to use too much CPU memory (I see messages indicating that). I also reduced the number of cores.
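
If the scaling is still wanted, one option (a sketch on my side, assuming the preprocess_fn below is otherwise unchanged) is to do it per image inside the tf.data map, so no large floating-point copy of the whole array is materialized:

def preprocess_fn(image, label):
    '''Scale each image on the fly instead of normalizing the full numpy array up front.'''
    x = tf.reshape(tf.cast(image, tf.float32) / 255.0, (28, 28, 1))
    y = tf.one_hot(tf.cast(label, tf.uint8), NUM_CLASSES)
    return x, y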

import tensorflow as tf
import numpy as np
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten, Dense, Dropout, Input

np.random.seed(1)
tf.set_random_seed(1)

batch_size = 128
NUM_CLASSES = 10

print(tf.__version__)

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
#x_train, x_test = x_train / 255.0, x_test / 255.0 #normalizing

def tfdata_generator(images, labels, is_training, batch_size=128):
    '''Construct a data generator using tf.Dataset'''

    def preprocess_fn(image, label):
        '''A transformation function to preprocess raw data
        into trainable input. '''
        x = tf.reshape(tf.cast(image, tf.float32), (28, 28, 1))
        y = tf.one_hot(tf.cast(label, tf.uint8), NUM_CLASSES)
        return x, y

    dataset = tf.data.Dataset.from_tensor_slices((images, labels))
    if is_training:
        dataset = dataset.shuffle(1000)  # depends on sample size

    # Transform and batch data at the same time
    dataset = dataset.apply(tf.contrib.data.map_and_batch(
        preprocess_fn, batch_size,
        num_parallel_batches=2,  # cpu cores
        drop_remainder=True if is_training else False))
    dataset = dataset.repeat()
    dataset = dataset.prefetch(tf.contrib.data.AUTOTUNE)

    return dataset

training_set = tfdata_generator(x_train, y_train,is_training=True, batch_size=batch_size)
testing_set  = tfdata_generator(x_test, y_test, is_training=False, batch_size=batch_size)

inputs = Input(shape=(28, 28, 1))
x = Conv2D(32, (3, 3), activation='relu', padding='valid')(inputs)
x = MaxPool2D(pool_size=(2, 2))(x)
x = Conv2D(64, (3, 3), activation='relu')(x)
x = MaxPool2D(pool_size=(2, 2))(x)
x = Flatten()(x)
x = Dense(512, activation='relu')(x)
x = Dropout(0.5)(x)
outputs = Dense(NUM_CLASSES, activation='softmax')(x)

keras_model =  tf.keras.Model(inputs, outputs)

#Compile the model
keras_model.compile('adam', 'categorical_crossentropy', metrics=['acc'])

#Train with tf.data datasets
keras_training_history = keras_model.fit(
                            training_set.make_one_shot_iterator(),
                            steps_per_epoch=len(x_train) // batch_size,
                            epochs=5,
                            validation_data=testing_set.make_one_shot_iterator(),
                            validation_steps=len(x_test) // batch_size,
                            verbose=1)
print(keras_training_history.history)
Mohan Radhakrishnan
  • +1 for the answer; it did add material to my question, for example by showing code that works. But it does not solve the problem. I found the issue and a workaround documented in the GitHub issues and will post them here. Thanks a lot! – Elisio Quintino Sep 07 '18 at 16:26
  • The code I posted executes, but I used the same model. I didn't attempt to debug with your model, but there was an error with yours. – Mohan Radhakrishnan Sep 08 '18 at 14:13
  • Yes, I get it, and as I said, your answer helped a lot by providing code that executes. But my question provides my code, shows that it works with numpy arrays, and then shows that the same code does not work with a tf.data pipeline. I then ask why, and where the error in the code is. So the answer to this question should point to the cause of the error in MY code. Again, your answer did help with material, so thanks a lot! As I commented with my answer, one part of the problem was due to a bug in TensorFlow, which is being handled and should be solved in 1.11. – Elisio Quintino Sep 10 '18 at 06:58
  • I am wondering how Keras is able to do 5 epochs when make_one_shot_iterator() only supports iterating once through a dataset? – SantoshGupta7 Mar 31 '19 at 19:11

Installing the tf-nightly build, together with changing the dtypes of some tensors (the error changes after installing tf-nightly), solved the problem, so it is an issue that (hopefully) will be fixed in 1.11.

Related material: https://github.com/tensorflow/tensorflow/issues/21894
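
As an illustration only (the exact casts depend on which error you hit, so treat this as a sketch rather than the definitive fix), the dtype change can be made on the numpy labels, as in WORKAROUND 1, or inside the dataset pipeline:

# Sketch: two places where the label dtype can be changed
y_train = y_train.astype('float32')  # numpy level, as in WORKAROUND 1
dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
dataset = dataset.map(lambda x, y: (x, tf.cast(y, tf.float32)))  # or at the tensor level
dataset = dataset.batch(batch_size).repeat()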

Elisio Quintino

I am wondering how Keras is able to do 5 epochs when make_one_shot_iterator() only supports iterating once through a dataset?

It could be given something like iterations = len(y_train) * epochs - here shown for TF 1.x. The reason multiple epochs work at all is that the dataset is repeat()-ed, so the single one-shot pass never runs out and Keras simply draws steps_per_epoch batches per epoch; a minimal sketch follows.
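
This sketch assumes the x_train, y_train and batch_size defined in the code below; the variable names are mine:

epochs = 5
dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
dataset = dataset.batch(batch_size).repeat()  # repeat() makes the stream of batches unbounded
total_batches_drawn = (len(x_train) // batch_size) * epochs  # what fit() actually consumes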

The code from Mohan Radhakrishnan still works in TF 2.x with small corrections that move objects to their new classes (in TF 2.x), bringing the code up to date... No make_one_shot_iterator() is needed anymore.

# >> author: Mohan Radhakrishnan

import tensorflow as tf
import tensorflow.keras
import numpy as np
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten, Dense, Dropout, Input

np.random.seed(1)
tf.random.set_seed(1)

batch_size = 128
NUM_CLASSES = 10

print(tf.__version__)

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
#x_train, x_test = x_train / 255.0, x_test / 255.0 #normalizing

def tfdata_generator(images, labels, is_training, batch_size=128):
    '''Construct a data generator using tf.Dataset'''

    def preprocess_fn(image, label):
        '''A transformation function to preprocess raw data
        into trainable input. '''
        x = tf.reshape(tf.cast(image, tf.float32), (28, 28, 1))
        y = tf.one_hot(tf.cast(label, tf.uint8), NUM_CLASSES)
        return x, y

    dataset = tf.data.Dataset.from_tensor_slices((images, labels))
    if is_training:
        dataset = dataset.shuffle(1000)  # depends on sample size

    # Transform and batch data at the same time
    dataset = dataset.apply( tf.data.experimental.map_and_batch(
        preprocess_fn, batch_size,
        num_parallel_batches=2,  # cpu cores
        drop_remainder=True if is_training else False))
    dataset = dataset.repeat()
    dataset = dataset.prefetch( tf.data.experimental.AUTOTUNE)

    return dataset

training_set = tfdata_generator(x_train, y_train,is_training=True, batch_size=batch_size)
testing_set  = tfdata_generator(x_test, y_test, is_training=False, batch_size=batch_size)

inputs = Input(shape=(28, 28, 1))
x = Conv2D(32, (3, 3), activation='relu', padding='valid')(inputs)
x = MaxPool2D(pool_size=(2, 2))(x)
x = Conv2D(64, (3, 3), activation='relu')(x)
x = MaxPool2D(pool_size=(2, 2))(x)
x = Flatten()(x)
x = Dense(512, activation='relu')(x)
x = Dropout(0.5)(x)
outputs = Dense(NUM_CLASSES, activation='softmax')(x)

keras_model =  tf.keras.Model(inputs, outputs)

#Compile the model
keras_model.compile('adam', 'categorical_crossentropy', metrics=['acc'])

#Train with tf.data datasets
# training_set.make_one_shot_iterator() - 'PrefetchDataset' object has no attribute 'make_one_shot_iterator'
keras_training_history = keras_model.fit(
                            training_set,
                            steps_per_epoch=len(x_train) // batch_size,
                            epochs=5,
                            validation_data=testing_set,
                            validation_steps=len(x_test) // batch_size,
                            verbose=1)
print(keras_training_history.history)

Not loading data locally, just an easy data flow - that is very convenient. Thanks a lot - I hope my corrections are proper.

JeeyCi