I have got my Keras model to work with a tf.data.Dataset through the following code:

# Initialize batch generators (returns tf.data.Dataset)
batch_train = build_features.get_train_batches(batch_size=batch_size)

# Create TensorFlow Iterator object
iterator = batch_train.make_one_shot_iterator()
dataset_inputs, dataset_labels = iterator.get_next()

# Create Model
logits = .....(some layers)
model = keras.models.Model(inputs=dataset_inputs, outputs=logits)

# Train network
model.compile(optimizer=train_opt, loss=model_loss, target_tensors=[dataset_labels])
model.fit(epochs=epochs, steps_per_epoch=num_batches, callbacks=callbacks, verbose=1)

However, when I try to pass the validation_data parameter to model.fit, it tells me that I cannot use it with the generator. Is there a way to use validation data while using a tf.data.Dataset?

For example, in plain TensorFlow I could do the following:

# initialize batch generators
batch_train = build_features.get_train_batches(batch_size=batch_size)
batch_valid = build_features.get_valid_batches(batch_size=batch_size)

# create TensorFlow Iterator object
iterator = tf.data.Iterator.from_structure(batch_train.output_types,
                                           batch_train.output_shapes)

# create two initialization ops to switch between the datasets
init_op_train = iterator.make_initializer(batch_train)
init_op_valid = iterator.make_initializer(batch_valid)

Then I just use sess.run(init_op_train) and sess.run(init_op_valid) to switch between the datasets.

I tried implementing a callback that does just that (switch to the validation set, predict, and switch back), but it tells me I can't use model.predict in a callback.

Can someone help me get validation working with Keras + tf.data.Dataset?

Edit: incorporating the answer into the code.

What finally worked for me, thanks to the selected answer, is:

# Initialize batch generators(returns tf.Dataset)
batch_train = # returns tf.Dataset
batch_valid = # returns tf.Dataset

# Create TensorFlow Iterator object and wrap it in a generator
itr_train = make_iterator(batch_train)
itr_valid = make_iterator(batch_valid)  # the validation generator must wrap batch_valid, not batch_train

# Create Model
logits = # the keras model
model = keras.models.Model(inputs=dataset_inputs, outputs=logits)

# Train network
model.compile(optimizer=train_opt, loss=model_loss, target_tensors=[dataset_labels])
model.fit_generator(
    generator=itr_train, validation_data=itr_valid, validation_steps=batch_size,
    epochs=epochs, steps_per_epoch=num_batches, callbacks=cbs, verbose=1, workers=0)

def make_iterator(dataset):
    iterator = dataset.make_one_shot_iterator()
    next_val = iterator.get_next()

    with K.get_session().as_default() as sess:
        while True:
            *inputs, labels = sess.run(next_val)
            yield inputs, labels

This doesn't introduce any overhead
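
For readers unfamiliar with the starred assignment used in make_iterator: it collects every element except the last into a list, so a dataset that yields (x1, x2, y) is handed to Keras as ([x1, x2], y). A minimal illustration, using a made-up plain tuple as a stand-in for the result of sess.run(next_val):

```python
# Hypothetical stand-in for the tuple returned by sess.run(next_val).
batch = ("features_a", "features_b", "labels")

# Starred assignment: everything except the last element goes into a list.
*inputs, labels = batch

print(inputs)  # ['features_a', 'features_b']
print(labels)  # labels
```

With a single-input dataset, inputs is still a list, just with one element.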

Mark Rofail
  • After your change, how do you get dataset_inputs into the model? I don't understand the line keras.models.Model(inputs=dataset_inputs, outputs=logits); I'm assuming this is the contents of the "model" variable. Could you please complete the code? I have the exact same problem but can't work out how to apply your code. Thanks in advance. – josesuero Feb 21 '19 at 23:39
  • @mark rofail, I believe this line is incorrect and should receive batch_valid: itr_valid = make_iterator(batch_train) – Robert Lugg Mar 04 '20 at 19:05

2 Answers

I solved the problem by using fit_generator. I found the solution here. I applied @Dat-Nguyen's solution.

You simply need to create two iterators, one for training and one for validation, and then write your own generator that extracts batches from the dataset and yields the data in the form (batch_data, batch_labels). Finally, pass train_generator and validation_generator to model.fit_generator.
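
The generator contract described above can be sketched without TensorFlow. This is only an illustration under stated assumptions: make_generator and the toy batches are hypothetical stand-ins for generators that pull batches from the dataset iterators. The key points are that each item is a (batch_data, batch_labels) pair and that the generator never stops.

```python
import itertools

def make_generator(batches):
    """Yield (batch_data, batch_labels) forever, restarting when exhausted."""
    for batch_data, batch_labels in itertools.cycle(batches):
        yield batch_data, batch_labels

# Hypothetical toy batches standing in for what the dataset iterators return.
train_batches = [([1.0, 2.0], [0, 1]), ([3.0, 4.0], [1, 0])]
valid_batches = [([5.0, 6.0], [1, 1])]

train_generator = make_generator(train_batches)
validation_generator = make_generator(valid_batches)

# The generators never raise StopIteration, so the caller has to say how many
# batches make up one epoch (steps_per_epoch / validation_steps).
first_three = [next(train_generator) for _ in range(3)]
```

In this sketch the generators would be passed as generator=train_generator and validation_data=validation_generator, with steps_per_epoch=len(train_batches) and validation_steps=len(valid_batches).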

W. Sam
  • so I have to wrap tensorflow iterators in a python generator like: `iterator = ds.make_one_shot_iterator() while True: next_val = iterator.get_next() yield sess.run(next_val)` – Mark Rofail Jun 22 '18 at 12:18
  • Hi, it is me asking you this time :). I am now facing another problem with fit_generator, which is that I can get access to the validation data. For example, you may want to evaluate the predictions at batch level, in order to accumulate them and then calculate the prediction for the whole epoch so it can be used for the AUC metric. Do you have any idea how we can accomplish this? Or should I open a new post for it? – W. Sam Jun 26 '18 at 20:13
  • please open a new question regardless, so others would benefit. – Mark Rofail Jun 27 '18 at 11:27
  • To my knowledge you cannot access minibatch metrics; however, defining a custom loss function and including it in the metrics when you compile the model should do just that. Keras should give you the average AUC per epoch. This is the AUC loss function I came up with: `from sklearn.metrics import roc_auc_score def roc_auc(y_true, y_pred): return roc_auc_score(y_true, y_pred)` – Mark Rofail Jun 27 '18 at 12:22
  • Sorry I meant " I can't get access to validation" instead of I can. Yes probably I will open a new post for it. – W. Sam Jun 27 '18 at 17:59
  • Keras automatically computes all metrics with the validation data if provided at the end of the epoch – Mark Rofail Jun 27 '18 at 18:07
  • There is problem with AUC metric with Keras. I will explain in the post. By the way, I just saw in your code you put target_tensors=[dataset_labels] in model.compile. I think you don't need that if you use the generator. Is it written by mistake? – W. Sam Jun 27 '18 at 20:26
  • @Dat-Nguyen's solution has been changed to pass the iterator directly into model.fit instead of fit_generator. It should be supported in TensorFlow 1.9, but it didn't work for me; it gives the error "AttributeError: 'Iterator' object has no attribute 'ndim'". – W. Sam Aug 12 '18 at 21:27

The way to connect a reinitializable iterator to a Keras model is to plug in an Iterator that returns both the x and y values together:

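import numpy as np
import tensorflow as tf
from tensorflow import keras  # assumption: tf.keras on TF 1.x, which accepts dataset iterators in fit

sess = tf.Session()
keras.backend.set_session(sess)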

x = np.random.random((5, 2))
y = np.array([0, 1] * 3 + [1, 0] * 2).reshape(5, 2) # One hot encoded
input_dataset = tf.data.Dataset.from_tensor_slices((x, y))

# Create your reinitializable_iterator and initializer
reinitializable_iterator = tf.data.Iterator.from_structure(input_dataset.output_types, input_dataset.output_shapes)
init_op = reinitializable_iterator.make_initializer(input_dataset)

#run the initializer
sess.run(init_op) # feed_dict if you're using placeholders as input

# build keras model and plug in the iterator
model = keras.Model(...)
model.compile(...)
model.fit(reinitializable_iterator,...)

If you also have a validation dataset, the easiest thing to do is to create a separate iterator and pass it as the validation_data parameter. Make sure to define steps_per_epoch and validation_steps, since they cannot be inferred from an iterator.
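
Since the step counts have to be supplied by hand, one way to derive them is from the sample counts and the batch size. A minimal sketch, assuming the dataset sizes are known up front; steps_for and the numbers below are hypothetical:

```python
import math

def steps_for(num_samples, batch_size):
    """Number of batches needed to see every sample once (last batch may be smaller)."""
    return math.ceil(num_samples / batch_size)

# Hypothetical dataset sizes, for illustration only.
num_train_samples = 1000
num_valid_samples = 250
batch_size = 32

steps_per_epoch = steps_for(num_train_samples, batch_size)   # 32
validation_steps = steps_for(num_valid_samples, batch_size)  # 8
```

If the dataset drops the final partial batch, floor division (num_samples // batch_size) would be the matching count instead.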

Razorocean