
In some cases the validation data can't be a generator, e.g. for TensorBoard histograms:

If printing histograms, validation_data must be provided, and cannot be a generator.

My current code looks like:

from keras.models import Sequential
from keras.preprocessing.image import ImageDataGenerator

image_data_generator = ImageDataGenerator()

training_seq   = image_data_generator.flow_from_directory(training_dir)
validation_seq = image_data_generator.flow_from_directory(validation_dir)
testing_seq    = image_data_generator.flow_from_directory(testing_dir)

model = Sequential(..)
# ..
model.compile(..)
model.fit_generator(training_seq, validation_data=validation_seq, ..)

How do I provide it as validation_data=(x_test, y_test)?

A T

2 Answers


Python 2.7 and Python 3.* solution:

from platform import python_version_tuple

if python_version_tuple()[0] == '3':
    xrange = range
    izip = zip
    imap = map
else:
    from itertools import izip, imap

import numpy as np

# ..
# other code as in question
# ..

x, y = izip(*(validation_seq[i] for i in xrange(len(validation_seq))))
x_val, y_val = np.vstack(x), np.vstack(y)

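Or, to support class_mode='binary', where each label batch is a 1-D array:

x_val = np.vstack(x)
# class_mode is whatever value was passed to flow_from_directory.
# 1-D binary label batches are joined end to end; one-hot batches are stacked.
y_val = np.concatenate(y) if class_mode == 'binary' else np.vstack(y)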

Full runnable code: https://gist.github.com/AlecTaylor/7f6cc03ed6c3dd84548a039e2e0fd006
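
If only Python 3 matters, the snippets above can probably be folded into one small helper. This is a minimal sketch, not the code from the gist: `sequence_to_arrays` and `ToySeq` are hypothetical names, and `ToySeq` merely stands in for the object returned by flow_from_directory, assuming that object supports len() and integer indexing as the code above relies on.

```python
import numpy as np


def sequence_to_arrays(seq):
    """Join every batch of a finite, indexable sequence into two arrays."""
    batches = [seq[i] for i in range(len(seq))]
    # Concatenating along axis 0 works for 1-D (binary) and 2-D (one-hot)
    # labels alike, and tolerates a smaller final batch.
    x = np.concatenate([xb for xb, _ in batches])
    y = np.concatenate([yb for _, yb in batches])
    return x, y


class ToySeq:
    """Hypothetical stand-in for a Keras iterator: 5 samples, batch size 2."""

    def __init__(self):
        self.x = np.arange(20, dtype="float32").reshape(5, 4)
        self.y = np.array([0, 1, 1, 0, 1], dtype="float32")

    def __len__(self):
        return 3  # ceil(5 / 2) batches

    def __getitem__(self, i):
        return self.x[2 * i:2 * i + 2], self.y[2 * i:2 * i + 2]


x_val, y_val = sequence_to_arrays(ToySeq())
```

With the real data this would be called as sequence_to_arrays(validation_seq), and the result passed on as validation_data=(x_val, y_val).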

A T
  • I tried your code and I got x_val and y_val with a shape respectively of (83,224,224,3) and (83,3). I wanted to reshape y_val to (83,) however I got this error **ValueError: cannot reshape array of size 249 into shape (83,)** – root Aug 23 '22 at 10:57

Update (22/06/2018): Read the answer provided by the OP for a concise and efficient solution. Read mine to understand what's going on.


In Python you can collect everything a generator yields using:

data = [x for x in generator]

However, the iterators produced by ImageDataGenerator loop indefinitely and never terminate, so the approach above would not work as is. With a small modification, the same idea works in this case:

data = []     # store all the generated data batches
labels = []   # store all the generated label batches
# Maximum number of iterations; each iteration generates one batch. The proper
# value depends on the batch size and the size of the whole dataset.
max_iter = 100
i = 0
for d, l in validation_generator:
    data.append(d)
    labels.append(l)
    i += 1
    if i == max_iter:
        break
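
To see every sample exactly once, max_iter should be the number of batches in one pass over the data. A minimal sketch of that arithmetic follows; `batches_per_epoch` is a hypothetical helper, and the figures (83 images, batches of 32) are only illustrative. On the Keras iterator itself, len(validation_generator) should, as far as I know, give the same count.

```python
import math


def batches_per_epoch(n_samples, batch_size):
    """Number of batches needed to cover every sample once."""
    # The last batch may be smaller than batch_size, hence the ceiling.
    return math.ceil(n_samples / batch_size)


# Illustrative numbers only: 83 validation images in batches of 32.
max_iter = batches_per_epoch(83, 32)
```

Stopping the loop at this count avoids collecting batches from a second pass over the same images.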

Now we have two lists of batches. We need to join them into two arrays, one for the data (i.e. X) and one for the labels (i.e. y). Concatenating along the first axis also works when the last batch is smaller than the others, which a plain reshape does not handle:

import numpy as np

data = np.concatenate(data)      # shape: (n_samples,) + per-sample data shape
labels = np.concatenate(labels)  # shape: (n_samples,) + per-sample label shape
today