
I'm using the ImageDataGenerator inside Keras to read a directory of images. I'd like to save the result inside a numpy array, so I can do further manipulations and save it to disk in one file.

flow_from_directory() returns an iterator, which is why I tried the following

itr = gen.flow_from_directory('data/train/', batch_size=1, target_size=(32,32))
imgs = np.concatenate([itr.next() for i in range(itr.nb_sample)])

but that produced

ValueError: could not broadcast input array from shape (32,32,3) into shape (1)

I think I'm misusing the concatenate() function, but I can't figure out where I fail.
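This NumPy-only sketch shows why the call fails. Each `next()` call on the iterator yields a tuple `(batch_x, batch_y)`, not a single array, so `np.concatenate` receives a list of tuples that it cannot stack. The batches are hypothetical random arrays standing in for the iterator's output. The batch size of 1, the 32×32 RGB images and the 3 classes are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for 5 calls to itr.next() with batch_size=1:
# each call returns a tuple (batch_x, batch_y).
batches = [
    (rng.random((1, 32, 32, 3), dtype=np.float32),  # images, shape (1, 32, 32, 3)
     np.eye(3, dtype=np.float32)[[i % 3]])           # one-hot labels, shape (1, 3)
    for i in range(5)
]

# np.concatenate(batches) fails with a ValueError: each element is a tuple
# mixing arrays of shapes (1, 32, 32, 3) and (1, 3), which cannot form one array.
# Selecting the image part ([0]) and the label part ([1]) separately works:
x = np.concatenate([b[0] for b in batches])
y = np.concatenate([b[1] for b in batches])
print(x.shape)  # (5, 32, 32, 3)
print(y.shape)  # (5, 3)
```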

pietz
  • I partly solved my problem by adding `[0]` after `itr.next()`. However, this only gives me the x data, and I have to do the same again with `[1]` for the y data. I then fail to merge the two resulting arrays of shapes `(A,B,C,D)` and `(A,E)` into shape `(A,B,C,D,E)`. – pietz Feb 17 '17 at 15:52
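About the comment: arrays of shapes `(A,B,C,D)` and `(A,E)` cannot be meaningfully combined into one `(A,B,C,D,E)` array. They also don't need to be merged to save everything in one file. The sketch below uses `np.savez`, which writes several named arrays into a single `.npz` archive. The arrays and the file name `train_data.npz` are hypothetical.

```python
import os
import tempfile

import numpy as np

# Hypothetical arrays with the shapes from the comment: x is (A, B, C, D), y is (A, E).
x = np.zeros((4, 32, 32, 3), dtype=np.float32)
y = np.eye(3, dtype=np.float32)[[0, 1, 2, 0]]  # shape (4, 3)

# np.savez stores several named arrays in one .npz file,
# so images and labels share a single file without being merged.
path = os.path.join(tempfile.mkdtemp(), "train_data.npz")
np.savez(path, x=x, y=y)

# Load both arrays back by the names used when saving.
with np.load(path) as data:
    x_loaded, y_loaded = data["x"], data["y"]
print(x_loaded.shape, y_loaded.shape)  # (4, 32, 32, 3) (4, 3)
```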

2 Answers


I had the same problem and solved it the following way. `itr.next()` returns the next batch of images as two numpy.ndarray objects, batch_x and batch_y (source: keras/preprocessing/image.py). So you can set the batch_size for flow_from_directory to the size of your whole training dataset, and a single call then returns all of the data.

For example, my whole training set consists of 1481 images:

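from keras.preprocessing.image import ImageDataGenerator

# train_data_dir, img_width and img_height are placeholders for your own values
train_datagen = ImageDataGenerator(rescale=1. / 255)
itr = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_height, img_width),  # Keras expects (height, width)
    batch_size=1481,  # the number of images in the whole training set
    class_mode='categorical')

# one batch now holds every image and its label
X, y = itr.next()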
MJimitater
Florian

When you use ImageDataGenerator, flow_from_directory() returns the data as a DirectoryIterator. You can extract it either batch by batch or as a whole:

train_generator = train_datagen.flow_from_directory(
    train_parent_dir,
    target_size=(300, 300),
    batch_size=32,
    class_mode='categorical'
)

which prints:

Found 3875 images belonging to 3 classes.
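Extracting everything at once means holding all 3875 images of 300×300×3 in memory. The calculation below is a rough estimate. It assumes the image array uses float32, at 4 bytes per value.

```python
# Rough memory footprint of an image array of shape (3875, 300, 300, 3), assuming float32.
n_images, height, width, channels = 3875, 300, 300, 3
bytes_per_value = 4  # float32

total_bytes = n_images * height * width * channels * bytes_per_value
print(total_bytes)                    # 4185000000
print(round(total_bytes / 2**30, 2))  # 3.9 (GiB)
```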

To extract the data as a single NumPy array (that is, not as batches), you can use this code:

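train_generator.reset()  # start from the first batch of an epoch
# Read every batch exactly once so that images and labels stay paired;
# with shuffle=True (the default), two separate passes would return different orders.
batches = [train_generator.next() for _ in range(len(train_generator))]
x = np.concatenate([batch[0] for batch in batches])
y = np.concatenate([batch[1] for batch in batches])
print(x.shape)
print(y.shape)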

Note: call train_generator.reset() before this code so that extraction starts from the first batch of an epoch.

The output of the code above is:

(3875, 300, 300, 3)
(3875, 3)

The result is a single NumPy array, even though ImageDataGenerator loaded the images in batches of 32.
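The same whole-dataset extraction can be wrapped in a small helper that reads each batch only once. This sketch assumes the iterator supports `len()`, `reset()` and `next()`, as Keras' DirectoryIterator does. `iterator_to_arrays` and `FakeIterator` are hypothetical names. `FakeIterator` only imitates a shuffling iterator so the sketch runs without Keras.

```python
import numpy as np


def iterator_to_arrays(itr):
    """Drain one epoch of a Keras-style iterator into (x, y) NumPy arrays.

    Assumes `itr` supports len(itr) (batches per epoch), itr.reset() and itr.next().
    """
    itr.reset()  # start from the beginning of an epoch
    xs, ys = [], []
    for _ in range(len(itr)):
        batch_x, batch_y = itr.next()  # one call per batch keeps x and y paired
        xs.append(batch_x)
        ys.append(batch_y)
    return np.concatenate(xs), np.concatenate(ys)


class FakeIterator:
    """Hypothetical stand-in for a DirectoryIterator that reshuffles every epoch."""

    def __init__(self, n_samples, batch_size, seed=0):
        self.x = np.arange(n_samples, dtype=np.float32).reshape(n_samples, 1)
        self.y = np.arange(n_samples)  # label == sample index, to check pairing
        self.batch_size = batch_size
        self.rng = np.random.default_rng(seed)
        self.reset()

    def __len__(self):
        return -(-len(self.x) // self.batch_size)  # ceiling division

    def reset(self):
        self.order = self.rng.permutation(len(self.x))
        self.pos = 0

    def next(self):
        idx = self.order[self.pos:self.pos + self.batch_size]
        self.pos += self.batch_size
        if self.pos >= len(self.x):
            self.reset()  # new epoch, new shuffle order
        return self.x[idx], self.y[idx]


x, y = iterator_to_arrays(FakeIterator(n_samples=10, batch_size=4))
print(x.shape, y.shape)      # (10, 1) (10,)
print(np.all(x[:, 0] == y))  # True: every sample is still next to its own label
```

Reading x and y in the same loop matters because flow_from_directory shuffles by default. Two separate passes would draw the batches in different orders, so the images and labels would no longer line up.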

To keep the output as batches, use the following code:

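x = []
y = []
train_generator.reset()
for i in range(len(train_generator)):
    a, b = train_generator.next()
    x.append(a)
    y.append(b)
# The last batch is smaller (3875 is not a multiple of 32), so the batches are
# ragged; dtype=object stores each batch as one element instead of raising an error.
x = np.array(x, dtype=object)
y = np.array(y, dtype=object)
print(x.shape)
print(y.shape)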

The output of the code is:

(122,)
(122,)
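The shape (122,) comes from the batch count. 3875 images in batches of 32 give 121 full batches plus one final batch of 3 images. Because of that last batch, the batches cannot form one regular array, so NumPy can only hold them as an object array of 122 separate batches. The sketch below reproduces that arithmetic and shows that the batch list can still be concatenated later. It uses hypothetical zero arrays with the image dimensions shrunk to (1,) to keep it light.

```python
import math

import numpy as np

n_images, batch_size = 3875, 32
n_batches = math.ceil(n_images / batch_size)
last_batch = n_images - (n_batches - 1) * batch_size
print(n_batches, last_batch)  # 122 3

# Hypothetical batches with the same sizes (image dims shrunk to (1,) for brevity).
sizes = [batch_size] * (n_batches - 1) + [last_batch]
batches = [np.zeros((s, 1), dtype=np.float32) for s in sizes]

as_object = np.array(batches, dtype=object)  # ragged, so one object per batch
print(as_object.shape)  # (122,)

x = np.concatenate(batches)  # the list of batches can still be merged later
print(x.shape)  # (3875, 1)
```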

Hope this works as a solution

John Paulson