Why do some images save properly and others don't when using ImageDataGenerator and .flow()?

Question

I have been trying to augment some training data and the corresponding labels using ImageDataGenerator.

Here's how I've approached it (apologies if the formatting is a bit off):

```python
import numpy as np
from glob import glob
from keras.preprocessing.image import ImageDataGenerator, load_img, img_to_array

# OG_PATH, GT_PATH and MORPHED_PATH are directory paths defined elsewhere


def create_morph():
    i = 0
    img_type = 'png'

    # get the path to all the images to be morphed
    print('getting morph path...')
    imgs = glob(OG_PATH + "/*." + img_type)

    # check how many images are in the morph path
    print('length of imgs')
    print(len(imgs))

    # make two identically shaped numpy arrays (num of images, rows, cols, 1 channel) to load into later
    rows = 208
    cols = 336
    imgdatas = np.ndarray((len(imgs), rows, cols, 1), dtype=np.uint8)
    imglabels = np.ndarray((len(imgs), rows, cols, 1), dtype=np.uint8)

    # image-wise
    for imgname in imgs:
        print('inside for-loop')
        midname = imgname[imgname.rindex("/") + 1:]
        img = load_img(OG_PATH + "/" + midname, grayscale=True)
        label = load_img(GT_PATH + "/" + midname, grayscale=True)

        # convert images to arrays
        img = img_to_array(img)
        label = img_to_array(label)

        # make a big npy array
        imgdatas[i] = img
        imglabels[i] = label
        if i % 100 == 0:
            print('Done: {0}/{1} images'.format(i, len(imgs)))
            i += 1

    # set up the morph parameters
    morphData = dict(
        horizontal_flip=True,
        vertical_flip=True)

    # assign the morphing to each label and og image
    morph_img = ImageDataGenerator(**morphData)
    morph_label = ImageDataGenerator(**morphData)

    # apply morph to og images
    print('saving to file')
    a = 0
    b = 0

    for batch in morph_img.flow(
            imgdatas,
            save_to_dir=MORPHED_PATH + '/augment_results_im/',
            batch_size=1,
            save_prefix='batch',
            save_format='png'):
        a += 1
        if a > len(imgdatas):
            break

    print('done with the OGs')

    # apply morph to label images
    for batch in morph_label.flow(
            imglabels,
            save_to_dir=MORPHED_PATH + '/augment_results_labels/',
            batch_size=1,
            save_prefix='batch',
            save_format='png'):
        b += 1
        if b > len(imgdatas):
            break

    print('done with labels')
```

This code works for me in the sense that I do get flipped images, but it only flips the first two images; the rest of the images in my imgdatas and imglabels arrays come out blank. See here for an example. I've looked into this post and this one about iterating over .flow(), but I'm still not sure why only two of the images work when I iterate over .flow(). Any ideas?
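One thing worth checking in the loading loop as posted: `i += 1` is indented under `if i % 100 == 0:`. It runs once (when `i` is 0), after which `i` stays at 1 for good. The first image lands in `imgdatas[0]`, every later image overwrites `imgdatas[1]`, and slots 2 onward are never written (the same goes for `imglabels`). Since `np.ndarray(...)` allocates memory without initializing it, those unwritten slots hold whatever happened to be there, which would fit the blank or odd-looking outputs. The sketch below reproduces this with dummy arrays; `fill_as_posted` and `fill_fixed` are hypothetical names, and `np.zeros` stands in for `np.ndarray` so the unwritten slots are predictably zero.

```python
import numpy as np


def fill_as_posted(images):
    """Copy images into a preallocated array using the question's counter logic."""
    out = np.zeros((len(images),) + images[0].shape, dtype=np.uint8)
    i = 0
    for img in images:
        out[i] = img
        if i % 100 == 0:
            print('Done: {0}/{1} images'.format(i, len(images)))
            i += 1  # only runs while i is a multiple of 100, i.e. once (i == 0)
    return out


def fill_fixed(images):
    """Same copy, but the index advances for every image."""
    out = np.zeros((len(images),) + images[0].shape, dtype=np.uint8)
    for i, img in enumerate(images):
        out[i] = img
        if i % 100 == 0:
            print('Done: {0}/{1} images'.format(i, len(images)))
    return out


# five dummy 2x3 grayscale "images" filled with 1, 2, 3, 4, 5
images = [np.full((2, 3, 1), k, dtype=np.uint8) for k in range(1, 6)]
print([int(x.max()) for x in fill_as_posted(images)])  # [1, 5, 0, 0, 0]
print([int(x.max()) for x in fill_fixed(images)])      # [1, 2, 3, 4, 5]
```

Moving `i += 1` out of the `if` block (or using `enumerate` as in `fill_fixed`) fills every slot, so the whole-array `.flow()` call then has real data for every image.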

Also, I'm unsure what the names of the saved images mean. They seem to include a randomly generated number, but I'm not sure where that's defined.
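On the file names: when `save_to_dir` is set, `flow()` names each file it writes from a format string. In the Keras 2.x source (`NumpyArrayIterator`), as far as I can tell, the pattern is `'{prefix}_{index}_{hash}.{format}'`: your `save_prefix`, an integer index for the sample, and a random integer drawn with `np.random.randint(1e4)`, which would be the "randomly generated number". Treat the exact pattern as version-dependent. The two helpers below are hypothetical (not part of Keras); they only build and split names in that assumed pattern, e.g. to group saved files by prefix.

```python
import os


def saved_name(prefix, index, hash_value, fmt='png'):
    """Build a file name in the assumed '{prefix}_{index}_{hash}.{format}' pattern."""
    return '{prefix}_{index}_{hash}.{format}'.format(
        prefix=prefix, index=index, hash=hash_value, format=fmt)


def parse_saved_name(path):
    """Split such a file name back into (prefix, index, hash, format).

    rsplit with maxsplit=2 keeps any underscores that belong to the prefix itself.
    """
    stem, ext = os.path.splitext(os.path.basename(path))
    prefix, index, hash_value = stem.rsplit('_', 2)
    return prefix, int(index), int(hash_value), ext.lstrip('.')


print(saved_name('batch', 1, 4821))  # batch_1_4821.png
print(parse_saved_name('augment_results_im/batch_1_4821.png'))  # ('batch', 1, 4821, 'png')
```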

Thanks for your help

I thought I'd figured it out by increasing the threshold for a and b. That helped a bit: I now get more than two augmented images, but the last half of the images I get are still weird and pixelated. — B1ueMang0, Aug 27 '18 at 03:50

score 0 · Answer 1 · answered Sep 07 '18 at 01:16

So I've managed to find a solution. I had to turn each of my images into an array of shape (1, rows, cols, channels) and augment that single-image array on its own. Originally I had a for loop that went through all the images in the directory and built one big array of shape (total_images, rows, cols, channels), and then I augmented that array once it was filled. For some reason that would not cycle through the entire array; it only did the first few images. So I changed that for loop to this:

```python
# imports, imgs, morph_img and morph_label are set up as in the question

# image-wise
# note: range(1, len(imgs)) yields 1 .. len(imgs) - 1; if the files are
# numbered (1) .. (N), range(1, len(imgs) + 1) would include the last one
for imgname in range(1, len(imgs)):
    # array that always holds exactly one image: (1, rows, cols, channels)
    imgdatas = np.ndarray((1, 208, 336, 1), dtype=np.uint8)
    imglabels = np.ndarray((1, 208, 336, 1), dtype=np.uint8)

    img = load_img(OG_PATH + '/(%d).png' % imgname, grayscale=True)
    label = load_img(GT_PATH + '/(%d).png' % imgname, grayscale=True)

    # convert images to arrays
    img = img_to_array(img)
    label = img_to_array(label)

    # put the image and its label into the single slot
    imgdatas[0] = img
    imglabels[0] = label

    # apply morph to og images
    print('saving to file')

    seed = 1
    a = 0
    for batch in morph_img.flow(
            imgdatas,
            batch_size=1,
            save_to_dir='morphed_og_path/',
            save_prefix=str(imgname),
            save_format='png',
            seed=seed):  # I added the seed so my originals and labels are augmented the same way
        a += 1
        if a > 20:  # a is incremented first, so this stops after 21 batches
            break

    print('done with the OGs')

    # apply morph to label images
    b = 0
    for batch in morph_label.flow(
            imglabels,
            batch_size=1,
            save_to_dir='morphed_labels_path/',
            save_prefix=str(imgname),
            save_format='png',
            seed=seed):
        b += 1
        if b > 20:
            break

    print('done with labels')
```

It works the way I want it to, but I know it's really inefficient and I'm sure there's a better way. So other answers are still welcome.
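On the "better way": the shared seed is the documented Keras approach for keeping images and masks aligned (the `ImageDataGenerator` documentation passes the same `seed` to the image flow and the mask flow and then `zip`s the two generators). Separately, if the `i` counter in the question's loading loop is the real culprit, the original one-big-array version should work once that is fixed, so each file only needs to be loaded once rather than rebuilding a one-image array per file. The snippet below is a numpy-only sketch of the underlying idea, not Keras code: draw each random flip decision once and apply it to both the image and its label. `random_flip_pair` and `augment_pairs` are hypothetical names, and the dummy data is made up for illustration.

```python
import numpy as np


def random_flip_pair(image, label, rng):
    """Apply the same random horizontal/vertical flips to an image and its label.

    Arrays are (rows, cols, channels). Each flip happens with probability 0.5,
    and a single draw decides it for both arrays, so they stay aligned.
    """
    if rng.random() < 0.5:
        # horizontal flip: reverse the column axis
        image, label = image[:, ::-1], label[:, ::-1]
    if rng.random() < 0.5:
        # vertical flip: reverse the row axis
        image, label = image[::-1], label[::-1]
    return image, label


def augment_pairs(images, labels, n_per_image, seed=1):
    """Return n_per_image flipped copies of every (image, label) pair as two stacked arrays."""
    rng = np.random.default_rng(seed)
    out_images, out_labels = [], []
    for image, label in zip(images, labels):
        for _ in range(n_per_image):
            aug_image, aug_label = random_flip_pair(image, label, rng)
            out_images.append(aug_image)
            out_labels.append(aug_label)
    return np.stack(out_images), np.stack(out_labels)


# dummy data: 3 "images" of shape (208, 336, 1), labels identical to the images
images = (np.arange(3 * 208 * 336) % 256).astype(np.uint8).reshape(3, 208, 336, 1)
labels = images.copy()
aug_images, aug_labels = augment_pairs(images, labels, n_per_image=4)
print(aug_images.shape)                        # (12, 208, 336, 1)
print(np.array_equal(aug_images, aug_labels))  # True: the same flips were applied to both
```

Because the labels here are copies of the images, identical outputs confirm that both arrays received exactly the same transformation; with real masks the outputs would differ in content but match in orientation.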
