
I'm working with PIL in Python to load and resize a large number of images to feed to a CNN. But while the images are loading, this error occurs:

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-1-9e7a5298cd3e> in <module>
      3 dog_names = ip.labels("dogImages/train")
      4 
----> 5 trn_data, trn_targets = ip.data_loader('dogImages/train', (224, 224))
      6 val_data, val_targets = ip.data_loader('dogImages/valid', (224, 224))
      7 tst_data, tst_targets = ip.data_loader('dogImages/test', (224, 224))

...my address...\libs\img_preprocessing.py in data_loader(path, size)
     48             cat_target.append([1 if pre_label(im)==label else 0 for label in labels(total)])
     49             img = Image.open(im)
---> 50             img = Image.Image.resize(img, size=size)
     51             img = np.array(img)
     52             arr.append(img)

C:\ProgramData\Anaconda3\lib\site-packages\PIL\Image.py in resize(self, size, resample, box, reducing_gap)
   1922             return im.convert(self.mode)
   1923 
-> 1924         self.load()
   1925 
   1926         if reducing_gap is not None and resample != NEAREST:

C:\ProgramData\Anaconda3\lib\site-packages\PIL\ImageFile.py in load(self)
    247                                     break
    248                                 else:
--> 249                                     raise OSError(
    250                                         "image file is truncated "
    251                                         f"({len(b)} bytes not processed)"

OSError: image file is truncated (150 bytes not processed)

I've seen some suggestions about adding this code:

from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True

But I think that lets defective data into the model, and I don't want that. I want to skip corrupted images without crashing the program and still load all the other images, but I can't figure out how. This is the code I use:

from glob import glob

import numpy as np
from PIL import Image

def data_loader(path, size):
    '''
    loading image data
    parameters:
        path => image directory path
        size => output size in tuple
    '''
    total = glob(path + "/*")
    arr = []
    for _dir in total:
        for im in glob(_dir+"/*"):
            img = Image.open(im)
            img = Image.Image.resize(img, size=size)
            img = np.array(img)
            arr.append(img)
    return np.array(arr)
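Why the traceback points at resize rather than Image.open: Image.open is lazy and only parses the file header, so a truncated file opens without complaint and fails later, when load() (called inside resize) reads the pixel data. The sketch below reproduces that in memory. The random 64x64 image, the "keep the first half of the bytes" truncation, and the helper names make_truncated_jpeg and try_resize are illustrative assumptions, not part of the dogImages data or the code above. It also shows what the LOAD_TRUNCATED_IMAGES workaround does: the same broken file then resizes without any error, which is why it can let damaged images into the training data.

```python
import io

import numpy as np
from PIL import Image, ImageFile


def make_truncated_jpeg() -> bytes:
    """Hypothetical helper: encode a random 64x64 RGB image as JPEG and keep only the first half of the bytes."""
    rng = np.random.default_rng(0)
    pixels = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
    buf = io.BytesIO()
    Image.fromarray(pixels).save(buf, format="JPEG")
    data = buf.getvalue()
    return data[: len(data) // 2]


def try_resize(data: bytes, size=(32, 32)):
    """Hypothetical helper: resize an encoded image; return the new image, or the OSError raised while decoding."""
    img = Image.open(io.BytesIO(data))  # succeeds: only the header is parsed here
    try:
        return img.resize(size)  # resize() calls load(), which reads the pixel data
    except OSError as exc:
        return exc


truncated = make_truncated_jpeg()
print(repr(try_resize(truncated)))  # an OSError: decoding ran out of data

ImageFile.LOAD_TRUNCATED_IMAGES = True
try:
    print(try_resize(truncated).size)  # (32, 32): the missing data is papered over silently
finally:
    ImageFile.LOAD_TRUNCATED_IMAGES = False  # restore the default
```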
CrazyChucky
  • If you are actually getting an error, can't you protect your load/resizing pipeline using a try/except? You probably should return a list of the files that were loaded. – nonDucor Feb 16 '22 at 12:57
  • @nonDucor Yes, I was thinking about that. Now I know the error actually comes from the resize method. – Mocking Bird Feb 16 '22 at 13:18

1 Answer


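Since the error occurs when you attempt to resize (Image.open only reads the header; resize calls load(), which is where the pixel data is actually read and the truncation is detected), enclose that line in a try/except. When the error is raised, continue skips the rest of the current iteration and moves on to the next image file.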

from glob import glob

import numpy as np
from PIL import Image

def load_data(path, size):
    '''
    loading image data
    parameters:
        path => image directory path
        size => output size in tuple
    '''
    total = glob(path + "/*")
    images = []
    for subdir in total:
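        for im in glob(subdir + "/*"):
            img = Image.open(im)  # lazy: only the header is read here
            try:
                img = img.resize(size)  # resize() calls load(), which raises on truncated files
            except OSError:
                continue  # skip this file and move on to the next one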
            img = np.array(img)
            images.append(img)
    return np.array(images)
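Following nonDucor's comment, here is a sketch of a variant that also reports which files were loaded and which were skipped. It goes a little beyond the code above, and these parts are additions rather than part of the original answer: the hypothetical name load_data_with_paths, moving Image.open inside the try (a non-image file makes Image.open raise UnidentifiedImageError, a subclass of OSError), the with-block that closes each file, and convert("RGB"). That last call assumes the CNN expects 3-channel input; np.array only stacks the images into one clean array when every image has the same shape, and grayscale files would otherwise come out as (224, 224) instead of (224, 224, 3).

```python
import os
from glob import glob

import numpy as np
from PIL import Image


def load_data_with_paths(path, size):
    """Load and resize every image in path/*/*, skipping unreadable files.

    Returns (images, loaded, skipped): a NumPy array of resized images plus
    the file paths that were loaded and the ones that were skipped.
    """
    images, loaded, skipped = [], [], []
    for subdir in sorted(glob(os.path.join(path, "*"))):
        for im in sorted(glob(os.path.join(subdir, "*"))):
            try:
                # Image.open is inside the try too, so non-image files are skipped.
                with Image.open(im) as img:
                    # Assumption: 3-channel input; convert() and resize() both
                    # trigger load(), so truncated files raise OSError here.
                    img = img.convert("RGB").resize(size)
            except OSError:
                skipped.append(im)
                continue
            images.append(np.array(img))
            loaded.append(im)
    return np.array(images), loaded, skipped
```

Usage would look like trn_data, trn_files, trn_skipped = load_data_with_paths('dogImages/train', (224, 224)); printing trn_skipped then tells you which files to inspect or remove.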

Some other minor things I changed:

  • data_loader sounds more like a class than a function. I recommend verbs for functions, or at least not nouns that sound like they perform actions.
  • As a variable name, arr is both generic (what's in it?) and misleading (it's a list, not an array).
  • Variables starting with _ are, by convention, usually used for "private" attributes.
  • img.resize(size) is just a simpler way of calling the resize method.
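A quick check of the last point, using an arbitrary in-memory image (the blue 40x30 image is just an assumption for the demo): looking up resize on the instance binds img as self, so both call styles produce identical results.

```python
from PIL import Image

img = Image.new("RGB", (40, 30), "blue")

# Calling the function through the class and passing the instance explicitly...
a = Image.Image.resize(img, size=(20, 15))
# ...is exactly what the bound-method call does for you.
b = img.resize((20, 15))

print(a.size == b.size and a.tobytes() == b.tobytes())  # True
```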
CrazyChucky