
I am working on an image classification Kaggle competition and downloaded some training images from Kaggle.com. I am using transfer learning with ResNet50 on these images, with Keras 2.0 on a TensorFlow backend (and Python 3).

However, 258 of the 1281 training images produce a 'Possibly corrupt EXIF data' warning and are ignored when loaded into the ResNet model, very likely due to a Pillow issue.

The output messages look like this:

/home/shi/anaconda3/lib/python3.6/site-packages/PIL/TiffImagePlugin.py:692: UserWarning: Possibly corrupt EXIF data.  Expecting to read 524288 bytes but only got 0. Skipping tag 3
  "Skipping tag %s" % (size, len(data), tag))
/home/shi/anaconda3/lib/python3.6/site-packages/PIL/TiffImagePlugin.py:692: UserWarning: Possibly corrupt EXIF data.  Expecting to read 393216 bytes but only got 0. Skipping tag 3
  "Skipping tag %s" % (size, len(data), tag))
/home/shi/anaconda3/lib/python3.6/site-packages/PIL/TiffImagePlugin.py:692: UserWarning: Possibly corrupt EXIF data.  Expecting to read 33554432 bytes but only got 0. Skipping tag 4
  "Skipping tag %s" % (size, len(data), tag))
/home/shi/anaconda3/lib/python3.6/site-packages/PIL/TiffImagePlugin.py:692: UserWarning: Possibly corrupt EXIF data.  Expecting to read 25165824 bytes but only got 0. Skipping tag 4
  "Skipping tag %s" % (size, len(data), tag))
/home/shi/anaconda3/lib/python3.6/site-packages/PIL/TiffImagePlugin.py:692: UserWarning: Possibly corrupt EXIF data.  Expecting to read 131072 bytes but only got 0. Skipping tag 3
  "Skipping tag %s" % (size, len(data), tag))
(more to come ...)

Based on the output messages, I only know that such images exist, not which ones they are.

My question is: how can I identify these 258 images so that I can manually remove them from the data set?

user3768495
  • Possible duplicate of [Getting error while running a classification code in keras](https://stackoverflow.com/questions/45452783/getting-error-while-running-a-classification-code-in-keras) – jdhao Oct 28 '17 at 13:38

3 Answers


Edit: To raise warnings as errors that you can catch, take a look at Justas's comment below.


Even though this question is over a year old, I want to share my solution because I ran into the same problem.

I edited the Pillow code that emits the warning. The warning output shows where to find the file on your system, along with the line number. For example, I changed the following:

if len(data) != size:
    warnings.warn("Possibly corrupt EXIF data.  "
                  "Expecting to read %d bytes but only got %d."
                  " Skipping tag %s" % (size, len(data), tag))
    continue

to

if len(data) != size:
    raise ValueError('Corrupt Exif data')
    warnings.warn("Possibly corrupt EXIF data.  "
                  "Expecting to read %d bytes but only got %d."
                  " Skipping tag %s" % (size, len(data), tag))
    continue

My code to catch the ValueError is shown below. The advantage is that PIL is interrupted and does not print a useless message. You can also catch the exception and act on it, e.g. delete the corresponding file in the 'except' part.

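import os
from PIL import Image

imageFolder = "/Path/To/Image/Folder"  # placeholder: replace with your folder
listImages = os.listdir(imageFolder)

for imgName in listImages:
    imgPath = os.path.join(imageFolder, imgName)

    try:
        # Keep the file name and the image object in separate variables,
        # so the except branch can report the file name
        img = Image.open(imgPath)
        exif_data = img._getexif()
    except ValueError as err:
        print(err)
        print("Error on image: ", imgName)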

I know adding the ValueError is quick and dirty, but it's better than being confronted with all the useless warning messages.

Clown77
  • Doesn't work on PNG files; I get: `AttributeError: 'PngImageFile' object has no attribute '_getexif'` – Serġan Apr 02 '19 at 09:24
  • @Serġan that's because PNG has no such information. See [EXIF](https://en.wikipedia.org/wiki/Exchangeable_image_file_format). You should normally find this information only in the .JPG, .TIF, and .WAV formats. – Clown77 Apr 02 '19 at 13:21
  • To catch UserWarning like an Exception you can use [this](http://stackoverflow.com/a/15934081/461597) instead. – Justas Jul 31 '20 at 06:31
  • @Justas Thank you, I was not aware of this possibility. – Clown77 Aug 05 '20 at 19:27
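
The warnings-as-errors approach from the comment above can be sketched without editing Pillow's source. This is only a sketch under assumptions: `find_files_raising_warnings` and `load_exif_with_pillow` are hypothetical helper names, and it assumes the warning is emitted while the EXIF block is read via `getexif()` (Pillow 6.0 or later); with other Pillow versions the warning may appear at a different point during loading.

```python
import warnings


def load_exif_with_pillow(path):
    """Open an image with Pillow and read its EXIF block (assumed trigger)."""
    from PIL import Image  # imported lazily so the helper below stays generic

    with Image.open(path) as img:
        img.getexif()


def find_files_raising_warnings(paths, load=load_exif_with_pillow):
    """Return the paths for which `load` emits a UserWarning."""
    flagged = []
    for path in paths:
        with warnings.catch_warnings():
            # Inside this block a UserWarning is raised as an exception,
            # so it can be caught and tied to the current file.
            warnings.simplefilter("error", UserWarning)
            try:
                load(path)
            except UserWarning as err:
                print("Warning on image:", path, "->", err)
                flagged.append(path)
    return flagged
```

It could be called with a list of full paths, for example one built with `os.listdir` and `os.path.join` as in the answer above. The returned list then names the files to inspect or remove.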

In case this helps anyone in the future, here's how I removed all EXIF data from my dataset, which removed the PIL warnings.

# remove corrupt exif data

from PIL import Image

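# NOTE: get_image_files and path are not defined in this snippet; they are
# assumed to yield the paths of the image files to process
file_names = get_image_files(path)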

def remove_exif(image_name):
    image = Image.open(image_name)
    if not image.getexif():
        return
    print('removing EXIF from', image_name, '...')
    data = list(image.getdata())
    image_without_exif = Image.new(image.mode, image.size)
    image_without_exif.putdata(data)

    image_without_exif.save(image_name)

for file in file_names:
    remove_exif(file)
print('done')
Manuel Araoz

The easiest way that comes to mind is to modify your code to handle one image at a time, then iterate over the images and check which ones generate the warning.
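
One possible way to do this, as a rough sketch rather than a definitive implementation, is to load each image inside `warnings.catch_warnings(record=True)` and note which files produced any warning. The names `collect_warnings_per_file` and `open_with_pillow` are made up for illustration, and the sketch assumes the warning shows up when Pillow opens the file, decodes it, or reads its EXIF block.

```python
import warnings


def open_with_pillow(path):
    """Open one image, decode it, and read its EXIF block with Pillow."""
    from PIL import Image  # imported lazily so the helper below stays generic

    with Image.open(path) as img:
        img.load()
        img.getexif()


def collect_warnings_per_file(paths, load=open_with_pillow):
    """Map each path to the warning messages emitted while loading it."""
    report = {}
    for path in paths:
        with warnings.catch_warnings(record=True) as caught:
            # "always" stops Python from suppressing repeated warnings
            warnings.simplefilter("always")
            load(path)
        if caught:
            report[path] = [str(w.message) for w in caught]
    return report
```

The keys of the returned dictionary are the files that triggered at least one warning, which is the list needed to remove them from the data set by hand.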

Salvador Dali