I have a "bird" folder containing 11345 bird images named 1.jpg, 2.jpg, 3.jpg, ..., 11344.jpg, 11345.jpg. I need to save these bird images as "filenames.pickle" for use in further machine learning modules. The data should be arranged like this: dataset/train/filenames.pickle, dataset/test/filenames.pickle

I need to create one single pickle file, filenames.pickle, that covers all 11345 bird images. I am very confused about how I should add these images to the pickle so that my code takes the pickle file as input but ultimately reaches the images to train the machine learning models.

from PIL import Image  
import pickle

'''
I am just trying to convert one image into a pickle to get an idea.
If it is successfully converted into a pickle, then I will read all the
images inside the "bird" folder and convert all of them into one
single pickle file.
'''

# converting an image into pickle
img = Image.open('059.jpg')
with open('059.pickle', 'wb') as f:
    pickle.dump(img, f)


## read the pickle file
with open('059.pickle', 'rb') as f:
    file = pickle.load(f)
    print(file)

# after reading the 059.pickle file:
# <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=500x375 at 0x2115BE59190>

# I don't want this result (<PIL.JpegImagePlugin.JpegImageFile image mode=RGB
# size=500x375 at 0x2115BE59190>) in the pickle file.
# I want the pickle file to save a result like this: ['59.jpg'].

 
## to convert whole images inside bird folder
## input folder = bird\\images\\all_images_in_jpg_format

image = "bird\\images\\"
fout = open("bird\\filenames.pickle", 'wb')
pickle.dump(image, fout)
fout.close()

with open("bird\\filenames.pickle", 'rb') as f:
    file = pickle.load(f)
    print(file)
# output : bird\images\
## the above output is wrong


'''
because when I am done reading all the images and have created one
pickle file as "filenames.pickle", it should save the images like
this:
['01.jpg','0342.jpg','06762.jpg', '06752.jpg', '05122.jpg',
 '05144.jpg', '06635.jpg','06638.jpg',
 '05632.jpg',......'11345.jpg']
and after reading this pickle file, somehow the model will
automatically read the images via the pickle file.
'''

I am not very familiar with pickle files and their format. Can anyone please help me or suggest how I should tackle this problem? How will the model read the images via the pickle file? What does the pickle file contain (image data and pixel information, or just the names of the image files) so that the model can take the pickle file and learn from the images at training time?

Priyanka_U
  • You should understand that the file name isn't part of the file itself, as you seem to think. Your first test, for only one image, gives the expected result. I didn't understand whether you want the data of the images (RGB data) or only the names of the files. – Avizipi Jan 16 '22 at 16:01
  • @AssafLivne I want the data of the image, the RGB data. To be honest, for the project I am doing I don't have any reference for what to keep inside "filenames.pickle", but the data should be arranged in this manner: dataset/train/filenames.pickle, dataset/test/filenames.pickle. It is my assumption that the pickle file should contain image data; only then can the model take the pickle file as input and learn from all the image data via the pickle file. If I have only the names of the files inside the pickle file, then how would the model learn the images? – Priyanka_U Jan 16 '22 at 16:46
  • @AssafLivne So if you have any idea how a model gets trained by taking a pickle file (not the images directly), it would really help me a lot. The expected output I have written, ['01.jpg','0342.jpg','06762.jpg'], comes from another pickle file that is present in the evaluation part of the project, which gave me a hint that the pickle file is probably taking image data, not image file names, but I am not sure about it. – Priyanka_U Jan 16 '22 at 16:46

1 Answer

Modification of my original answer. Now I pickle file names in one file, and pickle images into another file.

from PIL import Image
import os
import pickle
from glob import glob

## to convert whole images inside bird folder
## input folder = bird\\images\\all_images_in_jpg_format

PICKLE_FILE = "bird\\filenames.pickle"
SOURCE_DIRECTORY = "bird\\images\\"
PICKLE_IMAGES = "bird\\images.pickle"

path_list = glob(os.path.join(SOURCE_DIRECTORY, "*.jpg"))

# pickle images into big pickle file

with open(PICKLE_IMAGES, "wb") as f:
    for file_name in path_list:
        # open each image in a context manager so its file handle gets closed
        with Image.open(file_name) as img:
            pickle.dump(img, f)
        
# get short names from the path list 

file_list = [os.path.basename(path) for path in path_list]

# pickle short name list

with open(PICKLE_FILE, "wb") as f:
    pickle.dump(file_list, f)

# test that we can reread the list

with open(PICKLE_FILE, "rb") as f:
    recovered_list = pickle.load(f)

if file_list == recovered_list:
    print("Lists Match!")
else:
    print("Lists Don't Match!!!")


# read a couple images out of the image file:

display_count = 5


with open(PICKLE_IMAGES,"rb") as f:
    while True:
        try:
            pickle.load(f).show()
            display_count -= 1
            if display_count <= 0:
                break
        except EOFError:  # reached the end of the pickle stream
            break
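
The question asks for the layout dataset/train/filenames.pickle and dataset/test/filenames.pickle, which the code above does not produce. A minimal sketch of one way to get there, assuming a random 80/20 split is acceptable: the helper name `write_split_pickles`, the `test_fraction` and `seed` parameters, and the split ratio are all hypothetical choices for illustration, not something the project specifies.

```python
import os
import pickle
import random


def write_split_pickles(file_names, out_root, test_fraction=0.2, seed=0):
    """Shuffle file_names, split them, and pickle each part as filenames.pickle.

    Hypothetical helper: the 80/20 default and the seed are assumptions.
    """
    names = sorted(file_names)          # sort first so the split is reproducible
    random.Random(seed).shuffle(names)  # seeded shuffle, leaves global RNG alone
    n_test = int(len(names) * test_fraction)
    splits = {"test": names[:n_test], "train": names[n_test:]}

    for split_name, split_names in splits.items():
        split_dir = os.path.join(out_root, split_name)
        os.makedirs(split_dir, exist_ok=True)
        # each split gets its own filenames.pickle holding a plain list of names
        with open(os.path.join(split_dir, "filenames.pickle"), "wb") as f:
            pickle.dump(split_names, f)
    return splits
```

Calling `write_split_pickles(file_list, "dataset")` with the `file_list` built above would write both files under a `dataset` folder. If the project ships a fixed train/test assignment, that assignment should be used instead of a random split.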
        

It may still be that your trainer wants individually pickled images, or that it doesn't accept the image format used by PIL.
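
On the question of how a model can learn anything from a pickle that holds only names: in that arrangement the pickle is just an index, and the data-loading code opens each image from disk when it is requested. The sketch below illustrates the idea under stated assumptions. `FilenameDataset` and its `loader` parameter are hypothetical names, and the default loader returns raw file bytes only to keep the example dependency-free; in a PyTorch project the class would more likely subclass `torch.utils.data.Dataset` and use `PIL.Image.open` plus a tensor transform as the loader.

```python
import os
import pickle


class FilenameDataset:
    """Map-style dataset: stores only file names, loads each image on demand.

    Hypothetical sketch; not the loader of any specific project.
    """

    def __init__(self, pickle_path, image_dir, loader=None):
        with open(pickle_path, "rb") as f:
            self.file_names = pickle.load(f)  # a list such as ['1.jpg', '2.jpg']
        self.image_dir = image_dir
        # Default loader returns raw bytes; pass e.g. PIL.Image.open instead.
        self.loader = loader or self._read_bytes

    @staticmethod
    def _read_bytes(path):
        with open(path, "rb") as f:
            return f.read()

    def __len__(self):
        return len(self.file_names)

    def __getitem__(self, index):
        # The name from the pickle is joined to the image folder at access time.
        path = os.path.join(self.image_dir, self.file_names[index])
        return self.loader(path)
```

With this design the pickle stays small, and the pixel data is read only when the training loop asks for a given item. Whether your project works this way can be confirmed by finding where its code loads filenames.pickle and checking what it does with each entry.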

RufusVS
  • Thank you. I should check the model on both: a pickle file containing the file names of the jpg files and a pickle file containing image data, and see which pickle file helps the model train by giving it image information. Can you also help me with how to save image data inside a pickle file? – Priyanka_U Jan 16 '22 at 18:39
  • It looks like the first part of your original program is successfully pickling and unpickling your image. All you need to do is pickle a list of images instead of a single image. Are you sure the model trainer is expecting pickled data? That is a Python-specific format, as far as I know. – RufusVS Jan 16 '22 at 19:28
  • Yes, I think so, because it is a bit confusing for me to understand how a model would get trained from a pickle file that has the names of jpg files instead of image data. Usually models take images as input and learn different features from the image (so what would a model learn from the name of a jpg file?). – Priyanka_U Jan 16 '22 at 20:10
  • What is the modeling software you are using? Is it proprietary? If not, perhaps other eyes on the modeling software documentation would help find your answer. – RufusVS Jan 17 '22 at 13:38
  • It is a machine learning project using the PyTorch library, and the code is running on a server. – Priyanka_U Jan 17 '22 at 19:24