
I'm trying to convert a preprocessed dataset stored in the form of pickle objects back into images. There are a total of 820 images, each at either 227x227 or 299x299 resolution. I've attached the code below.

The problem is that tqdm initially reports several hundred files being converted per second, but the rate drops off almost exponentially: by the 500th file it is down to about one file per second. I'm not sure what's causing this, and I've come across suggestions to use concurrency to solve it. I also tried saving the plot with Matplotlib's savefig, but ran into the same slowdown.
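For reference, one suggestion I came across for the Matplotlib route is that pyplot keeps every figure alive until it is explicitly closed, so saving figures in a loop grows memory with each iteration. A minimal sketch of that pattern (random arrays stand in for my unpickled images):

import matplotlib.pyplot as plt
import numpy as np

for i in range(10):
    data = np.random.rand(299, 299)  # stand-in for one unpickled image
    fig, ax = plt.subplots()
    ax.imshow(data, cmap='gray')
    fig.savefig('image_{}.png'.format(i))
    plt.close(fig)  # without this, every figure stays in memory and the loop slows down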

I'd like to know which part of the code is causing the slowdown and how to fix it, since I still have several hundred pickle files to convert back into images.

EDIT: The problem was the program running out of memory, which caused the slowdown.

import _pickle
import os

import matplotlib.pyplot as plt  # only used by the commented-out savefig attempt
from PIL import Image            # needed for Image.fromarray below
from tqdm import tqdm

for filename in tqdm(os.listdir(os.getcwd())):
    if "pickle" in filename:
        try:
            with open(filename, 'rb') as inputfile:
                im = _pickle.load(inputfile, encoding='latin1')
                size = im.shape[0]
                ims = Image.fromarray(im.reshape([size, size]))
                #plt.imshow(im.reshape([size, size]), cmap='gray')
                #plt.savefig(filename + '.png')
                #ims = ims.resize((size, size), Image.ANTIALIAS)  # LANCZOS as of Pillow 2.7
                ims.save(filename + '.jpeg', quality=95)
        except _pickle.UnpicklingError:
            pass
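
Given the EDIT above (the process was running out of memory), the workaround from the comments of processing the pickles in smaller batches can be sketched roughly like this. This is an illustration, not the exact batching I used: the batch size of 100 is an arbitrary choice, and the explicit del/gc.collect() calls are just defensive measures to release memory between batches.

import _pickle
import gc
import os

from PIL import Image
from tqdm import tqdm

BATCH_SIZE = 100  # arbitrary; tune to available memory

pickle_files = [f for f in os.listdir(os.getcwd()) if "pickle" in f]

for start in range(0, len(pickle_files), BATCH_SIZE):
    for filename in tqdm(pickle_files[start:start + BATCH_SIZE]):
        try:
            with open(filename, 'rb') as inputfile:
                im = _pickle.load(inputfile, encoding='latin1')
            size = im.shape[0]
            ims = Image.fromarray(im.reshape([size, size]))
            ims.save(filename + '.jpeg', quality=95)
            del im, ims  # drop references so the arrays can be freed
        except _pickle.UnpicklingError:
            pass
    gc.collect()  # force a collection pass between batches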
Dnana_Dev
  • I would like to help, but unfortunately you have removed the `import` statements so I can't tell what module `_pickle` comes from, nor have you given any indication of the dimensions (in pixels) of your images, nor have you given any indication of how anyone might create a sample pickled file that matches your pickled files to test with... – Mark Setchell Jan 09 '20 at 10:03
  • I've updated the question to include the dataset details and import statements. I managed to convert the files by breaking down the dataset into smaller batches. It's not an elegant solution and I would still like to know how to improve the process. – Dnana_Dev Jan 10 '20 at 10:33
  • You don't appear to show how I might create a pickled file that matches yours... – Mark Setchell Jan 10 '20 at 10:35
  • The pickled files come from a preprocessed version of the InBreast dataset containing images of mammograms; I'm unaware of the process used to create them. Link to the source of the dataset: https://github.com/wentaozhu/deep-mil-for-whole-mammogram-classification – Dnana_Dev Jan 10 '20 at 10:38
  • I created 10,000 pickled images of 299x299 on my Mac and ran your code. It worked just fine and produced 10,000 JPEGs in 33 seconds without batting an eyelid. The only thing I can suggest is to watch the memory your Python process uses and see if it rises over time - that is a common cause for programs slowing down. – Mark Setchell Jan 10 '20 at 10:54
  • Okay, sure thing. I'll run the code on another machine to figure out the bottleneck. Really appreciate the effort. – Dnana_Dev Jan 10 '20 at 11:33

0 Answers