
I am trying to read multiple images using Python 3 on Google Colab and on my local machine. Unfortunately, memory usage exceeds the 12 GB of RAM and the session crashes after about 25 images. Here is a snippet of my code after several attempts.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.image as img
from scipy.cluster.vq import kmeans, vq
from google.colab import drive
from PIL import Image

import cv2
import glob
r = []
g = []
b = []
df=[]
files = glob.glob("drive/MyDrive/rgb1/*.pg.jpg")
for file in files:
  im = Image.open(file)
  pix=list(im.getdata())
  for pixel in pix:
      r.append(pixel[0])
      g.append(pixel[1])
      b.append(pixel[2])
  d = pd.DataFrame({'red':r, 'green':g, 'blue':b})
  df.append(d)
  im.close()
• Can you provide some details about the images, like size and color depth? – Thomas Sablik Mar 01 '21 at 18:07
• Using standard Python lists of standard Python integers to store the R, G and B values is really inefficient, memory-wise. NumPy arrays can store that data much more efficiently, for example as uint8 values, in which case there would be almost no memory overhead at all. Each value is only going to be 8 bits, so there is no need to store each one as an entire Python integer. – Random Davis Mar 01 '21 at 18:11
• What are you planning to do with the data next? – Mark Setchell Mar 01 '21 at 18:33
• Try resetting the lists (emptying r, g and b) after each loop iteration. You can reinitialize them with r = [] or clear them with r.clear(), see https://stackoverflow.com/questions/850795/different-ways-of-clearing-lists – pippo1980 Mar 01 '21 at 19:01
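For reference, here is a minimal sketch along the lines of the suggestions in the comments above (not tested on the images in question): each image's pixels are kept as a uint8 NumPy array and one DataFrame is built per image, instead of appending every channel value to ever-growing Python lists. The glob pattern is copied verbatim from the question.

import glob

import numpy as np
import pandas as pd
from PIL import Image

files = glob.glob("drive/MyDrive/rgb1/*.pg.jpg")

df = []  # one DataFrame per image, as in the original loop
for file in files:
    with Image.open(file) as im:
        # np.asarray keeps the data as uint8 (one byte per channel value)
        # instead of one full Python int object per value.
        arr = np.asarray(im.convert("RGB"), dtype=np.uint8)
    pixels = arr.reshape(-1, 3)  # flatten to (num_pixels, 3)
    d = pd.DataFrame(pixels, columns=["red", "green", "blue"])
    df.append(d)

Because nothing is accumulated in the r, g, b lists across iterations, memory use per image stays roughly proportional to that image's size rather than to the total of all images processed so far.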
