I have a training set of ~2k greyscale images, each 300x400 px; the whole collection is ~20 MB on disk. I'm trying to classify these images with a PyBrain neural net. The problem is that when I load the dataset into a SupervisedDataSet,
my small Python script consumes about 8 GB of memory, which is far too much (the raw pixel data is only 2000 × 120000 bytes ≈ 240 MB, so the overhead is enormous).
So my questions are: how can I train on this dataset with a laptop that has 10 GB of RAM? Is there a way to load parts of the dataset "on demand" while training? Is there a way to split the dataset into smaller parts and feed them to the net one by one (I sketch below what I mean by this)? I couldn't find answers in the PyBrain documentation.
Here is how I build the dataset:
import os
from PIL import Image
from pybrain.datasets import SupervisedDataSet

# returns a list of (image bytes, category) pairs,
# where category = 1 for apple and 0 for banana
def load_images(dir):
    data = []
    for d, n, files in os.walk(dir):
        for f in files:
            category = int(f.startswith('apple_'))
            im = Image.open('{}/{}'.format(d, f))
            data.append((bytearray(im.tobytes()), category))
    return data

def load_data_set(dir):
    print 'loading images'
    data = load_images(dir)
    print 'creating dataset'
    ds = SupervisedDataSet(120000, 1)  # 300x400 = 120000 bytes per image
    for d in data:
        ds.addSample(d[0], (d[1],))
    return ds
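
To make the last question concrete, here is an untested sketch of what I mean by feeding the net smaller parts: walk the image directory lazily, build a small SupervisedDataSet per chunk, and train on the chunks one by one. The helper names (iter_labelled_paths, build_chunk), the 'train' directory, the hidden-layer size, learning rate and chunk size are all placeholders rather than my real setup, and I don't know whether training a PyBrain net on a sequence of separate datasets like this is even a sound approach.

import os
from PIL import Image
from pybrain.datasets import SupervisedDataSet
from pybrain.supervised.trainers import BackpropTrainer
from pybrain.tools.shortcuts import buildNetwork

def iter_labelled_paths(dir):
    # yields (path, category) pairs without loading any pixel data yet
    for d, n, files in os.walk(dir):
        for f in files:
            yield os.path.join(d, f), int(f.startswith('apple_'))

def build_chunk(pairs):
    # builds a small dataset holding only one chunk of images in memory
    ds = SupervisedDataSet(120000, 1)
    for path, category in pairs:
        im = Image.open(path)
        ds.addSample(bytearray(im.tobytes()), (category,))
    return ds

net = buildNetwork(120000, 100, 1)           # placeholder architecture
trainer = BackpropTrainer(net, learningrate=0.01)

chunk_size = 200                             # placeholder chunk size
chunk = []
for pair in iter_labelled_paths('train'):
    chunk.append(pair)
    if len(chunk) == chunk_size:
        trainer.trainOnDataset(build_chunk(chunk))
        chunk = []
if chunk:
    trainer.trainOnDataset(build_chunk(chunk))

If this kind of incremental training is reasonable, a pointer to the idiomatic PyBrain way of doing it would be very welcome.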
Thank you for any kind of help.