TL;DR: How do I give more data to a Theano function without taking more memory?

The problem I'm having is that training my ML algorithm on the GPU with Theano causes the GPU to eventually run out of memory. I took a slight departure from the tutorial because my dataset is too big to read entirely into memory (this must be an issue for video algorithms too, right?), so rather than using an index input and update scheme, I just pass the Theano function the ndarrays directly.

Let me give an example of what I mean. The Theano Logistic Regression tutorial says to do something along the lines of:

train_model = theano.function(
    inputs=[index],
    outputs=cost,
    updates=updates,
    givens={
        x: train_set_x[index * batch_size: (index + 1) * batch_size],
        y: train_set_y[index * batch_size: (index + 1) * batch_size]
    }
)

This requires train_set_x and train_set_y to be loaded into memory, and the tutorial uses a SharedVariable to store the complete dataset.

For me, the dataset is huge (many gigabytes), which means it cannot all be loaded into memory at once, so I modified my function to take the data directly:

train_model = theano.function(
    inputs=[input, classes], 
    outputs=cost, 
    updates=updates
)

and then I do something that looks vaguely like this:

for count, data in enumerate(extractor):
    observations, labels = data
    batch_cost = train_model(observations, labels)
    logger.debug("Generation %d: %f cost", count, batch_cost)

I think I may be fundamentally misunderstanding how to properly hand data to the GPU without some nasty Python garbage-collection dirtiness. It seems like the model is occupying more and more memory internally, because after training for a (large) number of batches I get an error like this:

Error when tring to find the memory information on the GPU: initialization error
Error freeing device pointer 0x500c88000 (initialization error). Driver report 0 bytes free and 0 bytes total 
CudaNdarray_uninit: error freeing self->devdata. (self=0x10cbbd170, self->devata=0x500c88000)
Exception MemoryError: 'error freeing device pointer 0x500c88000 (initialization error)' in 'garbage collection' ignored
Fatal Python error: unexpected exception during garbage collection

How do I give more data to a Theano function, without taking up more memory?

Stefan Sullivan

1 Answer

If the dataset does not fit in memory, the idea is to take a portion of it and load it each time you need it.

If your data does not fit in GPU memory, as seen in the classic Lasagne tutorial, you can iterate over portions of the dataset, called minibatches.
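
For example, a minimal slicing helper in the spirit of the Lasagne tutorial's iterate_minibatches; the names X_train, y_train and the batch size of 500 are placeholders, not from the question:

def iterate_minibatches(inputs, targets, batch_size):
    # Walk the arrays in fixed-size slices so only one minibatch
    # at a time is handed to the training function.
    for start in range(0, len(inputs) - batch_size + 1, batch_size):
        stop = start + batch_size
        yield inputs[start:stop], targets[start:stop]

# usage:
# for x_batch, y_batch in iterate_minibatches(X_train, y_train, 500):
#     batch_cost = train_model(x_batch, y_batch)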

Then, if your data does not fit in your RAM, you need to load each minibatch when you need it. The best way to do that is to have a separate process load the next minibatch (CPU working) while you are analysing the current one (GPU working).
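
A sketch of that overlap (not the AlexNet code itself): a background thread fills a small bounded queue from the extractor in the question while the main loop keeps the GPU busy with train_model. The queue size and the None sentinel are my own choices, and a plain thread only helps if the extractor is I/O-bound; a multiprocessing-based loader follows the same pattern.

import threading
try:
    import queue             # Python 3
except ImportError:
    import Queue as queue    # Python 2

def producer(extractor, batch_queue):
    # CPU side: extract minibatches and park them in the queue.
    for data in extractor:
        batch_queue.put(data)       # blocks while the queue is full
    batch_queue.put(None)           # sentinel: no more minibatches

batch_queue = queue.Queue(maxsize=2)  # keep at most 2 minibatches in RAM
loader = threading.Thread(target=producer, args=(extractor, batch_queue))
loader.daemon = True
loader.start()

count = 0
while True:
    data = batch_queue.get()
    if data is None:
        break
    observations, labels = data
    batch_cost = train_model(observations, labels)   # GPU side
    count += 1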

You can take inspiration from the AlexNet implementation.

mxdbld
  • Right, the question is not "how do minibatches work conceptually" but rather "how do minibatches work in code with Theano". You can't see it explicitly in code but `for count, data in enumerate(extractor):` is doing exactly that. The problem is that the way I've configured things, it seems each invocation of `train_model` allocates more GPU memory. How do I get it to recycle memory? – Stefan Sullivan Nov 18 '16 at 02:16
  • Checking things out: do you have an allow_gc=False in your .theanorc? – mxdbld Nov 18 '16 at 11:42
  • Do you create a shared variable in your extractor? If yes, don't, as you only pass one minibatch each time, or call a_shared_variable.set_value() if you really want to do that. That's as far as I can go here. – mxdbld Nov 18 '16 at 12:14
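
If the set_value() route mentioned in the last comment is what you want, a sketch could look like the following. This is only an illustration under assumptions about the question's model: x, y, cost and updates are taken to be the symbolic variables the original train_model was built from, batch_size and n_features are placeholders, and the label dtype is assumed to be int32. The point is that the two shared buffers are allocated once and then overwritten in place, so each call to train_model reuses the same GPU storage instead of allocating a new transfer per batch.

import numpy as np
import theano

# Allocate the per-minibatch buffers once (shapes are placeholders).
x_shared = theano.shared(np.zeros((batch_size, n_features),
                                  dtype=theano.config.floatX),
                         borrow=True)
y_shared = theano.shared(np.zeros((batch_size,), dtype='int32'),
                         borrow=True)

train_model = theano.function(
    inputs=[],
    outputs=cost,
    updates=updates,
    givens={x: x_shared, y: y_shared}
)

for count, (observations, labels) in enumerate(extractor):
    # Overwrite the existing buffers instead of creating new shared variables.
    x_shared.set_value(np.asarray(observations, dtype=theano.config.floatX),
                       borrow=True)
    y_shared.set_value(np.asarray(labels, dtype='int32'), borrow=True)
    batch_cost = train_model()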