
I'm trying to load a very large dataset (~28 GB of RAM as an ndarray) into Theano shared variables, using borrow=True to avoid duplicating the memory. To do so, I'm using the following function:

import numpy as np
import theano

def load_dataset(path):
    # Load the dataset from disk
    data_f = np.load(path+'train_f.npy')
    data_t = np.load(path+'train_t.npy')

    # Split into training and validation
    return (
        (
            theano.shared(data_f[:-1000, :], borrow=True),
            theano.shared(data_t[:-1000, :], borrow=True)
        ), (
            theano.shared(data_f[-1000:, :], borrow=True),
            theano.shared(data_t[-1000:, :], borrow=True)
        )
    )

To avoid data conversions, I created the arrays with the correct dtype before saving them to disk (I then filled them and wrote them out with np.save()):

data_f = np.empty((len(rows), 250*250*3), dtype=theano.config.floatX)
data_t = np.empty((len(rows), 1), dtype=theano.config.floatX)

It seems, though, that Theano tries to copy the memory anyway, giving me the following error:

Error allocating 25594500000 bytes of device memory (out of memory). Driver report 3775729664 bytes free and 4294639616 bytes total.

Theano is configured to work on the GPU (GTX 970).
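
For reference, the size in the error message matches the training slice data_f[:-1000, :] at 4 bytes per element, which suggests the whole training set was being copied to the GPU. A quick sketch of the arithmetic, assuming floatX is float32 (the row count is inferred from the error message, not stated anywhere above):

```python
# Figures taken from the error message and the array shapes above.
requested_bytes = 25594500000   # "Error allocating ... bytes of device memory"
free_bytes = 3775729664         # "Driver report ... bytes free"

features_per_row = 250 * 250 * 3
bytes_per_value = 4             # assumption: floatX == 'float32'
bytes_per_row = features_per_row * bytes_per_value

# Number of rows in the training slice implied by the requested size.
train_rows, remainder = divmod(requested_bytes, bytes_per_row)

print(train_rows, remainder)            # whole number of rows if remainder is 0
print(requested_bytes > free_bytes)     # the request exceeds free device memory
```

The request divides evenly into 34126 rows of 187500 float32 values, and it is several times larger than the free memory the driver reports.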


1 Answer


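Instead of theano.shared, you can use theano.tensor._shared, which forces the data to be allocated in CPU (host) memory. As far as I can tell, when the GPU device is enabled, theano.shared moves floatX arrays to the GPU, and borrow=True only avoids copies on the CPU side, which would explain the allocation error. The fixed code looks like this: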

import numpy as np
import theano

def load_dataset(path):
    # Load the dataset from disk
    data_f = np.load(path+'train_f.npy')
    data_t = np.load(path+'train_t.npy')

    # Split into training and validation
    return (
        (
            theano.tensor._shared(data_f[:-1000, :], borrow=True),
            theano.tensor._shared(data_t[:-1000, :], borrow=True)
        ), (
            theano.tensor._shared(data_f[-1000:, :], borrow=True),
            theano.tensor._shared(data_t[-1000:, :], borrow=True)
        )
    )
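
On the host side, the slices passed to the shared constructor are NumPy views, so the training/validation split itself should not copy anything. A small check with a stand-in array (the shape below is made up for illustration and is not the real dataset):

```python
import numpy as np

# Small stand-in for the real dataset; the shape is hypothetical.
data_f = np.zeros((5000, 12), dtype=np.float32)

train = data_f[:-1000, :]   # all rows except the last 1000
valid = data_f[-1000:, :]   # the last 1000 rows

# Basic slicing returns views onto the same buffer, not copies.
print(np.shares_memory(train, data_f))
print(np.shares_memory(valid, data_f))
print(train.shape, valid.shape)
```

Because borrow=True lets a CPU shared variable alias the array it is given, later in-place changes to data_f may show through in the shared variables.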