
I am posting a question about an autoencoder (AutoEncoder).

I wrote the program below, but when I feed it images that are 160 pixels wide by 120 pixels high, a "ResourceExhaustedError" occurs and training cannot proceed. Specifically, the error occurs at line 130. On the other hand, if I set the resolution to half, 80 by 60 pixels, the epochs advance and training progresses. (The program shrinks the images by dividing their size by 2.)

I don't think the image size (160 x 120 pixels) or the number of images (about 700) is particularly large, so could you explain why the error occurs and how to solve it? Considering the possibility that insufficient main memory was the cause, I increased the memory to 128 GB, but the same error occurs.

Please help me. Thank you.

The environment is described below.

CPU: Xeon E5-1620v4 4core/8t

Motherboard: ASUS X99-E WS

Memory: DDR4-2400 64 GB (8G × 8)

GPU: NVIDIA Quadro GP100 × 2 16GB

OS: ubuntu 16.04 LTS

Here is the source code:

import numpy as np
import tensorflow as tf
from tensorflow.python.framework import ops
import cv2
import os

DATASET_PATH = "/home/densos/workspaces/autoencoder"
DIR_PATH = "input_gray_160*120"
IMAGE_PATH = os.path.join(DATASET_PATH, DIR_PATH)
X_PIXEL, Y_PIXEL = 160, 120
M = 1
N_HIDDENS = np.array(np.array([1.5]) * X_PIXEL * Y_PIXEL // (M*M), dtype = np.int)
TRANCE_FRAME_NUM = 700

ops.reset_default_graph()

def xavier_init(fan_in, fan_out, constant = 1):
    low = -constant * np.sqrt(6.0 / (fan_in + fan_out))
    high = constant * np.sqrt(6.0 / (fan_in + fan_out))
    return tf.random_uniform((fan_in, fan_out), minval = low, maxval = high, dtype = tf.float32)

class AdditiveGaussianNoiseAutoencoder(object):
    def __init__(self, n_input, n_hidden, transfer_function = tf.nn.sigmoid, optimizer = tf.train.AdamOptimizer(), scale = 0.1):
        self.n_input = n_input
        self.n_hidden = n_hidden
        self.transfer = transfer_function
        self.scale = tf.placeholder(tf.float32)
        self.training_scale = scale
        network_weights = self._initialize_weights()
        self.weights = network_weights
        self.sparsity_level = np.repeat([0.05], self.n_hidden).astype(np.float32)
        self.sparse_reg = 0.1

        # model
        self.x = tf.placeholder(tf.float32, [None, self.n_input])
        self.hidden = self.transfer(tf.add(tf.matmul(self.x + scale * tf.random_normal((n_input,)),
                self.weights['w1']),
                self.weights['b1']))
        self.reconstruction = tf.add(tf.matmul(self.hidden, self.weights['w2']), self.weights['b2'])

        # cost
        self.cost = 0.5 * tf.reduce_sum(tf.pow(tf.subtract(self.reconstruction, self.x), 2.0)) + self.sparse_reg \
                        * self.kl_divergence(self.sparsity_level, self.hidden)

        self.optimizer = optimizer.minimize(self.cost)

        init = tf.global_variables_initializer()
        self.sess = tf.Session()
        self.sess.run(init)

    def _initialize_weights(self):
        all_weights = dict()
        all_weights['w1'] = tf.Variable(xavier_init(self.n_input, self.n_hidden))
        all_weights['b1'] = tf.Variable(tf.zeros([self.n_hidden], dtype = tf.float32))
        all_weights['w2'] = tf.Variable(tf.zeros([self.n_hidden, self.n_input], dtype = tf.float32))
        all_weights['b2'] = tf.Variable(tf.zeros([self.n_input], dtype = tf.float32))
        return all_weights

    def partial_fit(self, X):
        cost, opt = self.sess.run((self.cost, self.optimizer), feed_dict = {self.x: X,
                                                                            self.scale: self.training_scale
                                                                            })
        return cost

    def kl_divergence(self, p, p_hat):
        return tf.reduce_mean(p * tf.log(p) - p * tf.log(p_hat) + (1 - p) * tf.log(1 - p) - (1 - p) * tf.log(1 - p_hat))

    def calc_total_cost(self, X):
        return self.sess.run(self.cost, feed_dict = {self.x: X,
                                                     self.scale: self.training_scale
                                                     })

    def transform(self, X):
        return self.sess.run(self.hidden, feed_dict = {self.x: X,
                                                       self.scale: self.training_scale
                                                       })

    def generate(self, hidden = None):
        if hidden is None:
            hidden = np.random.normal(size = self.weights["b1"])
        return self.sess.run(self.reconstruction, feed_dict = {self.hidden: hidden})

    def reconstruct(self, X):
        return self.sess.run(self.reconstruction, feed_dict = {self.x: X,
                                                               self.scale: self.training_scale
                                                               })

    def getWeights(self):
        return self.sess.run(self.weights['w1'])

    def getBiases(self):
        return self.sess.run(self.weights['b1'])

def get_random_block_from_data(data, batch_size):
    start_index = np.random.randint(0, len(data) - batch_size)
    return data[start_index:(start_index + batch_size)]


if __name__ == '__main__':
#get input data lists
    lists = []
    for file in os.listdir(IMAGE_PATH):
        if file.endswith(".jpeg"):
            lists.append(file)
        lists.sort()

#read input data    
    input_images = []
    for image in lists:
        tmp = cv2.imread(os.path.join(IMAGE_PATH, image), cv2.IMREAD_GRAYSCALE)
        tmp = cv2.resize(tmp, (X_PIXEL // M, Y_PIXEL // M))
        tmp = tmp.reshape(tmp.shape[0] * tmp.shape[1])
        input_images.append(tmp)

#preprocess images    
    input_images = np.array(input_images) / 255.

#convert data to float16
    input_images = np.array(input_images, dtype = np.float16)

#set train and test data
    X_train = input_images[:500]
    X_test = input_images[500:]

    n_samples = X_train.shape[0]
    training_epochs = 200
    batch_size = X_train.shape[0] // 4
    display_step = 10

    autoencoder = AdditiveGaussianNoiseAutoencoder(n_input = X_train.shape[1],
                                                   n_hidden = N_HIDDENS[0],
                                                   transfer_function = tf.nn.relu6,
                                                   optimizer = tf.train.AdamOptimizer(learning_rate = 0.001),
                                                   scale = 0.01)

    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(n_samples / batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_xs = get_random_block_from_data(X_train, batch_size)

            # Fit training using batch data
            cost = autoencoder.partial_fit(X_train)
            # Compute average loss
            avg_cost += cost / n_samples * batch_size

        # Display logs per epoch step
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch + 1), "cost=", avg_cost)

    print("Finish Train")

predicted_imgs = autoencoder.reconstruct(X_test)
predicted_imgs = np.array((predicted_imgs) * 255, dtype = np.uint8)
input_imgs = np.array((X_test) * 255, dtype = np.uint8)

# plot the reconstructed images
for i in range(100):
    im1 = predicted_imgs[i].reshape((Y_PIXEL//M, X_PIXEL//M))
    im2 = input_imgs[i].reshape((Y_PIXEL//M, X_PIXEL//M))

    img_v_union = cv2.vconcat([im1, im2])
    cv2.moveWindow('result.jpg', 100, 200)
    cv2.imshow('result.jpg', img_v_union)

    cv2.waitKey(33)
oguririn
  • An OOM error occurs when the GPU can't allocate enough memory for the computation matrices; I would suggest reducing the batch size value and giving it a shot. – Surya Tej Jun 05 '18 at 08:10

1 Answer


Your ResourceExhaustedError is not caused by exceeding your main memory. It is caused by attempting to allocate more than the 16 GB of memory available on a single GPU. Note that N_HIDDENS is 28800 and n_input is X_PIXEL * Y_PIXEL, which is 19200. In __init__, these huge numbers are passed to _initialize_weights() as n_hidden and n_input, respectively. These values are then used to initialize the weight variables in the line all_weights['w1'] = tf.Variable(xavier_init(self.n_input, self.n_hidden)). This creates a massive fully-connected layer, which will almost certainly exceed your GPU memory. Run the code below to estimate the size of that matrix. It may crash with a MemoryError if your system doesn't have sufficient main memory to store the resulting array.

import numpy as np

# Here's a stand-in vector - I'm only using it to compute batch_size.
input_images = np.random.rand(1000)
X_train = input_images[:500]
X_test = input_images[500:]
n_samples = X_train.shape[0]
training_epochs = 200
batch_size = X_train.shape[0] // 4
print(batch_size)

# Now, let's compute the number of hidden units
X_PIXEL, Y_PIXEL = 160, 120
M = 1
N_HIDDENS = np.array(np.array([1.5]) * X_PIXEL * Y_PIXEL // (M*M), dtype = np.int)

print(N_HIDDENS[0])

# Now we compute the number of input units.
input_vector_size = X_PIXEL * Y_PIXEL
print(input_vector_size)

# Finally, we make an approximate replica of your first weight matrix.
# Note: This is huge, and is why you're getting an out-of-memory error.
your_batch = np.zeros((N_HIDDENS[0], input_vector_size, batch_size), dtype=float)

# If this didn't exceed your main memory allocation, this will print its size.
print(your_batch.nbytes/1000000000)

You can see that reducing the image width or height quadratically reduces the memory footprint of the fully-connected weight matrix, because both the number of input units and the number of hidden units shrink with the pixel count. That's why reducing the image height and width worked. Note that reducing your batch size probably won't help here: it doesn't change the size of the fully-connected layer. You should instead consider a convolutional, rather than fully-connected, approach, as sketched below.
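As a rough illustration of that suggestion, here is a minimal sketch of a convolutional autoencoder in the same TensorFlow 1.x style as your code. The filter counts, kernel sizes, and strides are arbitrary illustrative choices, not a tuned architecture, and the images would have to be fed as (batch, height, width, 1) arrays rather than flattened vectors:

import tensorflow as tf

Y_PIXEL, X_PIXEL = 120, 160  # image height and width from the question

# Images are fed as 4-D tensors (batch, height, width, channels) instead of flat vectors.
x = tf.placeholder(tf.float32, [None, Y_PIXEL, X_PIXEL, 1])

# Encoder: two strided convolutions replace the single huge dense layer.
h = tf.layers.conv2d(x, filters=16, kernel_size=3, strides=2,
                     padding='same', activation=tf.nn.relu)   # -> 60 x 80 x 16
h = tf.layers.conv2d(h, filters=32, kernel_size=3, strides=2,
                     padding='same', activation=tf.nn.relu)   # -> 30 x 40 x 32

# Decoder: mirror the encoder with transposed convolutions.
d = tf.layers.conv2d_transpose(h, filters=16, kernel_size=3, strides=2,
                               padding='same', activation=tf.nn.relu)   # -> 60 x 80 x 16
reconstruction = tf.layers.conv2d_transpose(d, filters=1, kernel_size=3, strides=2,
                                            padding='same', activation=tf.nn.sigmoid)   # -> 120 x 160 x 1

cost = tf.losses.mean_squared_error(labels=x, predictions=reconstruction)
train_op = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)

Each convolution's weights are only kernel_size x kernel_size x channels in size, so the parameter count no longer grows with the square of the pixel count.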

Hope you find this explanation helpful.

Justin Fletcher
  • Thank you for your comment. If the program does not use the GPU, can it train with the CPU only? How do I write that? The processing time will be longer, but can it train using 64 GB of main memory? Sorry for all the questions. – oguririn Jun 05 '18 at 10:42
  • It's ok to have questions! Yes. You can train using the CPU. It will be impractically slow, but if you're doing this as a proof-of-concept, that might be ok. To make that happen, you need to disable GPU allocation for the tensors you're creating. I suggest asking that as another, separate question here on StackOverflow. If this answers your question, accept the answer and ask a new one using the blue button at the top-left. – Justin Fletcher Jun 05 '18 at 16:25
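For reference, the two standard ways to force CPU-only execution in TensorFlow 1.x are to hide the GPUs before the graph is built, or to disable GPU devices in the session config. This is a generic sketch of those options, not something specific to the code above:

import os
# Option 1: hide the GPUs from TensorFlow before it is imported,
# so every op is placed on the CPU.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

import tensorflow as tf

# Option 2: keep the GPUs visible but tell the session not to use them.
config = tf.ConfigProto(device_count={'GPU': 0})
sess = tf.Session(config=config)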