ResourceExhaustedError: OOM when allocating tensor with shape[256,100000000]

Question

I am using TensorFlow's Variable for my classification problem. The number of output classes are 1e8.

n_inputs = 5000
n_classes = 1e8
features = tf.placeholder(tf.float32, [None, n_inputs])
labels = tf.placeholder(tf.float32, [None, n_classes])

h_layer = 256

weights = {
'hidden_weights' : tf.Variable(tf.random_normal([n_inputs, h_layer])),
'out_weights' : tf.Variable(tf.random_normal([h_layer, int(n_classes)]))
}

bias = {
'hidden_bias' : tf.Variable(tf.random_normal([h_layer])),
'out_bias' : tf.Variable(tf.random_normal([int(n_classes)]))
}

While running this code, I am getting ResourceExhaustedError for allocating 'out_weights' with (256,100000000). Is there anyway I can overcome this issue?

FYI : I am running this code in CPU.

Please find the stack trace below:

---------------------------------------------------------------------------
ResourceExhaustedError                    Traceback (most recent call last)
C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py in _do_call(self, fn, *args)
   1021     try:
-> 1022       return fn(*args)
   1023     except errors.OpError as e:

C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py in _run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
   1003                                  feed_dict, fetch_list, target_list,
-> 1004                                  status, run_metadata)
   1005 

C:\Anaconda\envs\tensorflow\lib\contextlib.py in __exit__(self, type, value, traceback)
     65             try:
---> 66                 next(self.gen)
     67             except StopIteration:

C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\framework\errors_impl.py in raise_exception_on_not_ok_status()
    465           compat.as_text(pywrap_tensorflow.TF_Message(status)),
--> 466           pywrap_tensorflow.TF_GetCode(status))
    467   finally:

ResourceExhaustedError: OOM when allocating tensor with shape[256,100000000]
     [[Node: random_normal_5/RandomStandardNormal = RandomStandardNormal[T=DT_INT32, dtype=DT_FLOAT, seed=0, seed2=0, _device="/job:localhost/replica:0/task:0/cpu:0"](random_normal_5/shape)]]

During handling of the above exception, another exception occurred:

ResourceExhaustedError                    Traceback (most recent call last)
<ipython-input-26-d5491564869f> in <module>()
     39 init = tf.global_variables_initializer()
     40 with tf.Session() as sess:
---> 41     sess.run(init)
     42     total_batches = batches(batchSize, train_features, train_labels)
     43 

C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py in run(self, fetches, feed_dict, options, run_metadata)
    765     try:
    766       result = self._run(None, fetches, feed_dict, options_ptr,
--> 767                          run_metadata_ptr)
    768       if run_metadata:
    769         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
    963     if final_fetches or final_targets:
    964       results = self._do_run(handle, final_targets, final_fetches,
--> 965                              feed_dict_string, options, run_metadata)
    966     else:
    967       results = []

C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1013     if handle is None:
   1014       return self._do_call(_run_fn, self._session, feed_dict, fetch_list,
-> 1015                            target_list, options, run_metadata)
   1016     else:
   1017       return self._do_call(_prun_fn, self._session, handle, feed_dict,

C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py in _do_call(self, fn, *args)
   1033         except KeyError:
   1034           pass
-> 1035       raise type(e)(node_def, op, message)
   1036 
   1037   def _extend_graph(self):

ResourceExhaustedError: OOM when allocating tensor with shape[256,100000000]
     [[Node: random_normal_5/RandomStandardNormal = RandomStandardNormal[T=DT_INT32, dtype=DT_FLOAT, seed=0, seed2=0, _device="/job:localhost/replica:0/task:0/cpu:0"](random_normal_5/shape)]]

Caused by op 'random_normal_5/RandomStandardNormal', defined at:
  File "C:\Anaconda\envs\tensorflow\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Anaconda\envs\tensorflow\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Anaconda\envs\tensorflow\lib\site-packages\ipykernel\__main__.py", line 3, in <module>
    app.launch_new_instance()
  File "C:\Anaconda\envs\tensorflow\lib\site-packages\traitlets\config\application.py", line 658, in launch_instance
    app.start()
  File "C:\Anaconda\envs\tensorflow\lib\site-packages\ipykernel\kernelapp.py", line 474, in start
    ioloop.IOLoop.instance().start()
  File "C:\Anaconda\envs\tensorflow\lib\site-packages\zmq\eventloop\ioloop.py", line 177, in start
    super(ZMQIOLoop, self).start()
  File "C:\Anaconda\envs\tensorflow\lib\site-packages\tornado\ioloop.py", line 887, in start
    handler_func(fd_obj, events)
  File "C:\Anaconda\envs\tensorflow\lib\site-packages\tornado\stack_context.py", line 275, in null_wrapper
    return fn(*args, **kwargs)
  File "C:\Anaconda\envs\tensorflow\lib\site-packages\zmq\eventloop\zmqstream.py", line 440, in _handle_events
    self._handle_recv()
  File "C:\Anaconda\envs\tensorflow\lib\site-packages\zmq\eventloop\zmqstream.py", line 472, in _handle_recv
    self._run_callback(callback, msg)
  File "C:\Anaconda\envs\tensorflow\lib\site-packages\zmq\eventloop\zmqstream.py", line 414, in _run_callback
    callback(*args, **kwargs)
  File "C:\Anaconda\envs\tensorflow\lib\site-packages\tornado\stack_context.py", line 275, in null_wrapper
    return fn(*args, **kwargs)
  File "C:\Anaconda\envs\tensorflow\lib\site-packages\ipykernel\kernelbase.py", line 276, in dispatcher
    return self.dispatch_shell(stream, msg)
  File "C:\Anaconda\envs\tensorflow\lib\site-packages\ipykernel\kernelbase.py", line 228, in dispatch_shell
    handler(stream, idents, msg)
  File "C:\Anaconda\envs\tensorflow\lib\site-packages\ipykernel\kernelbase.py", line 390, in execute_request
    user_expressions, allow_stdin)
  File "C:\Anaconda\envs\tensorflow\lib\site-packages\ipykernel\ipkernel.py", line 196, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "C:\Anaconda\envs\tensorflow\lib\site-packages\ipykernel\zmqshell.py", line 501, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "C:\Anaconda\envs\tensorflow\lib\site-packages\IPython\core\interactiveshell.py", line 2717, in run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "C:\Anaconda\envs\tensorflow\lib\site-packages\IPython\core\interactiveshell.py", line 2821, in run_ast_nodes
    if self.run_code(code, result):
  File "C:\Anaconda\envs\tensorflow\lib\site-packages\IPython\core\interactiveshell.py", line 2881, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-17-f183ffda50a1>", line 10, in <module>
    'out_weights' : tf.Variable(tf.random_normal([h_layer, int(n_classes)]))
  File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\ops\random_ops.py", line 77, in random_normal
    seed2=seed2)
  File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\ops\gen_random_ops.py", line 189, in _random_standard_normal
    name=name)
  File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 763, in apply_op
    op_def=op_def)
  File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 2327, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 1226, in __init__
    self._traceback = _extract_stack()

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[256,100000000]
     [[Node: random_normal_5/RandomStandardNormal = RandomStandardNormal[T=DT_INT32, dtype=DT_FLOAT, seed=0, seed2=0, _device="/job:localhost/replica:0/task:0/cpu:0"](random_normal_5/shape)]]

score 1 · Accepted Answer · answered Jul 02 '17 at 18:52

1

The short answer is no. If you want to have a fully connected layer between 256 and 1e8 neurons you end up with 256 * 1e8 numbers in memory, there is nothing you can do. This seems rather as a wrong model then a wrong code, why would you have 1e8 output classes? Even with very strong correlations between them you would probably need at least 1e10 (ten bilions samples) points to train it in the first place. You should reconsider how to approach the task at hand, I can't really believe you really need 1e8 independent outputs.

answered Jul 02 '17 at 18:52

lejlot

64,777
8
131
164

After changing that number of classes also, I am getting the same error with 1e8. Is there any reason ? n_classes = 1161. After this also, I am getting 1e8 in the error – Kathiravan Natarajan Jul 02 '17 at 19:54
The only reason is a mistake somewhere in your code. The one you provided would run with n_classes=1161 – lejlot Jul 02 '17 at 20:44

ResourceExhaustedError: OOM when allocating tensor with shape[256,100000000]

1 Answers1