
I am facing an issue with my Inception model during performance testing with Apache JMeter.

Error: OOM when allocating tensor with shape[800,1280,3] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: Cast = Cast[DstT=DT_FLOAT, SrcT=DT_UINT8, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

talonmies
Mohd Daoud

2 Answers


OOM stands for Out Of Memory. It means your GPU has run out of memory, presumably because the tensors you've allocated are too large for it. You can fix this by making your model smaller or by reducing your batch size. By the looks of it, you're feeding in large images (800x1280), so you may want to consider downsampling them.
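The arithmetic behind this advice can be sketched in plain Python. The 800x1280x3 shape and the float (4-byte) width come from the error message above; the batch sizes are illustrative:

```python
def input_batch_bytes(batch_size, h=800, w=1280, c=3, dtype_bytes=4):
    """Memory for the input tensor alone, assuming float32 (4-byte) pixels."""
    return batch_size * h * w * c * dtype_bytes

# One 800x1280x3 float image is about 11.7 MB, so batches add up fast:
per_image_mb = input_batch_bytes(1) / 2**20   # ~11.72 MB
batch_32_mb = input_batch_bytes(32) / 2**20   # 375.0 MB before any activations
```

Note this counts only the input tensor; the intermediate activations inside Inception typically dwarf it, so halving either the batch size or the image area gives a roughly proportional saving.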

Cory Nezin
    I'm getting this error on the 4th epoch of training. Epochs 1-3 succeeded. Do you know why this would occur? The memory demands on the GPU during epoch 4 are the same as the memory demands during the first 3 epochs. Am I supposed to be releasing GPU memory after each batch is processed or something like that? – littleO Oct 27 '19 at 07:31
  • You might be creating more tensors in the training loop which would cause more memory to be consumed, you shouldn't be doing that. It's hard to say without the code though. – Cory Nezin Oct 27 '19 at 18:54
  • What is the best way to ensure that unused tensors are deleted, or that more tensors aren't being created in the training loop, as you suggested @CoryNezin? – KoKo Apr 17 '20 at 09:58
  • I am stuck on the same error; what is the answer to the question above? "What is the best way to ensure that unused tensors are deleted, or that more tensors aren't being created in the training loop, as you suggested?" @CoryNezin – Aqib Mumtaz Jan 14 '21 at 12:35
  • I haven't used TensorFlow in a while but I'm sure it's been updated with many capabilities since this question. You might want to ask a new question for that. – Cory Nezin Jan 14 '21 at 16:27
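The point raised in the comments about tensors accumulating in the training loop can be illustrated framework-free: log plain Python scalars and drop the tensor reference each step, so nothing keeps that step's memory alive. `FakeTensor` below is a hypothetical stand-in for a real framework tensor:

```python
import gc
import weakref

class FakeTensor:
    """Hypothetical stand-in for a framework tensor that pins GPU memory."""
    pass

history = []   # safe: holds only plain floats
tracked = []   # weak refs let us observe whether each tensor was freed
for step in range(3):
    loss = FakeTensor()
    tracked.append(weakref.ref(loss))
    history.append(float(step))  # copy the scalar out, don't keep the tensor
    del loss                     # drop the last strong reference each step

gc.collect()
leaked = [r for r in tracked if r() is not None]
# leaked is empty: no tensor outlives its iteration
```

The equivalent bug is appending `loss` itself to `history`: the list then holds every step's tensor (and, in frameworks with autograd, its whole graph), so memory grows until an OOM at some later epoch even though each individual batch fits.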

If you have multiple GPUs at hand, select one that is not as busy as this one (other processes may be running on it). Go to a terminal and type

export CUDA_VISIBLE_DEVICES=1 

where 1 is the index of the other available GPU, then re-run the same code.

You can check the available GPUs using

nvidia-smi 

This will show you which GPUs are available and how much memory is free on each of them.
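If you'd rather not export the variable in the shell, the same masking can be done from inside the script, as long as it happens before TensorFlow (or any other CUDA library) initializes the GPU:

```python
import os

# Must run before importing tensorflow: "1" hides all GPUs except the
# second physical one, which the process then sees as device 0.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

# import tensorflow as tf  # GPU:0 now maps to physical GPU 1
```

Setting it after the framework has already touched the GPU has no effect, which is a common pitfall with this approach.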

Priyank Pathak