
I'm currently working with GPT-2 running on TensorFlow for text generation; I'm working with this repo specifically. I recently decided to install CUDA and cuDNN to enable GPU acceleration, and installed them via these instructions. I'm on Windows 10 x64 with an NVIDIA GeForce GTX 1650 as my GPU, and I'm using the Command Prompt. I followed the instructions as best I could: downloaded the right GPU driver, set the environment variables, copied the cuDNN files where they should go, and so on. When I finished installing, I tried to generate an unconditional sample with the model I trained, and the run below is what happened.
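As far as I can tell the GPU itself is being picked up (the "Created device" line in the log below says so too); a quick check like this one, which is just my own sanity test and not part of the repo, reports the card:

import tensorflow as tf

# Should list one physical GPU if the CUDA/cuDNN install is visible to TF.
print(tf.config.list_physical_devices('GPU'))

Here's the full output of the failed run: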

Microsoft Windows [Version 10.0.19043.1288]
(c) Microsoft Corporation. All rights reserved.

C:\Users\"username">cd C:\Users\"username"\Desktop\gpt-2-finetuning\src

C:\Users\"username"\Desktop\gpt-2-finetuning\src> python generate_unconditional_samples.py --model_name novel
2021-10-17 00:18:21.694165: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-10-17 00:18:22.435510: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 2153 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1650, pci bus id: 0000:01:00.0, compute capability: 7.5
WARNING:tensorflow:From C:\Users\"username"\Desktop\gpt-2-finetuning\src\sample.py:60: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
WARNING:tensorflow:From C:\Users\"username"\AppData\Local\Programs\Python\Python39\lib\site-packages\tensorflow\python\util\dispatch.py:206: multinomial (from tensorflow.python.ops.random_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.random.categorical` instead.
2021-10-17 00:18:45.451534: W tensorflow/core/common_runtime/bfc_allocator.cc:457] Allocator (GPU_0_bfc) ran out of memory trying to allocate 196.32MiB (rounded to 205852672)requested by op sample_sequence/while/body/_1/model/MatMul/ReadVariableOp
If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation.
Current allocation summary follows.
Current allocation summary follows.
2021-10-17 00:18:45.467103: I tensorflow/core/common_runtime/bfc_allocator.cc:1004] BFCAllocator dump for GPU_0_bfc
2021-10-17 00:18:45.474451: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (256):  Total Chunks: 15, Chunks in use: 15. 3.8KiB allocated for chunks. 3.8KiB in use in bin. 60B client-requested in use in bin.
2021-10-17 00:18:45.481771: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (512):  Total Chunks: 1, Chunks in use: 0. 512B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-17 00:18:45.489403: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (1024):         Total Chunks: 1, Chunks in use: 1. 1.2KiB allocated for chunks. 1.2KiB in use in bin. 1.0KiB client-requested in use in bin.
2021-10-17 00:18:45.498581: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (2048):         Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-17 00:18:45.509522: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (4096):         Total Chunks: 148, Chunks in use: 148. 592.0KiB allocated for chunks. 592.0KiB in use in bin. 592.0KiB client-requested in use in bin.
2021-10-17 00:18:45.517609: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (8192):         Total Chunks: 25, Chunks in use: 25. 300.0KiB allocated for chunks. 300.0KiB in use in bin. 300.0KiB client-requested in use in bin.
2021-10-17 00:18:45.526116: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (16384):        Total Chunks: 24, Chunks in use: 24. 384.0KiB allocated for chunks. 384.0KiB in use in bin. 384.0KiB client-requested in use in bin.
2021-10-17 00:18:45.536214: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (32768):        Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-17 00:18:45.548694: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (65536):        Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-17 00:18:45.563635: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (131072):       Total Chunks: 4, Chunks in use: 4. 786.0KiB allocated for chunks. 786.0KiB in use in bin. 785.3KiB client-requested in use in bin.
2021-10-17 00:18:45.578935: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (262144):       Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-17 00:18:45.594547: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (524288):       Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-17 00:18:45.601621: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (1048576):      Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-17 00:18:45.608788: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (2097152):      Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-17 00:18:45.619285: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (4194304):      Total Chunks: 25, Chunks in use: 25. 100.00MiB allocated for chunks. 100.00MiB in use in bin. 100.00MiB client-requested in use in bin.
2021-10-17 00:18:45.628480: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (8388608):      Total Chunks: 24, Chunks in use: 24. 288.00MiB allocated for chunks. 288.00MiB in use in bin. 288.00MiB client-requested in use in bin.
2021-10-17 00:18:45.637872: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (16777216):     Total Chunks: 48, Chunks in use: 48. 768.00MiB allocated for chunks. 768.00MiB in use in bin. 768.00MiB client-requested in use in bin.
2021-10-17 00:18:45.651217: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (33554432):     Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-17 00:18:45.663622: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (67108864):     Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-17 00:18:45.677210: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (134217728):    Total Chunks: 5, Chunks in use: 5. 995.43MiB allocated for chunks. 995.43MiB in use in bin. 981.58MiB client-requested in use in bin.
2021-10-17 00:18:45.686363: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (268435456):    Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-17 00:18:45.701152: I tensorflow/core/common_runtime/bfc_allocator.cc:1027] Bin for 196.32MiB was 128.00MiB, Chunk State:
2021-10-17 00:18:45.710829: I tensorflow/core/common_runtime/bfc_allocator.cc:1040] Next region of size 2258055936
2021-10-17 00:18:45.715322: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at b0a600000 of size 1280 next 1
2021-10-17 00:18:45.727700: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at b0a600500 of size 12582912 next 2
2021-10-17 00:18:45.735730: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at b0b200500 of size 12288 next 3
2021-10-17 00:18:45.745330: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at b0b203500 of size 16384 next 4
2021-10-17 00:18:45.757304: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at b0b207500 of size 4096 next 5
2021-10-17 00:18:45.777662: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at b0b208500 of size 16777216 next 6

...goes on for a while like this

2021-10-17 00:18:49.046582: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at b6b4a3e00 of size 12288 next 318
2021-10-17 00:18:49.056312: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at b6b4a6e00 of size 205852672 next 313
2021-10-17 00:18:49.063244: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at b778f7e00 of size 205852672 next 319
2021-10-17 00:18:49.069964: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at b83d48e00 of size 220374272 next 18446744073709551615
2021-10-17 00:18:49.076724: I tensorflow/core/common_runtime/bfc_allocator.cc:1065]      Summary of in-use Chunks by size:
2021-10-17 00:18:49.085663: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 15 Chunks of size 256 totalling 3.8KiB
2021-10-17 00:18:49.092613: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 1 Chunks of size 1280 totalling 1.2KiB
2021-10-17 00:18:49.101615: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 148 Chunks of size 4096 totalling 592.0KiB
2021-10-17 00:18:49.109453: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 25 Chunks of size 12288 totalling 300.0KiB
2021-10-17 00:18:49.118227: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 24 Chunks of size 16384 totalling 384.0KiB
2021-10-17 00:18:49.125224: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 4 Chunks of size 201216 totalling 786.0KiB
2021-10-17 00:18:49.134291: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 25 Chunks of size 4194304 totalling 100.00MiB
2021-10-17 00:18:49.142594: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 24 Chunks of size 12582912 totalling 288.00MiB
2021-10-17 00:18:49.150332: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 48 Chunks of size 16777216 totalling 768.00MiB
2021-10-17 00:18:49.159611: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 4 Chunks of size 205852672 totalling 785.27MiB
2021-10-17 00:18:49.166664: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 1 Chunks of size 220374272 totalling 210.17MiB
2021-10-17 00:18:49.175719: I tensorflow/core/common_runtime/bfc_allocator.cc:1072] Sum Total of in-use chunks: 2.10GiB
2021-10-17 00:18:49.179917: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] total_region_allocated_bytes_: 2258055936 memory_limit_: 2258055988 available bytes: 52 curr_region_allocation_bytes_: 4516112384
2021-10-17 00:18:49.186738: I tensorflow/core/common_runtime/bfc_allocator.cc:1080] Stats:
Limit:                      2258055988
InUse:                      2258055424
MaxInUse:                   2258055424
NumAllocs:                         326
MaxAllocSize:                220374272
Reserved:                            0
PeakReserved:                        0
LargestFreeBlock:                    0

2021-10-17 00:18:49.214161: W tensorflow/core/common_runtime/bfc_allocator.cc:468] ****************************************************************************************************
2021-10-17 00:18:49.224793: W tensorflow/core/framework/op_kernel.cc:1692] OP_REQUIRES failed at resource_variable_ops.cc:158 : Resource exhausted: OOM when allocating tensor with shape[50257,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
2021-10-17 00:18:49.234240: W tensorflow/core/common_runtime/bfc_allocator.cc:457] Allocator (GPU_0_bfc) ran out of memory trying to allocate 4.0KiB (rounded to 4096)requested by op sample_sequence/model/h0/attn/split
If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation.
Current allocation summary follows.
Current allocation summary follows.
2021-10-17 00:18:49.253961: I tensorflow/core/common_runtime/bfc_allocator.cc:1004] BFCAllocator dump for GPU_0_bfc
2021-10-17 00:18:49.260477: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (256):  Total Chunks: 15, Chunks in use: 15. 3.8KiB allocated for chunks. 3.8KiB in use in bin. 60B client-requested in use in bin.
2021-10-17 00:18:49.267677: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (512):  Total Chunks: 1, Chunks in use: 0. 512B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-17 00:18:49.274584: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (1024):         Total Chunks: 1, Chunks in use: 1. 1.2KiB allocated for chunks. 1.2KiB in use in bin. 1.0KiB client-requested in use in bin.
2021-10-17 00:18:49.282179: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (2048):         Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-17 00:18:49.291707: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (4096):         Total Chunks: 148, Chunks in use: 148. 592.0KiB allocated for chunks. 592.0KiB in use in bin. 592.0KiB client-requested in use in bin.
2021-10-17 00:18:49.299699: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (8192):         Total Chunks: 25, Chunks in use: 25. 300.0KiB allocated for chunks. 300.0KiB in use in bin. 300.0KiB client-requested in use in bin.
2021-10-17 00:18:49.309406: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (16384):        Total Chunks: 24, Chunks in use: 24. 384.0KiB allocated for chunks. 384.0KiB in use in bin. 384.0KiB client-requested in use in bin.
2021-10-17 00:18:49.316823: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (32768):        Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-17 00:18:49.323705: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (65536):        Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-17 00:18:49.330699: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (131072):       Total Chunks: 4, Chunks in use: 4. 786.0KiB allocated for chunks. 786.0KiB in use in bin. 785.3KiB client-requested in use in bin.
2021-10-17 00:18:49.341079: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (262144):       Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-17 00:18:49.347442: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (524288):       Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-17 00:18:49.355050: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (1048576):      Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-17 00:18:49.362441: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (2097152):      Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-17 00:18:49.373022: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (4194304):      Total Chunks: 25, Chunks in use: 25. 100.00MiB allocated for chunks. 100.00MiB in use in bin. 100.00MiB client-requested in use in bin.
2021-10-17 00:18:49.379516: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (8388608):      Total Chunks: 24, Chunks in use: 24. 288.00MiB allocated for chunks. 288.00MiB in use in bin. 288.00MiB client-requested in use in bin.
2021-10-17 00:18:49.386849: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (16777216):     Total Chunks: 48, Chunks in use: 48. 768.00MiB allocated for chunks. 768.00MiB in use in bin. 768.00MiB client-requested in use in bin.
2021-10-17 00:18:49.394833: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (33554432):     Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-17 00:18:49.406519: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (67108864):     Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-17 00:18:49.413489: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (134217728):    Total Chunks: 5, Chunks in use: 5. 995.43MiB allocated for chunks. 995.43MiB in use in bin. 981.58MiB client-requested in use in bin.
2021-10-17 00:18:49.423166: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (268435456):    Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-17 00:18:49.433375: I tensorflow/core/common_runtime/bfc_allocator.cc:1027] Bin for 4.0KiB was 4.0KiB, Chunk State:
2021-10-17 00:18:49.439983: I tensorflow/core/common_runtime/bfc_allocator.cc:1040] Next region of size 2258055936
2021-10-17 00:18:49.446385: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at b0a600000 of size 1280 next 1
2021-10-17 00:18:49.453157: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at b0a600500 of size 12582912 next 2

...etc, etc...

2021-10-17 00:18:52.034032: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at b6b4a3e00 of size 12288 next 318
2021-10-17 00:18:52.041039: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at b6b4a6e00 of size 205852672 next 313
2021-10-17 00:18:52.050136: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at b778f7e00 of size 205852672 next 319
2021-10-17 00:18:52.057217: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at b83d48e00 of size 220374272 next 18446744073709551615
2021-10-17 00:18:52.066414: I tensorflow/core/common_runtime/bfc_allocator.cc:1065]      Summary of in-use Chunks by size:
2021-10-17 00:18:52.074512: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 15 Chunks of size 256 totalling 3.8KiB
2021-10-17 00:18:52.083562: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 1 Chunks of size 1280 totalling 1.2KiB
2021-10-17 00:18:52.091067: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 148 Chunks of size 4096 totalling 592.0KiB
2021-10-17 00:18:52.097600: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 25 Chunks of size 12288 totalling 300.0KiB
2021-10-17 00:18:52.105189: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 24 Chunks of size 16384 totalling 384.0KiB
2021-10-17 00:18:52.114193: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 4 Chunks of size 201216 totalling 786.0KiB
2021-10-17 00:18:52.121798: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 25 Chunks of size 4194304 totalling 100.00MiB
2021-10-17 00:18:52.131072: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 24 Chunks of size 12582912 totalling 288.00MiB
2021-10-17 00:18:52.138520: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 48 Chunks of size 16777216 totalling 768.00MiB
2021-10-17 00:18:52.145005: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 4 Chunks of size 205852672 totalling 785.27MiB
2021-10-17 00:18:52.151508: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 1 Chunks of size 220374272 totalling 210.17MiB
2021-10-17 00:18:52.160622: I tensorflow/core/common_runtime/bfc_allocator.cc:1072] Sum Total of in-use chunks: 2.10GiB
2021-10-17 00:18:52.165037: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] total_region_allocated_bytes_: 2258055936 memory_limit_: 2258055988 available bytes: 52 curr_region_allocation_bytes_: 4516112384
2021-10-17 00:18:52.174756: I tensorflow/core/common_runtime/bfc_allocator.cc:1080] Stats:
Limit:                      2258055988
InUse:                      2258055424
MaxInUse:                   2258055424
NumAllocs:                         326
MaxAllocSize:                220374272
Reserved:                            0
PeakReserved:                        0
LargestFreeBlock:                    0

2021-10-17 00:18:52.197768: W tensorflow/core/common_runtime/bfc_allocator.cc:468] ****************************************************************************************************
2021-10-17 00:18:52.207819: W tensorflow/core/framework/op_kernel.cc:1692] OP_REQUIRES failed at split_op.cc:308 : Resource exhausted: OOM when allocating tensor with shape[1,1,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "C:\Users\"username"\AppData\Local\Programs\Python\Python39\lib\site-packages\tensorflow\python\client\session.py", line 1375, in _do_call
    return fn(*args)
  File "C:\Users\"username"\AppData\Local\Programs\Python\Python39\lib\site-packages\tensorflow\python\client\session.py", line 1359, in _run_fn
    return self._call_tf_sessionrun(options, feed_dict, fetch_list,
  File "C:\Users\"username"\AppData\Local\Programs\Python\Python39\lib\site-packages\tensorflow\python\client\session.py", line 1451, in _call_tf_sessionrun
    return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
  (0) Resource exhausted: OOM when allocating tensor with shape[50257,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
         [[{{node model/MatMul/ReadVariableOp}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

         [[strided_slice/_645]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

  (1) Resource exhausted: OOM when allocating tensor with shape[50257,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
         [[{{node model/MatMul/ReadVariableOp}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\"username"\Desktop\gpt-2-finetuning\src\generate_unconditional_samples.py", line 79, in <module>
    fire.Fire(sample_model)
  File "C:\Users\"username"\AppData\Local\Programs\Python\Python39\lib\site-packages\fire\core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "C:\Users\"username"\AppData\Local\Programs\Python\Python39\lib\site-packages\fire\core.py", line 466, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "C:\Users\"username"\AppData\Local\Programs\Python\Python39\lib\site-packages\fire\core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "C:\Users\"username"\Desktop\gpt-2-finetuning\src\generate_unconditional_samples.py", line 71, in sample_model
    out = sess.run(output)
  File "C:\Users\"username"\AppData\Local\Programs\Python\Python39\lib\site-packages\tensorflow\python\client\session.py", line 967, in run
    result = self._run(None, fetches, feed_dict, options_ptr,
  File "C:\Users\"username"\AppData\Local\Programs\Python\Python39\lib\site-packages\tensorflow\python\client\session.py", line 1190, in _run
    results = self._do_run(handle, final_targets, final_fetches,
  File "C:\Users\"username"\AppData\Local\Programs\Python\Python39\lib\site-packages\tensorflow\python\client\session.py", line 1368, in _do_run
    return self._do_call(_run_fn, feeds, fetches, targets, options,
  File "C:\Users\"username"\AppData\Local\Programs\Python\Python39\lib\site-packages\tensorflow\python\client\session.py", line 1394, in _do_call
    raise type(e)(node_def, op, message)  # pylint: disable=no-value-for-parameter
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
  (0) Resource exhausted: OOM when allocating tensor with shape[50257,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
         [[{{node model/MatMul/ReadVariableOp}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

         [[strided_slice/_645]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

  (1) Resource exhausted: OOM when allocating tensor with shape[50257,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
         [[{{node model/MatMul/ReadVariableOp}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

0 successful operations.
0 derived errors ignored.

I wasn't sure why this was happening and figured I had installed the cuDNN files incorrectly. After messing around for a bit, I found that if I removed cudnn64_8.dll from C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4\bin (where the instructions told me to copy it) and then ran an unconditional sample, GPT-2 worked just fine and was able to generate some text. All the other cuDNN files were still in their CUDA directories. I don't understand why the presence of cudnn64_8.dll alone would break things. Did I install the wrong version of CUDA? What exactly is going on here?
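One thing I was planning to try, in case the problem is how TensorFlow reserves memory rather than a broken install, is asking it to allocate GPU memory on demand when the session is created. A rough sketch of what I mean, assuming the script builds a TF1-style session the way the repo's code appears to (I haven't verified this fixes anything):

import tensorflow.compat.v1 as tf

# Let TF grow its GPU allocation as needed instead of reserving
# nearly all of the card's memory at startup.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

# Alternatively, cap the fraction of GPU memory TF may claim:
# config.gpu_options.per_process_gpu_memory_fraction = 0.8

with tf.Session(config=config) as sess:
    ...  # build the sampling graph and call sess.run(output) as usual

Would something like that even help on a 4 GB card, or is the model simply too big to sample on this GPU?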

EDIT:

So I decided to add TF_GPU_ALLOCATOR=cuda_malloc_async to my environment variables, as the log output above suggested. This time I didn't get an OOM error like before, but the program still terminated without producing any output. Here's the result:

Microsoft Windows [Version 10.0.19043.1288]
(c) Microsoft Corporation. All rights reserved.

C:\Users\"username">cd C:\Users\"username"\Desktop\gpt-2-finetuning\src

C:\Users\"username"\Desktop\gpt-2-finetuning\src>python generate_unconditional_samples.py --model_name novel
2021-10-17 15:20:12.172740: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-10-17 15:20:12.681534: I tensorflow/core/common_runtime/gpu/gpu_process_state.cc:215] Using CUDA malloc Async allocator for GPU: 0

C:\Users\"username"\Desktop\gpt-2-finetuning\src>

What exactly am I doing wrong here? Why is my GPU running out of memory?

Alditrus
  • "OOM" = out of memory. You are running out of memory – talonmies Oct 17 '21 at 07:15
  • Ok, sooo... does that mean I have insufficient GPU on my computer? If that's the case, why does the GPT2 work just fine when the cudnn64_8.dll is removed? – Alditrus Oct 17 '21 at 15:19
