
I'm using a TensorFlow Keras (Python) model to process images. As mentioned in almost all my previous posts, TensorFlow gives me allocation errors at almost any resolution, which is a real problem since I need to process 12K-resolution images (downscaling is not an option).

After a lot of debugging, I made a breakthrough:

sess = tf.compat.v1.Session(config=tf.compat.v1.ConfigProto(
    inter_op_parallelism_threads=1,
    intra_op_parallelism_threads=1))
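
For what it's worth, the TF2-native way to set the same limits appears to be the tf.config.threading API (a minimal sketch, assuming it runs before any other TensorFlow call; I've only actually tested the compat.v1 session above):

import tensorflow as tf

# Must run before TensorFlow executes any op,
# otherwise the process-wide thread pools are already fixed.
tf.config.threading.set_inter_op_parallelism_threads(1)
tf.config.threading.set_intra_op_parallelism_threads(1)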

With the threads limited, I can process 12K images on a server with 100 GB of RAM. The output:

2022-06-03 15:15:17.850681: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-06-03 15:15:17.850706: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-06-03 15:15:18.647082: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-06-03 15:15:18.858984: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-03 15:15:18.859782: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-03 15:15:18.860572: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/snls/color/lib/python3.8/site-packages/cv2/../../lib64:
2022-06-03 15:15:18.860618: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/snls/color/lib/python3.8/site-packages/cv2/../../lib64:
2022-06-03 15:15:18.860647: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/snls/color/lib/python3.8/site-packages/cv2/../../lib64:
2022-06-03 15:15:18.860676: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/snls/color/lib/python3.8/site-packages/cv2/../../lib64:
2022-06-03 15:15:18.860704: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcurand.so.10'; dlerror: libcurand.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/snls/color/lib/python3.8/site-packages/cv2/../../lib64:
2022-06-03 15:15:18.860730: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/snls/color/lib/python3.8/site-packages/cv2/../../lib64:
2022-06-03 15:15:18.860757: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/snls/color/lib/python3.8/site-packages/cv2/../../lib64:
2022-06-03 15:15:18.860785: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/snls/color/lib/python3.8/site-packages/cv2/../../lib64:
2022-06-03 15:15:18.860793: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2022-06-03 15:15:18.867533: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-03 15:15:18.868952: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-03 15:15:18.870245: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
Loading model...

ai
1/1 [==============================] - 2s 2s/step

model succesfully loaded

preprocessing...
ai
2022-06-03 15:15:22.894033: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 9663676416 exceeds 10% of free system memory.
2022-06-03 15:15:30.849366: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 14495514624 exceeds 10% of free system memory.
2022-06-03 15:15:55.283068: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 19327352832 exceeds 10% of free system memory.
2022-06-03 15:16:39.818722: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 9663676416 exceeds 10% of free system memory.
2022-06-03 15:17:13.767888: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 28991029248 exceeds 10% of free system memory.
1/1 [==============================] - 255s 255s/step
max: 83386.50390625 MiB

As you can see, it still complains a lot, but it finishes the task using 83 GB, as expected (my mathematical model predicted 81.5 GB).
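
Incidentally, the allocation sizes in those warnings line up exactly with single float32 activation tensors at 12288x6144; here's a quick sanity check (the channel counts are my reading of the conv/concat layers in the model code below):

h, w = 6144, 12288  # image height x width
for channels in (32, 48, 64, 96):
    print(channels, h * w * channels * 4)  # 4 bytes per float32 value
# 32 ->  9663676416  (one 32-filter conv output)
# 48 -> 14495514624  (presumably an internal/fused buffer)
# 64 -> 19327352832  (a 32+32 Concatenate)
# 96 -> 28991029248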

But without limiting the threads, the output shows the same warnings, and the run dies before producing a result:

2022-06-03 13:09:49.329621: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-06-03 13:09:49.329644: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-06-03 13:09:50.178906: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-03 13:09:50.179749: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-03 13:09:50.180558: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/snls/color/lib/python3.8/site-packages/cv2/../../lib64:
2022-06-03 13:09:50.180586: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/snls/color/lib/python3.8/site-packages/cv2/../../lib64:
2022-06-03 13:09:50.180609: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/snls/color/lib/python3.8/site-packages/cv2/../../lib64:
2022-06-03 13:09:50.180631: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/snls/color/lib/python3.8/site-packages/cv2/../../lib64:
2022-06-03 13:09:50.180653: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcurand.so.10'; dlerror: libcurand.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/snls/color/lib/python3.8/site-packages/cv2/../../lib64:
2022-06-03 13:09:50.180675: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/snls/color/lib/python3.8/site-packages/cv2/../../lib64:
2022-06-03 13:09:50.180698: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/snls/color/lib/python3.8/site-packages/cv2/../../lib64:
2022-06-03 13:09:50.180720: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/snls/color/lib/python3.8/site-packages/cv2/../../lib64:
2022-06-03 13:09:50.180728: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2022-06-03 13:09:50.190450: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Loading model...

ai
1/1 [==============================] - 0s 290ms/step

model succesfully loaded

preprocessing...
ai
2022-06-03 13:09:52.739451: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 9663676416 exceeds 10% of free system memory.
2022-06-03 13:09:53.269927: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 14495514624 exceeds 10% of free system memory.
2022-06-03 13:09:54.813454: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 19327352832 exceeds 10% of free system memory.
2022-06-03 13:09:58.060999: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 9663676416 exceeds 10% of free system memory.
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc

Is there any way I can fix this? Using only one thread is painfully slow, and I'm 99.9% sure it won't work with hardware acceleration either ;[

Finally, here's the image program's code:

"""
Image processor using Neural network model
Developed by Sebastian Lindau-Skands
Model created by Soumik Rakshit
Last Modified 2022/05/21
"""
import tensorflow as tf
from glob import glob
import os
import numpy as np
from PIL import Image
import cv2
from variables import *
from memory_profiler import memory_usage

sess = tf.compat.v1.Session(config=tf.compat.v1.ConfigProto(
    inter_op_parallelism_threads=1,
    intra_op_parallelism_threads=1))

inputdir = "/input"
outputdir = "/output"

def GPU():
    gpus = tf.config.list_physical_devices('GPU')
    if gpus:
        try:
            # Currently, memory growth needs to be the same across GPUs
            for gpu in gpus:
                tf.config.experimental.set_memory_growth(gpu, True)
            logical_gpus = tf.config.list_logical_devices('GPU')
            print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
        except RuntimeError as e:
            # Memory growth must be set before GPUs have been initialized
            print(e)
    global dce_model
    dce_model = tf.keras.models.load_model('./AI/models/5.training')
    loader()

def loader():
    #Loading [1000x500] dummy picture, to preload model with correct parameters
    print('Loading model...\n')
    DCE(Image.open('./AI/models/loader.jpg'))
    print ('\nmodel succesfully loaded\n')
    main()

def main():
    for image in (glob(inputdir + "/*.jpg") + glob(inputdir + "/*.jpeg") + glob(inputdir + "/*.png")):
        #Saving the output of the AI-model, with the preprocessed image as the input
        (DCE(preprocess(image))).save(outputdir + image.replace(inputdir, ''))

#Input image size: 12288x6144
def preprocess(original_image):
    img = cv2.imread(original_image)
    print('preprocessing...')
    #Gamma control
    invGamma = 1/gamma
    gamma_table = [((i / 255) ** invGamma) * 255 for i in range(256)]
    gamma_array = np.array(gamma_table, np.uint8)
    gamma_img = cv2.LUT(img, gamma_array)
    #Applying CLAHE and contrast regulation
    hsv_img = cv2.cvtColor(gamma_img, cv2.COLOR_BGR2HSV)
    h, s, v = cv2.split(hsv_img)
    clahe = cv2.createCLAHE(clipLimit=0.8, tileGridSize=(4,3))
    clahe_v = clahe.apply(v)
    enhanced_s = cv2.convertScaleAbs(s, alpha=alpha, beta=beta)
    merged_hsv = cv2.merge((h, enhanced_s, clahe_v))
    output_pre_process = cv2.cvtColor(merged_hsv, cv2.COLOR_HSV2RGB)
    #converting cv2 to PIL, and returning output to AI model
    pre_processed = Image.fromarray(output_pre_process)
    return pre_processed

def DCE(pre_processed):
    array = tf.keras.preprocessing.image.img_to_array(pre_processed)
    floatarray = array.astype("float32") / 255.0
    expanded = np.expand_dims(floatarray, axis=0)
    #AI Model enhancement
    print('ai')
    enhanced = dce_model.predict(expanded)
    output = tf.cast((enhanced[0, :, :, :] * 255), dtype=np.uint8)    
    output_image = Image.fromarray(output.numpy())
    return output_image

if __name__ == '__main__':
    mem = max(memory_usage(proc=GPU))
    print("max: {} MiB".format(mem))

And the model's code:

from tensorflow import keras
from tensorflow.keras import layers

def build_dce_net():
    input_img = keras.Input(shape=[None, None, 3])
    conv1 = layers.Conv2D(
        32, (3, 3), strides=(1, 1), activation="relu", padding="same"
    )(input_img)
    conv2 = layers.Conv2D(
        32, (3, 3), strides=(1, 1), activation="relu", padding="same"
    )(conv1)
    conv3 = layers.Conv2D(
        32, (3, 3), strides=(1, 1), activation="relu", padding="same"
    )(conv2)
    conv4 = layers.Conv2D(
        32, (3, 3), strides=(1, 1), activation="relu", padding="same"
    )(conv3)
    int_con1 = layers.Concatenate(axis=-1)([conv4, conv3])
    conv5 = layers.Conv2D(
        32, (3, 3), strides=(1, 1), activation="relu", padding="same"
    )(int_con1)
    int_con2 = layers.Concatenate(axis=-1)([conv5, conv2])
    conv6 = layers.Conv2D(
        32, (3, 3), strides=(1, 1), activation="relu", padding="same"
    )(int_con2)
    int_con3 = layers.Concatenate(axis=-1)([conv6, conv1])
    x_r = layers.Conv2D(24, (3, 3), strides=(1, 1), activation="tanh", padding="same")(
        int_con3
    )
    #return keras.models.load_model('./high-res-trained')
    return keras.Model(inputs=input_img, outputs=x_r)
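
Since the input shape is [None, None, 3], the network is fully convolutional and accepts any resolution, which is why the same weights run on the 1000x500 loader image and on 12288x6144. A minimal usage sketch (the dummy input is just an illustration, standing in for a real image):

import numpy as np

model = build_dce_net()
dummy = np.zeros((1, 500, 1000, 3), dtype="float32")  # batch of one 1000x500 RGB image
print(model.predict(dummy).shape)  # (1, 500, 1000, 24): 24 output channels from the last conv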

One last thing: I know the code isn't the prettiest right now, but it's actively in development, so I can't be bothered cleaning it up before it's actually done xD

  • These are warnings, not errors; see the big W in each of them? TensorFlow always warns if a single allocation is more than 10% of system RAM. You could disable all warnings (search this site for how). – Dr. Snoopy Jun 03 '22 at 20:02
  • Only the std::bad_alloc is an error that you should fix. Have you considered modifying the code so it uses less memory? For example, 2000x1000 is already a pretty large image size, and it seems you cannot use this size with this model and the amount of RAM you have; you could downscale the images to something more sensible. – Dr. Snoopy Jun 03 '22 at 20:04
  • But I'm using 12288x6144 when getting the error, and when succeeding (with a memory consumption of 80 GB out of 126) I'm also using 12288x6144. Plus, this doesn't explain why the threading fixed it. – Lynet _101 Jun 04 '22 at 04:16
