I'm using a TensorFlow Keras (Python) model to process images. As mentioned in almost all of my previous posts, TensorFlow gives me allocation errors at almost any resolution. This is a real problem, because I need to process 12K images (12288x6144 pixels), and downscaling is not an option.
After debugging a lot, I discovered a breakthrough:
sess = tf.compat.v1.Session(config=
tf.compat.v1.ConfigProto(inter_op_parallelism_threads=1,
intra_op_parallelism_threads=1))
This allows me to process 12K images on a server with 100 GB of RAM. The output:
2022-06-03 15:15:17.850681: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-06-03 15:15:17.850706: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-06-03 15:15:18.647082: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-06-03 15:15:18.858984: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-03 15:15:18.859782: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-03 15:15:18.860572: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/snls/color/lib/python3.8/site-packages/cv2/../../lib64:
2022-06-03 15:15:18.860618: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/snls/color/lib/python3.8/site-packages/cv2/../../lib64:
2022-06-03 15:15:18.860647: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/snls/color/lib/python3.8/site-packages/cv2/../../lib64:
2022-06-03 15:15:18.860676: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/snls/color/lib/python3.8/site-packages/cv2/../../lib64:
2022-06-03 15:15:18.860704: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcurand.so.10'; dlerror: libcurand.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/snls/color/lib/python3.8/site-packages/cv2/../../lib64:
2022-06-03 15:15:18.860730: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/snls/color/lib/python3.8/site-packages/cv2/../../lib64:
2022-06-03 15:15:18.860757: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/snls/color/lib/python3.8/site-packages/cv2/../../lib64:
2022-06-03 15:15:18.860785: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/snls/color/lib/python3.8/site-packages/cv2/../../lib64:
2022-06-03 15:15:18.860793: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2022-06-03 15:15:18.867533: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-03 15:15:18.868952: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-03 15:15:18.870245: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
Loading model...
ai
1/1 [==============================] - 2s 2s/step
model succesfully loaded
preprocessing...
ai
2022-06-03 15:15:22.894033: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 9663676416 exceeds 10% of free system memory.
2022-06-03 15:15:30.849366: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 14495514624 exceeds 10% of free system memory.
2022-06-03 15:15:55.283068: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 19327352832 exceeds 10% of free system memory.
2022-06-03 15:16:39.818722: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 9663676416 exceeds 10% of free system memory.
2022-06-03 15:17:13.767888: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 28991029248 exceeds 10% of free system memory.
1/1 [==============================] - 255s 255s/step
max: 83386.50390625 MiB
As you can see, it still prints a lot of warnings, but it finishes the task, peaking at about 83,400 MiB (roughly 81.4 GiB), as expected (my mathematical model predicted 81.5 GB).
But when I don't limit the threads, I get the same warnings, and then the process aborts with std::bad_alloc before producing any result:
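The sizes in the "Allocation of ... exceeds 10% of free system memory" warnings can be checked by hand. Assuming each warning is a single float32 tensor with batch size 1 at the full 12288x6144 resolution (an assumption; the log does not name the tensors), the quick calculation below converts each logged size into GiB and into an equivalent number of channels. It gives 9, 13.5, 18 and 27 GiB, i.e. 32, 48, 64 and 96 channels: the 9 GiB allocation matches one 32-channel Conv2D output of build_dce_net exactly, and the 18 GiB one matches a 64-channel Concatenate output. What the 48- and 96-channel equivalents correspond to is not clear from the log alone.

```python
# Size of one float32 NHWC activation tensor for a single 12288x6144 image.
# activation_bytes is a hypothetical helper written for this estimate.
H, W = 6144, 12288          # image size from the question
BYTES_PER_FLOAT32 = 4


def activation_bytes(channels, height=H, width=W):
    """Bytes needed for a (1, height, width, channels) float32 tensor."""
    return height * width * channels * BYTES_PER_FLOAT32


# Allocation sizes copied from the "exceeds 10% of free system memory" warnings.
logged = [9663676416, 14495514624, 19327352832, 28991029248]

for size in logged:
    # Express each allocation as an equivalent number of float32 channels.
    channels = size / activation_bytes(1)
    print(f"{size:>12} bytes = {size / 2**30:5.1f} GiB = {channels:g} channels")
```

If several of these multi-GiB buffers are alive at the same time, peak usage climbs quickly, which may be why letting TensorFlow run more work in parallel pushes the process past the available RAM.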
2022-06-03 13:09:49.329621: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-06-03 13:09:49.329644: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-06-03 13:09:50.178906: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-03 13:09:50.179749: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-03 13:09:50.180558: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/snls/color/lib/python3.8/site-packages/cv2/../../lib64:
2022-06-03 13:09:50.180586: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/snls/color/lib/python3.8/site-packages/cv2/../../lib64:
2022-06-03 13:09:50.180609: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/snls/color/lib/python3.8/site-packages/cv2/../../lib64:
2022-06-03 13:09:50.180631: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/snls/color/lib/python3.8/site-packages/cv2/../../lib64:
2022-06-03 13:09:50.180653: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcurand.so.10'; dlerror: libcurand.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/snls/color/lib/python3.8/site-packages/cv2/../../lib64:
2022-06-03 13:09:50.180675: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/snls/color/lib/python3.8/site-packages/cv2/../../lib64:
2022-06-03 13:09:50.180698: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/snls/color/lib/python3.8/site-packages/cv2/../../lib64:
2022-06-03 13:09:50.180720: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/snls/color/lib/python3.8/site-packages/cv2/../../lib64:
2022-06-03 13:09:50.180728: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2022-06-03 13:09:50.190450: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Loading model...
ai
1/1 [==============================] - 0s 290ms/step
model succesfully loaded
preprocessing...
ai
2022-06-03 13:09:52.739451: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 9663676416 exceeds 10% of free system memory.
2022-06-03 13:09:53.269927: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 14495514624 exceeds 10% of free system memory.
2022-06-03 13:09:54.813454: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 19327352832 exceeds 10% of free system memory.
2022-06-03 13:09:58.060999: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 9663676416 exceeds 10% of free system memory.
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Is there any way to fix this? Using only one thread is painfully slow, and I'm 99.9% sure it won't work with hardware acceleration either ;[
Finally, here's the code of the image-processing program:
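One experiment that might recover some speed: in TF2/Keras code, thread counts are normally configured through tf.config.threading rather than a tf.compat.v1.Session, and the two settings can be chosen independently. The sketch below assumes (unverified) that the extra memory comes from inter-op parallelism, i.e. several independent ops, each with its own multi-GiB output buffer, running at the same time. It therefore keeps inter-op parallelism at 1 while letting each individual op use all cores. Whether this stays under 100 GB for this model is untested.

```python
import os

import tensorflow as tf

# These calls must run before TensorFlow executes any op (right after the
# import, before the model is loaded and instead of the tf.compat.v1.Session
# lines); once the runtime is initialized TensorFlow rejects the change
# with a RuntimeError.
# Assumption: the memory blow-up comes from independent ops running
# concurrently (inter-op), not from threads inside a single op (intra-op).
tf.config.threading.set_inter_op_parallelism_threads(1)  # one op at a time
tf.config.threading.set_intra_op_parallelism_threads(os.cpu_count() or 1)  # all cores inside each op

print("inter-op threads:", tf.config.threading.get_inter_op_parallelism_threads())
print("intra-op threads:", tf.config.threading.get_intra_op_parallelism_threads())
```

If memory still explodes with this setting, that would suggest the intra-op threads (for example per-thread scratch buffers) are the cause instead, and lowering the intra-op count step by step would be the next thing to measure.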
"""
Image processor using Neural network model
Developed by Sebastian Lindau-Skands
Model created by Soumik Rakshit
Last Modified 2022/05/21
"""
import tensorflow as tf
from glob import glob
import os
import numpy as np
from PIL import Image
import cv2
from variables import *
from memory_profiler import memory_usage
sess = tf.compat.v1.Session(config=
tf.compat.v1.ConfigProto(inter_op_parallelism_threads=1,
intra_op_parallelism_threads=1))
inputdir = "/input"
outputdir = "/output"
def GPU():
    gpus = tf.config.list_physical_devices('GPU')
    if gpus:
        try:
            # Currently, memory growth needs to be the same across GPUs
            for gpu in gpus:
                tf.config.experimental.set_memory_growth(gpu, True)
            logical_gpus = tf.config.list_logical_devices('GPU')
            print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
        except RuntimeError as e:
            # Memory growth must be set before GPUs have been initialized
            print(e)
    global dce_model
    dce_model = tf.keras.models.load_model('./AI/models/5.training')
    loader()
def loader():
    # Loading a [1000x500] dummy picture, to preload the model with the correct parameters
    print('Loading model...\n')
    DCE(Image.open('./AI/models/loader.jpg'))
    print('\nmodel succesfully loaded\n')
    main()

def main():
    for image in (glob(inputdir + "/*.jpg") + glob(inputdir + "/*.jpeg") + glob(inputdir + "/*.png")):
        # Saving the output of the AI model, with the preprocessed image as the input
        (DCE(preprocess(image))).save(outputdir + image.replace(inputdir, ''))
#image size 12288x6144
def preprocess(original_image):
    img = cv2.imread(original_image)
    print('preprocessing...')
    # Gamma control
    invGamma = 1/gamma
    gamma_table = [((i / 255) ** invGamma) * 255 for i in range(256)]
    gamma_array = np.array(gamma_table, np.uint8)
    gamma_img = cv2.LUT(img, gamma_array)
    # Applying CLAHE and contrast regulation
    hsv_img = cv2.cvtColor(gamma_img, cv2.COLOR_BGR2HSV)
    h, s, v = cv2.split(hsv_img)
    clahe = cv2.createCLAHE(clipLimit=0.8, tileGridSize=(4,3))
    clahe_v = clahe.apply(v)
    enhanced_s = cv2.convertScaleAbs(s, alpha=alpha, beta=beta)
    merged_hsv = cv2.merge((h, enhanced_s, clahe_v))
    output_pre_process = cv2.cvtColor(merged_hsv, cv2.COLOR_HSV2RGB)
    # Converting cv2 to PIL, and returning the output to the AI model
    pre_processed = Image.fromarray(output_pre_process)
    return pre_processed
def DCE(pre_processed):
    array = tf.keras.preprocessing.image.img_to_array(pre_processed)
    floatarray = array.astype("float32") / 255.0
    expanded = np.expand_dims(floatarray, axis=0)
    # AI model enhancement
    print('ai')
    enhanced = dce_model.predict(expanded)
    output = tf.cast((enhanced[0, :, :, :] * 255), dtype=np.uint8)
    output_image = Image.fromarray(output.numpy())
    return output_image

if __name__ == '__main__':
    mem = max(memory_usage(proc=GPU))
    print("max: {} MiB".format(mem))
And here's the model's code:
from tensorflow import keras
from tensorflow.keras import layers

def build_dce_net():
    input_img = keras.Input(shape=[None, None, 3])
    conv1 = layers.Conv2D(
        32, (3, 3), strides=(1, 1), activation="relu", padding="same"
    )(input_img)
    conv2 = layers.Conv2D(
        32, (3, 3), strides=(1, 1), activation="relu", padding="same"
    )(conv1)
    conv3 = layers.Conv2D(
        32, (3, 3), strides=(1, 1), activation="relu", padding="same"
    )(conv2)
    conv4 = layers.Conv2D(
        32, (3, 3), strides=(1, 1), activation="relu", padding="same"
    )(conv3)
    int_con1 = layers.Concatenate(axis=-1)([conv4, conv3])
    conv5 = layers.Conv2D(
        32, (3, 3), strides=(1, 1), activation="relu", padding="same"
    )(int_con1)
    int_con2 = layers.Concatenate(axis=-1)([conv5, conv2])
    conv6 = layers.Conv2D(
        32, (3, 3), strides=(1, 1), activation="relu", padding="same"
    )(int_con2)
    int_con3 = layers.Concatenate(axis=-1)([conv6, conv1])
    x_r = layers.Conv2D(24, (3, 3), strides=(1, 1), activation="tanh", padding="same")(
        int_con3
    )
    # return keras.models.load_model('./high-res-trained')
    return keras.Model(inputs=input_img, outputs=x_r)
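Because build_dce_net is fully convolutional (seven stacked 3x3 convolutions with stride 1 and no pooling), each output pixel depends only on a 15x15 neighbourhood of the input, a radius of 7 pixels. That makes tiled inference a possible workaround: run the model on overlapping crops and keep only the centre of each prediction, so peak memory scales with the tile size instead of the full 12288x6144 image. The sketch below is a generic NumPy version of that technique, not part of the original code; predict_tiled, tile and halo are hypothetical names, and it assumes the saved model in ./AI/models/5.training keeps that purely local, per-pixel behaviour (including any curve-application step on top of x_r).

```python
import numpy as np


def predict_tiled(predict_fn, image, tile=1024, halo=8):
    """Apply a fully convolutional model to a large image, one tile at a time.

    predict_fn: callable mapping a (1, h, w, C) batch to a (1, h, w, C_out) array.
    image:      (H, W, C) array, e.g. the float32 array built in DCE().
    tile:       edge length of the output region produced per call.
    halo:       context pixels added on each side of a tile; must be at least the
                model's receptive-field radius (7 for seven stacked 3x3 convs).
    """
    height, width = image.shape[:2]
    result = None
    for y0 in range(0, height, tile):
        for x0 in range(0, width, tile):
            y1, x1 = min(y0 + tile, height), min(x0 + tile, width)
            # Crop the tile plus its halo, clamped to the image borders so that
            # real image edges still get the model's own "same" zero padding.
            ya, yb = max(y0 - halo, 0), min(y1 + halo, height)
            xa, xb = max(x0 - halo, 0), min(x1 + halo, width)
            pred = np.asarray(predict_fn(image[np.newaxis, ya:yb, xa:xb]))[0]
            if result is None:
                result = np.empty((height, width, pred.shape[-1]), dtype=pred.dtype)
            # Keep only the halo-free centre, which matches the full-image
            # prediction (up to floating-point differences in the backend).
            result[y0:y1, x0:x1] = pred[y0 - ya:y1 - ya, x0 - xa:x1 - xa]
    return result
```

In the question's DCE() function this would replace the single dce_model.predict(expanded) call with something like predict_tiled(dce_model.predict, floatarray). Larger tiles mean fewer calls but more memory per call, so the tile size is the knob to trade speed against RAM.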
One last thing: I know the code isn't the prettiest right now, but it's in active development, so I'm not going to bother cleaning it up before it's actually done xD