
I am running code for TensorFlow cross-validation training with 10 folds. The training runs in a for loop where model.fit is called on each iteration. The first fold trains fine, but after that the GPU memory is full. Here is my for loop:

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

acc_per_fold = []
loss_per_fold = []
fold_no = 1
for train, test in kfold.split(x_train, y_train):
    # Define the model architecture
    model = Sequential()
    model.add(Conv2D(32, kernel_size=(3,3), input_shape = x_train[0].shape, activation = "relu"))
    model.add(MaxPooling2D(2,2))
    model.add(Conv2D(32, kernel_size=(3,3), activation = "relu"))
    model.add(MaxPooling2D(2,2))

    model.add(Flatten())
    model.add(Dense(64, activation = "relu"))
    model.add(Dropout(0.1))
    model.add(Dense(32, activation = "tanh"))
    model.add(Dense(1, activation = "sigmoid"))

    # Compile the model
    model.compile(loss = "binary_crossentropy", 
              optimizer = tf.keras.optimizers.Adam(learning_rate = 0.001), 
              metrics = ["accuracy"])


    # Generate a print
    print('------------------------------------------------------------------------')
    print(f'Training for fold {fold_no} ...')
    # Fit data to model
    history = model.fit(np.array(x_train)[train], np.array(y_train)[train],
              batch_size=32,
              epochs=10,
              verbose=1)

    # Generate generalization metrics
    scores = model.evaluate(np.array(x_train)[test], np.array(y_train)[test], verbose=0)
    print(f"Score for fold {fold_no}: {model.metrics_names[0]} of {scores[0]}; {model.metrics_names[1]} of {scores[1]*100}%")
    acc_per_fold.append(scores[1] * 100)
    loss_per_fold.append(scores[0])

    # Increase fold number
    fold_no += 1
    

Also, I searched and found that the numba library can be used to release GPU memory. It did free the memory, but the kernel in the Jupyter notebook died and I had to restart it, so that solution will not work in my case.
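The numba approach I found is the usual device-reset recipe, roughly like this (my exact snippet may have differed slightly):

from numba import cuda

# Release all GPU memory held by the current process.
# This also destroys the CUDA context that TensorFlow is using,
# which is presumably why the notebook kernel has to be restarted afterwards.
cuda.select_device(0)
cuda.close()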

  • Hi @Neuro_Coder, please try decreasing the batch_size and try again. Also kindly refer to the comments [here](https://stackoverflow.com/a/69748608/14290697) and [here](https://stackoverflow.com/a/71768484/14290697). Thank you! –  Oct 06 '22 at 05:55

1 Answer


I faced this problem a long time ago, and even reducing the batch size didn't help. My GPU was an RTX 3060 with 12 GB of RAM, and the same code worked on Google Colab Pro. However, there is one solution that may work: you can use Python's gc module to force garbage collection after each iteration, which lets the GPU memory be released.

import gc

You can put this statement in the loop

gc.collect()

and hopefully it will work by freeing the memory after each loop iteration.
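For example, the end of the cross-validation loop from the question would look roughly like this. Here build_model() is just a placeholder for the Sequential definition in the question, and the del model line is an extra step I added so the collector has no remaining reference to the old model:

import gc

for train, test in kfold.split(x_train, y_train):
    model = build_model()   # placeholder for the Sequential model defined in the question
    model.fit(np.array(x_train)[train], np.array(y_train)[train],
              batch_size=32, epochs=10, verbose=1)
    scores = model.evaluate(np.array(x_train)[test], np.array(y_train)[test], verbose=0)

    # Drop the reference to this fold's model and force a garbage collection
    # pass so the GPU memory can be released before the next fold
    del model
    gc.collect()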

– Omar
  • I decided to use Colab since I posted this question, but I tried this on my code and it worked, thanks –  Nov 12 '22 at 21:02
  • Really awful issue, given that I assumed the problem was with GPU memory per iteration with respect to the batch size or data size, rather than Python failing to automatically collect. Spent forever trying to get it to work, thank you! – eshanrh Jul 27 '23 at 04:09
  • @eshanrh Yes, some issues are hectic like that! No problem, good luck – Omar Jul 28 '23 at 07:34