
I'm trying to train on a sample matrix that weighs more than 25 GB, but my GPU has only 12 GB of memory. I thought TensorFlow would copy small batches of the matrix from RAM to GPU memory, as described here: https://stackoverflow.com/a/53938359/16563202

Instead, Keras tries to copy the entire matrix into GPU memory and fails. What am I doing wrong?

import numpy as np

samples = np.load("/sda/anybody/imagenet-in-np/extracted-wavelet-of-all-imagenet.npy").T

feature is also a NumPy array.

model.fit(samples, feature,
          batch_size=4000, epochs=150,
          # callbacks=[tensorboard_callback]
          )

I get the following error:

tensorflow.python.framework.errors_impl.InternalError: Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0 to /job:localhost/replica:0/task:0/device:GPU:0 in order to run _EagerConst: Dst tensor is not initialized.

which means TensorFlow failed to copy the input tensor to the GPU. How can I fix this?
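For what it's worth, the workaround I'm now experimenting with is to hand fit a tf.keras.utils.Sequence instead of the raw arrays, so that (as far as I understand) only one batch at a time gets converted into a GPU tensor. This is a minimal, untested sketch; NpSequence is just my own wrapper name, not anything from the linked answer:

import numpy as np
import tensorflow as tf

class NpSequence(tf.keras.utils.Sequence):
    """Serves mini-batches from a large in-RAM (or memory-mapped)
    array so only one batch at a time is copied to the GPU."""
    def __init__(self, x, y, batch_size):
        self.x, self.y = x, y
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per epoch.
        return int(np.ceil(len(self.x) / self.batch_size))

    def __getitem__(self, idx):
        # Slice out one batch; only this slice is handed to Keras.
        lo = idx * self.batch_size
        return self.x[lo:lo + self.batch_size], self.y[lo:lo + self.batch_size]

# batch_size moves into the Sequence; fit must not also receive
# a batch_size argument when its input is a Sequence.
model.fit(NpSequence(samples, feature, batch_size=4000), epochs=150)

If host RAM is also tight, I assume np.load(..., mmap_mode='r') would let the slicing above read each batch straight from the file on disk instead of holding all 25 GB in memory at once.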

material bug
  • Depends on how large a single data point is, but maybe a batch size of 4000 is too large? Do you also have that problem if you use 1 instead of 4000? – cherrywoods Jan 19 '22 at 21:55
  • 1
    I've tried using smaller batch. It seems it tries to load the entire matrix into the GPU no matter what the batch size is. @cherrywoods – material bug Jan 19 '22 at 22:23

0 Answers