I need to train a neural network model (4 GRU layers) implemented in TensorFlow. The code, which I got from another developer, was originally written as a Jupyter notebook.
If I run the notebook, the model trains fine: RAM usage stays around 20% and GPU memory usage is about 11 GB.
If I copy the same code into a Python file and run it, it keeps crashing, even if I reduce the batch size. In this case the RAM usage is much higher, while much less GPU memory is used (about 2.5 GB).
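For context, this is roughly what the training code looks like (a simplified sketch; the layer sizes, feature dimensions, and variable names are placeholders, not the actual values from the notebook):

```python
import tensorflow as tf

# Sketch of the model: 4 stacked GRU layers followed by a classifier head.
# The real units/feature sizes differ; these are placeholders.
model = tf.keras.Sequential([
    tf.keras.layers.GRU(128, return_sequences=True, input_shape=(None, 64)),
    tf.keras.layers.GRU(128, return_sequences=True),
    tf.keras.layers.GRU(128, return_sequences=True),
    tf.keras.layers.GRU(128),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=[
        "accuracy",
        tf.keras.metrics.Precision(name="precision"),
        tf.keras.metrics.Recall(name="recall"),
    ],
)
```

The same code runs in both cases; the only difference is whether it is executed cell by cell in Jupyter or as a single .py script.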
The error messages I get look like this:
2401/2402 [============================>.] - ETA: 0s - loss: 0.0866 - accuracy: 0.9831 - precision: 0.0000e+00 - recall: 0.0000e+00
2021-08-12 08:29:31.894194: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 2053413600 exceeds 10% of free system memory.
[...]
Filling up shuffle buffer (this may take a while): 15082 of 19299
Killed
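The "Filling up shuffle buffer" message presumably comes from the tf.data input pipeline. I don't have the exact code at hand, but it looks roughly like this (the buffer size matches the 19299 examples from the log; the array shapes are placeholders):

```python
import numpy as np
import tensorflow as tf

# Placeholder arrays standing in for the real features/labels.
features = np.random.rand(19299, 50, 64).astype("float32")
labels = np.random.randint(0, 2, size=(19299, 1)).astype("float32")

# shuffle() fills its buffer in host RAM; with buffer_size equal to the
# whole dataset, this is the "Filling up shuffle buffer" step in the log.
train_dataset = (
    tf.data.Dataset.from_tensor_slices((features, labels))
    .shuffle(buffer_size=19299)
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)
)
```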
Do you have any suggestions on how to fix this? I would prefer to train the model from the Python file, since the time per epoch is much smaller.