I am following the Tensorflow Object Detection API tutorial to train a Faster R-CNN model on my own dataset on Google Cloud. But the following "ran out-of-memory" error kept happening.
The replica master 0 ran out-of-memory and exited with a non-zero status of 247.
And according to the logs, a non-zero exit status -9
was returned. As described in the official documentation, a code of -9
might means the training is using more memory than allocated.
However, the memory utilization is lower than 0.2. So why I am having the memory problem? If it helps, the memory utilization graph is here.