I had been running a Jupyter notebook on Google Cloud Platform (specifically, a notebook instance on AI Platform) for around 20 hours when I received the following error. It happened during the second epoch, at batch 52500 of 292893.
The notebook was not able to reconnect, so I would need to run everything again. I initially thought that it had used up all the RAM or disk space, but that doesn't seem to be the case:
I did not close the browser tab or shut down the notebook instance at any time. The instance is in zone us-west1-b.
Any idea how to prevent this from happening again?