I have been trying to train a regression model, with big data on AWS Sagemaker.
The instance I used on my last try was ml.m5.12xlarge and I was confident it will work this time, but no. I still get the error.
After some minutes in the training I get this error on Cloudwatch:
[E 07:00:35.308 NotebookApp] KernelRestarter: restart callback <bound method ZMQChannelsHandler.on_kernel_restarted of ZMQChannelsHandler(f92aff37-be6b-48df-a5f5-522bcc6dd072)> failed
Traceback (most recent call last):
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/jupyter_client/restarter.py", line 86, in _fire_callbacks
callback()
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/notebook/services/kernels/handlers.py", line 473, in on_kernel_restarted
self._send_status_message('restarting')
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/notebook/services/kernels/handlers.py", line 469, in _send_status_message
self.write_message(json.dumps(msg, default=date_default))
File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/tornado/websocket.py", line 337, in write_message
raise WebSocketClosedError()
tornado.websocket.WebSocketClosedError
Does anyone might know what the error could be?