I am deploying JupyterHub 0.8.2 to Kubernetes (EKS on AWS, v1.13).
When I deploy the JupyterHub application to EKS via helm, everything deploys and starts fine. However, when I spawn a notebook server and create a Python notebook, the kernel hangs while trying to connect. (See screenshots at the bottom.)
I saw a similar issue posted here: https://github.com/jupyter/notebook/issues/2664; it seems there was a regression in the tornado Python package. However, I tried downgrading tornado to 5.1.1 and that did not fix the issue...
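To confirm the downgrade actually took effect in the running kernel environment (and not just in an image that isn't being used), here is the minimal check I run in a notebook cell; `tornado.version` and `tornado.__file__` are standard attributes:

```python
# Sanity check, run in a notebook cell, to confirm which tornado
# the kernel actually loaded -- a downgrade baked into the wrong
# image or environment won't take effect here.
import tornado

print(tornado.version)    # expect 5.1.1 if the downgrade applied
print(tornado.__file__)   # which environment it was imported from
```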
What are the next troubleshooting steps I can try? Where can I find diagnostic info / logs for the Python kernel?
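So far the only place I have found anything is the single-user pod's log, which is where the kernel's stdout/stderr ends up. A minimal sketch of pulling it with the official kubernetes Python client; the pod name `jupyter-me` and namespace `jhub` are assumptions from my setup, so adjust them to your helm release:

```python
# Pull the single-user server pod's log, which includes the kernel's
# stdout/stderr. Pod name ("jupyter-me") and namespace ("jhub") are
# assumptions -- adjust to your helm release.
from kubernetes import client, config

config.load_kube_config()   # uses your local kubeconfig
v1 = client.CoreV1Api()

print(v1.read_namespaced_pod_log(
    name="jupyter-me",      # single-user server pod for user "me"
    namespace="jhub",       # namespace the helm chart was installed into
    tail_lines=200,         # last 200 lines is usually enough
))
```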
Update: one of our existing clusters, which had been running fine for about 2 months, started experiencing this kernel issue just today. This makes me wonder if this is some sort of regression; however, how would a regression affect a JupyterHub deployment that has not been modified? Does JupyterHub update libraries/packages by itself, without consent?
Update 2: I inspected network traffic in the browser and discovered that the WebSocket request to wss://<<MY_JHUB_DOMAIN>>/user/me/api/kernels/<<KERNEL_ID>>/channels?session_id=<<SESSION_ID>>
is returning HTTP 504 GATEWAY_TIMEOUT.
Detailed HTTP request:
GET wss://<<MY_JHUB_DOMAIN>>/user/me/api/kernels/eaf397d3-36da-473c-8342-c4d4d3ad5256/channels?session_id=fa79dc80238648b8b1ea4c3982cb0612 HTTP/1.1
Host: <<MY_JHUB_DOMAIN>>
Connection: Upgrade
Pragma: no-cache
Cache-Control: no-cache
Upgrade: websocket
Origin: https://<<MY_JHUB_DOMAIN>>
Sec-WebSocket-Version: 13
User-Agent: redacted
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Cookie: redacted
Sec-WebSocket-Key: redacted
Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits
Detailed HTTP response:
HTTP/1.1 504 GATEWAY_TIMEOUT
Content-Length: 0
Connection: keep-alive
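To rule out the browser, I can replay the failing channels request with the websocket-client package. A minimal sketch; the kernel ID, session ID, and the token-based Authorization header are placeholders to substitute with real values (JupyterHub also accepts the browser's session cookies instead of a token):

```python
# Replay the failing kernel channels request outside the browser, to
# see whether the 504 comes back before the WebSocket upgrade
# completes. <<MY_JHUB_DOMAIN>>, <<KERNEL_ID>>, <<SESSION_ID>>, and
# <<API_TOKEN>> are placeholders -- substitute real values.
import websocket

url = ("wss://<<MY_JHUB_DOMAIN>>/user/me/api/kernels/"
       "<<KERNEL_ID>>/channels?session_id=<<SESSION_ID>>")

try:
    ws = websocket.create_connection(
        url,
        timeout=30,
        header={"Authorization": "token <<API_TOKEN>>"},  # hypothetical token
    )
    print("upgrade succeeded:", ws.connected)
    ws.close()
except websocket.WebSocketBadStatusException as exc:
    # A 504 here confirms the timeout happens at the proxy/load
    # balancer, not in the browser.
    print("handshake failed:", exc)
```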