We are currently running Flask SocketIO in GCP Cloud Run, with message_queue utilizing GCP Redis (Memorystore) through a VPC network.
The GCP Redis instance is of Basic type, with a capacity of 12 GB.
Our VPC network is configured with a minimum of 2 instances and a maximum of 3, with 3 currently active. Other servers also use the same VPC network.
We are using Flask SocketIO version 5.3.2 and the Python package redis version 4.4.0.
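For context, here is a minimal sketch of how a setup like this is typically wired together. It is not our exact code: the environment variable names REDIS_HOST and REDIS_PORT, the fallback address, and the helper names redis_url_from_env and create_app are hypothetical placeholders. The message_queue, logger, and engineio_logger arguments are the Flask SocketIO options relevant to the logs discussed in this post.

```python
import os


def redis_url_from_env(env=os.environ):
    """Build the Redis URL passed to message_queue.

    REDIS_HOST / REDIS_PORT are hypothetical variable names; the
    fallback address is a placeholder, not a real Memorystore IP.
    """
    host = env.get("REDIS_HOST", "10.0.0.3")
    port = int(env.get("REDIS_PORT", "6379"))
    return f"redis://{host}:{port}/0"


def create_app():
    """Create the Flask app and a SocketIO server backed by Redis."""
    # Imported lazily so the URL helper above can be used on its own.
    from flask import Flask
    from flask_socketio import SocketIO

    app = Flask(__name__)
    socketio = SocketIO(
        app,
        message_queue=redis_url_from_env(),  # Redis pub/sub between instances
        logger=True,           # enables the Python-SocketIO log lines
        engineio_logger=True,  # enables the Python-EngineIO log lines
    )
    return app, socketio
```

With more than one Cloud Run instance, every emit is published to Redis and each instance relays it to its own connected clients, so both the publish side and the listening side need a healthy Redis connection.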
Initially, everything worked properly, but the server has stopped working after running for a period of time. The first failure happened after approximately 10 hours, and the second after about 3 days. (Note that the two failures occurred on different revisions.)
We have observed the following:
- The first time, we added some logging and re-deployed the Flask SocketIO server, and it started working properly again.
- The second time:
  - We switched to the previous revision (1) of the Flask SocketIO server, and it started working properly again.
  - We switched back to the newest revision (2), and it stopped working.
  - We switched back to the previous revision (1) again, and it started working properly again.
  - However, after a while, when we switched back to the newest revision (2), it started working again.
We checked the GCP Logs Explorer around the last time the browser was able to receive websocket data, and found this error log: ERROR:socketio:Cannot publish to Redis... retrying
We did not see the corresponding "Cannot publish to Redis... giving up" message.
When the server stopped working, the logs from Flask SocketIO (and from Python-SocketIO and Python-EngineIO, which it uses) only showed the following message, which comes from Python-SocketIO: emitting event "message" to room_1 [/some-namespace]
The Python-SocketIO logs contained no "pubsub message: emit" entries, which are generated when a message is received from Redis in the _thread function of the PubSubManager class.
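The log pattern described above (emits are logged, but nothing is received back from Redis) can be checked mechanically against exported log lines. The following is only a sketch: it assumes the logs are available as plain text lines, it matches on the message fragments quoted in this post, and the names MARKERS, summarize, and listener_looks_stalled are made up for illustration.

```python
# Substrings taken from the log messages quoted in this post.
MARKERS = {
    "publish_retry": "Cannot publish to Redis... retrying",
    "publish_give_up": "Cannot publish to Redis... giving up",
    "emit": "emitting event",
    "pubsub_receive": "pubsub message: emit",
}


def summarize(lines):
    """Count how many log lines contain each marker substring."""
    counts = {name: 0 for name in MARKERS}
    for line in lines:
        for name, needle in MARKERS.items():
            if needle in line:
                counts[name] += 1
    return counts


def listener_looks_stalled(counts):
    """True if emits were logged but no pub/sub message was received.

    This mirrors the symptom we saw; it is a heuristic, not proof
    that the Redis listener thread has actually stopped.
    """
    return counts["emit"] > 0 and counts["pubsub_receive"] == 0


if __name__ == "__main__":
    sample = [
        "ERROR:socketio:Cannot publish to Redis... retrying",
        'emitting event "message" to room_1 [/some-namespace]',
    ]
    counts = summarize(sample)
    print(counts, listener_looks_stalled(counts))
```

Running this over a time window around the failure would show whether the receive side went silent before or after the publish retry was logged.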
(If more information is needed, please let me know)