Keeping TCP connections alive to track which clients are online

Question

I am developing an application where a server needs to stay in touch with lots of simple IoT devices. Almost no information needs to be exchanged between the server and each device, but the devices need to stay online and reachable by the server 24 hours a day. At some point (which happens very rarely) the server needs to get in touch with one of the devices and exchange some messages; it is crucial, however, that the device can be reached within a very short time.

This means that I need those client devices to be somehow continuously connected. Now, I wonder: is it feasible to just connect those devices via TCP and keep those connections alive to be always ready to exchange messages?

I have read around and I always find the same answer: it depends on your implementation, since message exchange and processing are far more likely to be the bottleneck than keeping the TCP connections alive. This is not really my case, since I only need to exchange a very small amount of information, very infrequently.

So is it reasonable to just keep those clients connected? Or should I devise a more efficient method? For example, how much bandwidth is required just to keep a TCP connection alive without any data exchange? And does this require a significant amount of memory or CPU?

I implemented a simple C++ program that sends UDP keep-alives to my server every few seconds: according to my benchmarks, this can scale up to several million online devices without any problem, even on a reasonably limited server. Will TCP perform worse than that?

RFC5482 contains useful background info on how TCP handles timeouts. Your main problem would be that you don't know which practical implementations of TCP sit between you and your devices. A TCP port is a 16-bit number, so at most 65535 connections can exist between one client IP address and one server IP address and port, which can become a design issue if many devices share a NAT address. TCP performance is below UDP, but you need to benchmark it too for your application. Modern hardware is pretty powerful; I believe you'd likely run into other scalability issues before reaching a performance cap. https://tools.ietf.org/html/rfc5482 - ErikE, Aug 14 '15 at 17:42

score 3 · Answer 1 · answered Aug 14 '15 at 17:14

As far as I understand TCP, the phrase "Keeping TCP connections alive" is misleading, as the TCP protocol itself has no mechanism that times out ESTABLISHED connections. I mean: once established, they can last forever, until a RESET, a FIN, or a timeout while waiting for an ACK (in this last case, following some transmission that has to be ACKnowledged) happens.

In my experience, 100% of "suddenly broken due to idle timeout" issues are caused by some intermediary router/firewall along the routing path between the two communicating hosts. I mean: as the firewall is typically a "stateful" firewall, it keeps track of the connections it is firewalling/managing. As such, every connection it needs to track consumes some of the firewall's own system resources. Also, by its very nature (it's a stateful firewall!), the firewall knows perfectly well which of the managed connections are "working" and which, vice versa, are "idle". As such, lots (all?) of the firewall implementations define a timeout: if a managed connection stays idle for that long, the firewall sends a reset to both ends of the TCP connection and frees its own resources.
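One common way to stop such a firewall or NAT entry from going idle is to let the operating system send its optional TCP keepalive probes, which are off by default. Below is a minimal Python sketch of that idea. The helper name `enable_keepalive` and the 60/20/3 values are assumptions chosen for illustration; the probe interval would have to be shorter than whatever idle timeout the devices in the path actually use, which you generally cannot know in advance.

```python
import socket

def enable_keepalive(sock, idle_s=60, interval_s=20, probes=3):
    """Turn on TCP keepalive probes for an already-created socket.

    Hypothetical helper; the default values are illustrative assumptions.
    """
    # SO_KEEPALIVE is the portable switch: the OS probes an idle connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning options are platform-specific (Linux exposes all three),
    # so each one is applied only if this Python build provides it.
    for name, value in (("TCP_KEEPIDLE", idle_s),     # idle time before first probe
                        ("TCP_KEEPINTVL", interval_s),  # gap between probes
                        ("TCP_KEEPCNT", probes)):       # failed probes before giving up
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock
```

A device would call `enable_keepalive(sock)` on its socket before or right after connecting to the server. On platforms without the tuning options, only the system-wide defaults apply, and those are often far longer than a typical home-router timeout.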

Based on your question, I bet that the TCP connection will be opened by your IoT device (acting as the client) towards your controlling server (the TCP server). Hence... LOTS, if not ALL, of the ADSL home routers that NAT your IoT devices' traffic will surely act as described.

This, at least, is based on my own experience.

But as I'm not Jon Postel, please don't blame me if I'm wrong :-)

As a side note: you wrote "...LOTS of simple IoT devices...". Please keep in mind that a TCP "port" is a 16-bit value, so there is a hard limit of about 64K concurrent connections between any single client IP address and a single server IP address and port. Devices connecting from different IP addresses do not share that limit, so on one big server the practical cap comes from memory and file-descriptor limits rather than from the port range; the port limit mostly matters when many devices sit behind the same NAT address. How these problems can be solved is out of scope in the context of this question.

Finally, let me add that I really see no problem in implementing some sort of heartbeat protocol between your IoT devices and the managing server/application. It can be implemented to be very "network-friendly", with negligible impact in terms of bandwidth and with lots of advantages in terms of manageability/control.
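To make the heartbeat idea a bit more concrete, here is a minimal Python sketch of the server-side bookkeeping, plus a rough bandwidth estimate. Everything in it is an assumption for illustration: the names `PresenceTracker` and `heartbeat_bandwidth_bps`, the 90-second timeout, and the figure of 120 bytes per heartbeat exchange (a small packet and its ACK, each around 60 bytes including IP/TCP headers). Measure your own traffic before relying on any of these numbers.

```python
import time

class PresenceTracker:
    """Remembers the last heartbeat per device and reports who is online."""

    def __init__(self, timeout_s=90.0):
        # A device counts as offline once no heartbeat arrived for timeout_s.
        self.timeout_s = timeout_s
        self.last_seen = {}

    def beat(self, device_id, now=None):
        """Record a heartbeat received from device_id."""
        self.last_seen[device_id] = time.monotonic() if now is None else now

    def is_online(self, device_id, now=None):
        """True if device_id sent a heartbeat within the last timeout_s seconds."""
        now = time.monotonic() if now is None else now
        seen = self.last_seen.get(device_id)
        return seen is not None and now - seen <= self.timeout_s


def heartbeat_bandwidth_bps(devices, interval_s, bytes_per_exchange=120):
    """Rough aggregate bandwidth (bits per second) used by heartbeats alone.

    bytes_per_exchange is an assumed on-the-wire size, not a measured one.
    """
    return devices * bytes_per_exchange * 8 / interval_s
```

Under those assumptions, one million devices sending a heartbeat every 60 seconds come to `heartbeat_bandwidth_bps(1_000_000, 60)`, i.e. 16,000,000 bit/s (about 16 Mbit/s) in aggregate, while each individual device uses only 16 bit/s.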

score 2 · Answer 2 · answered Aug 14 '15 at 17:09

Your idea is fine; in fact, modern mobile devices use exactly the same approach for their notifications: they maintain a permanent connection to the OS developer's server, and that server pushes notifications down that connection (third-party app developers send notifications to the OS developer, which in turn relays them to the appropriate mobile device).

An alternative method can be used if your devices are guaranteed to have a publicly routable IP and are able to listen on a socket. In that case, the devices notify your server each time their IP changes, and whenever your server needs to deliver some data to a device, it connects to that device's socket and sends the data. That way your server won't need to handle any load other than updating each device's IP address in its database and occasionally connecting to a device and sending it data.

About TCP vs UDP, I believe TCP is better for guaranteeing reachability of a device: with TCP, as long as the connection is open you have some guarantee that the device is still there (otherwise the connection would've timed out). With UDP, you're just throwing packets in the air without even knowing if they made it to the destination (unless you implement your own keep-alive, connection management and retransmission system, but then why reinvent the wheel when you have an already solid and popular implementation called TCP?). You also have to think about firewalls and NAT: with TCP, once the connection is established you can be reasonably sure that whatever you send makes it to the destination (or that you will find out it did not), while with UDP you can't be so sure and have to punch holes with varying degrees of success.
