
I have a large number of machines (thousands or more), each of which performs an HTTP request to a Jetty server every X seconds to report that it is alive. For which values of X should I use persistent HTTP connections (which limits the number of monitored machines to the number of concurrent connections), and for which values of X should the client re-establish a TCP connection each time (which in theory would allow monitoring more machines with the same Jetty server)?

How would the answer change for HTTPS connections? (Assuming CPU is not a constraint)

This question deliberately ignores scaling out with multiple Jetty web servers.

Update: Essentially the question reduces to: what is the smallest recommended value of lowResourcesMaxIdleTime?
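
To make the two options concrete, here is a minimal client sketch using plain HttpURLConnection; the /heartbeat URL and the 30-second interval are placeholders, and flipping reuseConnection switches between a persistent connection and a fresh TCP connection per heartbeat:

    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class HeartbeatClient {

        // Hypothetical endpoint and interval ("X" seconds) for illustration only.
        private static final String HEARTBEAT_URL = "http://jetty-server:8080/heartbeat";
        private static final long INTERVAL_MILLIS = 30000;

        public static void main(String[] args) throws Exception {
            boolean reuseConnection = true; // false = re-establish TCP per heartbeat
            while (true) {
                HttpURLConnection conn =
                        (HttpURLConnection) new URL(HEARTBEAT_URL).openConnection();
                if (!reuseConnection) {
                    // Ask both sides to drop the TCP connection after this request,
                    // freeing a connection slot for another monitored machine.
                    conn.setRequestProperty("Connection", "close");
                }
                conn.setRequestMethod("GET");
                int status = conn.getResponseCode();
                try (InputStream in = conn.getInputStream()) {
                    // Drain the body so the JDK can return the socket to its keep-alive pool.
                    while (in.read() != -1) {
                    }
                }
                System.out.println("heartbeat status " + status);
                Thread.sleep(INTERVAL_MILLIS);
            }
        }
    }

With keep-alive, each monitored machine holds one server-side connection between heartbeats; with Connection: close, the server-side slot is free between heartbeats at the cost of a TCP handshake (and, for HTTPS, a TLS handshake) per beat.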

itaifrenkel

2 Answers


I would say that this is less of a jetty scaling issue and more of a network scaling issue, in which case 'it depends' on your network infrastructure: only you really know how your network is laid out and what sort of latencies are involved, so only you can come up with a sensible value of X.

From an overhead perspective, persistent HTTP connections will of course have some minor effect (well, I say minor, but it depends on your network), and HTTPS will again have a larger impact... but only from a volume-of-traffic perspective, since you are assuming CPU is not a constraint.

So from a jetty perspective, it really doesn't need to be involved in the question: you are ultimately asking for help minimizing the bytes of traffic on the wire, so what you are really looking for is the best protocol at this point. Since HTTP makes you carry headers on each request, you may be well served by looking at something like spdy or websocket, which give you persistent connections but are optimized for low round-trip network overhead. But... they seem sort of overkill for a heartbeat. :)
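
As a rough illustration of the websocket suggestion, here is a ping/pong heartbeat sketched with the standard javax.websocket client API (which Jetty 9+ ships an implementation of); the ws://.../heartbeat endpoint and the 30-second interval are assumptions:

    import java.net.URI;
    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;

    import javax.websocket.ClientEndpoint;
    import javax.websocket.ContainerProvider;
    import javax.websocket.Session;
    import javax.websocket.WebSocketContainer;

    @ClientEndpoint
    public class PingHeartbeat {

        public static void main(String[] args) throws Exception {
            WebSocketContainer container = ContainerProvider.getWebSocketContainer();
            // Hypothetical heartbeat endpoint; the HTTP handshake happens only once here.
            Session session = container.connectToServer(PingHeartbeat.class,
                    URI.create("ws://jetty-server:8080/heartbeat"));
            while (true) {
                // Each beat is a small ping frame: no HTTP headers per heartbeat.
                session.getBasicRemote()
                       .sendPing(ByteBuffer.wrap("alive".getBytes(StandardCharsets.UTF_8)));
                Thread.sleep(30000); // "X" seconds
            }
        }
    }

The connection trade-off is the same as with persistent HTTP: every monitored machine keeps one TCP connection open on the server between heartbeats.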

jesse mcconnell
  • What about the overhead of opening a TCP connection vs. the limit on the number of concurrent TCP connections? Doesn't websocket assume an open TCP connection? Perhaps I fail to understand where the limitation on concurrent TCP connections comes from. Can Jetty handle more TCP connections just because there is less traffic (Websocket vs HTTP)? – itaifrenkel Dec 21 '12 at 17:36
  • what I am ultimately saying is that, given the concerns stated, jetty itself is unlikely to be the bottleneck you need to worry about or tune. I mention websocket because you mentioned persistent http connections, and websocket is likely to have less bytes-transferred overhead than persistent http connections when you take into account the headers on each request, and it would make a fine heartbeat protocol in the case I think you describe. – jesse mcconnell Dec 21 '12 at 17:45
  • I am not sure I want persistent connections. Jetty comes with a predefined connection timeout, after which the persistent connection drops and the client needs to re-establish it. The client does not have to wait for Jetty to close the connection; it can do it explicitly (I suppose). Should it? For what values of X should it do so? For example, if X is 1 minute, then I would have to use persistent connections, in which case Websocket is more efficient than HTTP. But if X is 1 second, then I would prefer to close the TCP connection and let other machines connect to the server. – itaifrenkel Dec 21 '12 at 17:53
  • I don't think there is a hard and fast recommended X value for what you're looking for...it is a tradeoff that you have to decide given your setup and what granularity of uptime your heartbeat is trying to determine – jesse mcconnell Dec 21 '12 at 17:59
  • Ok, in that case I would ask what default timeout jetty is preconfigured with, and whether it fits my use case. I need a rough guesstimate. – itaifrenkel Dec 21 '12 at 18:20
  • timeouts are generally around 30s of inactivity, which is past the timeframe someone will wait for a page to load with no feedback, and long enough to reasonably guess that with no activity something has probably happened network-wise – jesse mcconnell Dec 21 '12 at 18:31
  • Great. Is there any recommended value for lowResourcesMaxIdleTime? How low can it be set without impacting TCP connection overhead? (A connector configuration sketch follows this thread.) – itaifrenkel Dec 21 '12 at 21:23
  • there is no 'recommended' value for that beyond the default; it depends on your specific needs and experience. In other words, it is one knob useful in tuning an existing setup that you can tweak... but unless you are having _actual_ issues, this sort of connector tuning is premature optimization. Choose your protocol based on actual needs (a heartbeat is super simple ping/pong packets in websocket). – jesse mcconnell Dec 26 '12 at 11:06
  • I developed software which monitored over 40k concurrent clients on one machine, so I can't see a reason to switch from http to any other protocol. Ultimately it depends on machine strength, but it is definitely possible. We weren't using Jetty, though, but I am pretty sure it's one of the fastest servlet containers out there, so you should be ok... – TheZuck Dec 28 '12 at 19:18
  • I'll just toss out this blog post by Simone Bordet, a fellow jetty and cometd developer, with some performance graphs of http and websocket with cometd (spdy, which is the basis for http/2.0, fares better still). http://webtide.intalio.com/2011/09/cometd-2-4-0-websocket-benchmarks/ – jesse mcconnell Dec 28 '12 at 20:38
  • also we have had 200 connections pushing 20k messages/sec and could easily have pushed that higher... and spdy gets a showing in this thread: http://webtide.intalio.com/2012/10/spdy-push-demo-from-javaone-2012/ – jesse mcconnell Dec 28 '12 at 20:56
  • #TheZuck - Did all 40k clients hold a persistent TCP connection with the server? If not, what was the server's minimum auto-disconnect timeout? – itaifrenkel Jan 03 '13 at 11:30
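
For the lowResourcesMaxIdleTime question above, here is a configuration sketch assuming the Jetty 7/8 SelectChannelConnector API (the setter names mirror the jetty.xml properties of that era; names and placement differ in later Jetty versions), with placeholder values:

    import org.eclipse.jetty.server.Server;
    import org.eclipse.jetty.server.nio.SelectChannelConnector;

    public class HeartbeatServer {

        public static void main(String[] args) throws Exception {
            Server server = new Server();

            SelectChannelConnector connector = new SelectChannelConnector();
            connector.setPort(8080);
            // Normal idle timeout: how long an idle persistent connection is kept open.
            connector.setMaxIdleTime(30000);
            // Above this connection count the connector considers itself low on resources
            // and applies the shorter idle timeout below, shedding idle heartbeat clients.
            connector.setLowResourcesConnections(20000);
            connector.setLowResourcesMaxIdleTime(5000);

            server.addConnector(connector);
            // A handler that records the heartbeat would be added here.
            server.start();
            server.join();
        }
    }

Roughly speaking, clients whose X is below the effective idle timeout never go idle long enough to be dropped and so stay persistent, while clients with a larger X get disconnected and pay the reconnect cost on their next heartbeat.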

How about just making them send requests at different times? When the first machine sends a request, you pick a time to return in the response as that machine's next heartbeat time (and keep the id/time on the jetty server); when the second machine sends a request, you pick another time to return to it.

This way, each machine performs its heartbeat request at a different time, so there is no concurrency issue.

You can also use a random delay before the first heartbeat in case all machines start up at the same time.
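
A minimal sketch of the client side of that idea, using a random initial delay with a fixed-rate schedule (the 30-second interval and the println standing in for the real HTTP call are placeholders):

    import java.util.Random;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class JitteredHeartbeat {

        public static void main(String[] args) {
            long intervalSeconds = 30; // "X" seconds between heartbeats
            // A random offset inside the first interval spreads the machines' beats apart.
            long initialDelaySeconds = new Random().nextInt((int) intervalSeconds);

            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
            scheduler.scheduleAtFixedRate(
                    () -> System.out.println("send heartbeat"), // placeholder for the real request
                    initialDelaySeconds, intervalSeconds, TimeUnit.SECONDS);
        }
    }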

benbai123
  • Given enough machines I assume the spread would be good enough without orchestrating it. Even if they do collide, the http client can retry. Notice, however, that you are assuming each machine creates a new connection, which is what I am asking about. Should it? What is the overhead? – itaifrenkel Jan 03 '13 at 11:27