Here's a scenario: I have a 'data-feed' - a REST/JSON service that updates periodically (say, every 10 seconds or so). Whenever the data set changes, all subscribed listeners need to be updated.
It's currently implemented using long-polling over HTTP. That part is a technicality; the main concept is that clients don't bother the server, and the server doesn't bother the clients, unless there's something to bother about. When there is something new, all clients get notified immediately. The stack is Java on Tomcat 7 with async IO (asyncResponse).
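To make the idea concrete, here is a minimal plain-Java sketch of the notify-on-change pattern. It is a stand-in, not the actual Tomcat code: the class name `LongPollHub` and the methods `poll` and `publish` are made up for illustration, and a `CompletableFuture` plays the role of the suspended HTTP response.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for the long-polling endpoint: each pending
// future represents one parked client request.
public class LongPollHub {
    private final Object lock = new Object();
    private String current = "";
    private List<CompletableFuture<String>> waiting = new ArrayList<>();

    /** A client polls: the returned future stays pending until the data changes. */
    public CompletableFuture<String> poll() {
        CompletableFuture<String> pending = new CompletableFuture<>();
        synchronized (lock) {
            waiting.add(pending);
        }
        return pending;
    }

    /**
     * Called by the feed on every refresh. Completes all pending polls only
     * if the JSON actually changed; returns how many clients were notified.
     */
    public int publish(String json) {
        List<CompletableFuture<String>> toNotify;
        synchronized (lock) {
            if (json.equals(current)) {
                return 0; // nothing new, nobody is bothered
            }
            current = json;
            toNotify = waiting;
            waiting = new ArrayList<>();
        }
        // Complete outside the lock so slow callbacks do not block new polls.
        for (CompletableFuture<String> pending : toNotify) {
            pending.complete(json);
        }
        return toNotify.size();
    }

    public static void main(String[] args) {
        LongPollHub hub = new LongPollHub();
        CompletableFuture<String> a = hub.poll();
        CompletableFuture<String> b = hub.poll();
        System.out.println(hub.publish("{\"v\":1}")); // 2 clients notified
        System.out.println(a.join() + " " + b.join());
        System.out.println(hub.publish("{\"v\":1}")); // 0, data unchanged
    }
}
```

In the real service, each client would re-issue its poll as soon as it receives a response, and the server would also need a timeout for parked requests; both are left out of the sketch.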
I think it works great: I can drive 10K concurrent sessions for ~$0.07 per hour (AWS m3.medium instance).
(Question - I think it works great, but I would like to hear some benchmark numbers to verify. In other words, do you think it's good bang for the buck? Please share!)
If all my clients receive the same data set (the same JSON), is there a way I could optimize even more?
I'm thinking about IPv6 'multicast' - this would cut my bandwidth consumption by orders of magnitude - but is it practical?
For supporting 1 million concurrent users, for example, assuming there's an update every 10 seconds, I would need to support 100K 'hits' (or responses) per second. If the response size is 10 KB, the bandwidth becomes a big issue: 10 KB * 100K is about 1 GB per second, and 10 KB * 100K * 60 * 60 * 24 --> roughly 86 TB per 24h.
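Spelling the arithmetic out as a quick sanity check: the sketch below reads "10K" as 10,000 bytes (an assumption; with 10,240 bytes the totals are about 2% higher) and ignores HTTP headers and compression. The class and method names are just for illustration.

```java
// Back-of-the-envelope bandwidth estimate for the fan-out described above.
public class BandwidthEstimate {

    /** Responses the server must send per second if every client gets every update. */
    public static long responsesPerSecond(long clients, long updateIntervalSeconds) {
        return clients / updateIntervalSeconds;
    }

    /** Total payload bytes sent in 24 hours. */
    public static long bytesPerDay(long clients, long updateIntervalSeconds, long responseBytes) {
        return responsesPerSecond(clients, updateIntervalSeconds) * responseBytes * 60 * 60 * 24;
    }

    public static void main(String[] args) {
        long clients = 1_000_000L;      // 1 million concurrent users
        long intervalSeconds = 10L;     // one update every 10 seconds
        long responseBytes = 10_000L;   // "10K" read as 10,000 bytes (assumption)

        long perSecond = responsesPerSecond(clients, intervalSeconds);
        System.out.println(perSecond);                  // 100000 responses/s
        System.out.println(perSecond * responseBytes);  // 1000000000 bytes/s, i.e. 1 GB/s
        System.out.println(bytesPerDay(clients, intervalSeconds, responseBytes));
        // 86400000000000 bytes/day, i.e. 86.4 TB
    }
}
```

So the sustained rate is about 1 GB/s (8 Gbit/s), which adds up to about 86.4 TB per day for the payload alone.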
There isn't really a single, focused question here (besides IPv6) - I would like to hear your thoughts, experience, and alternative approaches - I hate re-inventing the wheel, and I'm sure that the collective wisdom out there far surpasses my own.
Thanks.