We have set up Usergrid (2.1.0) with Elasticsearch 1.7.5 and Cassandra 3.7 on a very large system: 12 machines for Usergrid, 9 for Cassandra, and 9 for Elasticsearch. All machines are virtual, each with 16 cores and 32 GB of RAM. However, even at 3000 concurrent users, the ES and C* servers hit 100% CPU usage. When the ES CPU peaks, we cannot get the /roles collection, so users cannot log in. When the C* CPU peaks, Usergrid cannot connect to C*, and all HTTP requests simply go unanswered.
There is no I/O wait on disk or network.
Our application depends on Usergrid queries, so we send a heavy query load. Still, I did not expect such CPU peaks on the subsystems.
Any support is appreciated.