We have a five node Cassandra cluster with replication factor 3. We are experiencing a lot of Read Timeouts in our application. When we checked tpstats on each Cassandra node, we see that three of the nodes have a lot of Read request drops and a high CPU utilisation, whereas on the other two nodes Read request drops are zero and CPU utilisation is moderate. Note that the total number of Read requests on all servers are almost same.
After taking thread dump we found out that the reason for high CPU utilisation is that Parallel GC is running a lot on the three nodes compared to the other two nodes, which is causing CPU utilisation to go high. What we are not able to understand is why GC should be running more on three nodes and less on two nodes, when the distribution of our partition key and our queries is almost uniform.
Cassandra version is 2.2.3.