We previously stored user sessions in a database table (Postgres on RDS).
We decided to migrate user sessions from the database to Redis and changed our application accordingly.
For Redis, we chose the ElastiCache service with 1 shard and 2 nodes (primary + replica), with Multi-AZ enabled.
In the live environment, things ran smoothly until the number of sessions crossed 0.5 million (around 3 PM).
At that point, the Redis node suddenly stopped responding, which brought down our production environment completely (too many threads waiting for a connection).
We had to reboot our instance to restore service.
This happened again later in the evening, around 9 PM.
The exception generated on the Java (Spring) side:
2016/11/22 09:19:31.749 [http-nio-8080-exec-780] [ERROR] org.apache.catalina.core.ContainerBase.[Tomcat].[localhost].[/].[dispatcherServlet] - Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception
org.springframework.data.redis.RedisConnectionFailureException: Cannot get Jedis connection; nested exception is redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
at org.springframework.data.redis.connection.jedis.JedisConnectionFactory.fetchJedisConnector(JedisConnectionFactory.java:140) ~[spring-data-redis-1.4.2.RELEASE.jar!/:1.4.2.RELEASE]
at org.springframework.data.redis.connection.jedis.JedisConnectionFactory.getConnection(JedisConnectionFactory.java:229) ~[spring-data-redis-1.4.2.RELEASE.jar!/:1.4.2.RELEASE]
....
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_72]
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) [tomcat-embed-core-8.0.20.jar!/:8.0.20]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_72]
Caused by: redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
at redis.clients.util.Pool.getResource(Pool.java:42) ~[jedis-2.5.2.jar!/:na]
at redis.clients.jedis.JedisPool.getResource(JedisPool.java:84) ~[jedis-2.5.2.jar!/:na]
at redis.clients.jedis.JedisPool.getResource(JedisPool.java:10) ~[jedis-2.5.2.jar!/:na]
at org.springframework.data.redis.connection.jedis.JedisConnectionFactory.fetchJedisConnector(JedisConnectionFactory.java:133) ~[spring-data-redis-1.4.2.RELEASE.jar!/:1.4.2.RELEASE]
... 55 common frames omitted
Caused by: redis.clients.jedis.exceptions.JedisConnectionException: java.net.SocketTimeoutException: connect timed out
at redis.clients.jedis.Connection.connect(Connection.java:150) ~[jedis-2.5.2.jar!/:na]
at redis.clients.jedis.BinaryClient.connect(BinaryClient.java:71) ~[jedis-2.5.2.jar!/:na]
at redis.clients.jedis.BinaryJedis.connect(BinaryJedis.java:1783) ~[jedis-2.5.2.jar!/:na]
at redis.clients.jedis.JedisFactory.makeObject(JedisFactory.java:65) ~[jedis-2.5.2.jar!/:na]
at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:836) ~[commons-pool2-2.2.jar!/:2.2]
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:434) ~[commons-pool2-2.2.jar!/:2.2]
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:361) ~[commons-pool2-2.2.jar!/:2.2]
at redis.clients.util.Pool.getResource(Pool.java:40) ~[jedis-2.5.2.jar!/:na]
... 58 common frames omitted
Caused by: java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method) ~[na:1.7.0_72]
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) ~[na:1.7.0_72]
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200) ~[na:1.7.0_72]
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) ~[na:1.7.0_72]
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[na:1.7.0_72]
at java.net.Socket.connect(Socket.java:579) ~[na:1.7.0_72]
at redis.clients.jedis.Connection.connect(Connection.java:144) ~[jedis-2.5.2.jar!/:na]
... 65 common frames omitted
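To illustrate the thread pile-up mentioned above (request threads waiting for a pooled connection), here is a minimal sketch using only the JDK. It is not our application code and it does not use Jedis or commons-pool2; `TinyPool`, `borrow` and `giveBack` are hypothetical names made up for the illustration. As far as we understand, the corresponding knob in commons-pool2 is `maxWaitMillis` on `GenericObjectPoolConfig`, whose default of -1 means "block indefinitely when the pool is exhausted". The sketch assumes a pool of size 1 whose only slot is held, as if that connection were stuck on an unresponsive node.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a connection pool; only models the waiting behaviour. */
class TinyPool {
    private final Semaphore slots;

    TinyPool(int maxTotal) {
        this.slots = new Semaphore(maxTotal);
    }

    /**
     * Tries to take a connection slot.
     * A negative maxWaitMillis blocks until a slot is free (unbounded wait);
     * otherwise the caller gives up after maxWaitMillis and gets false.
     */
    boolean borrow(long maxWaitMillis) {
        if (maxWaitMillis < 0) {
            slots.acquireUninterruptibly();
            return true;
        }
        try {
            return slots.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    /** Returns a previously borrowed slot to the pool. */
    void giveBack() {
        slots.release();
    }

    public static void main(String[] args) {
        TinyPool pool = new TinyPool(1);
        pool.borrow(-1); // the only slot is now held, as if stuck on a dead node

        // With a bounded wait, the next caller fails fast instead of hanging.
        boolean got = pool.borrow(100);
        System.out.println("second borrow succeeded: " + got);

        pool.giveBack();
        System.out.println("borrow after release succeeded: " + pool.borrow(100));
    }
}
```

With an unbounded wait (`borrow(-1)` in the sketch), every further caller would block until a slot is returned, which is one way a fixed-size HTTP worker pool could end up fully occupied. Whether that is what actually happened in our case is an assumption, not something the trace proves.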
We still don't know the root cause of this.
Can someone point us in the right direction and help us identify the root cause and a solution?