
I have been getting this strange error for a week now. Here is the stack trace:

 ERROR (redisson-netty-1-4) [DNSMonitor(operationComplete:98)] Unable to resolve redis.***********.cache.amazonaws.com java.lang.IndexOutOfBoundsException: Index: 0, Size: 0

at java.util.ArrayList.rangeCheck(ArrayList.java:653)
at java.util.ArrayList.get(ArrayList.java:429)
at io.netty.resolver.dns.DnsNameResolver.doResolveCached(DnsNameResolver.java:613)
at io.netty.resolver.dns.DnsNameResolver.doResolve(DnsNameResolver.java:593)
at io.netty.resolver.dns.DnsNameResolver.doResolve(DnsNameResolver.java:527)
at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:63)

The application is a Spring Boot API that uses the Redisson client to connect to ElastiCache. Although I see these errors, the API keeps running without any problems. I haven't been able to find any clues about this online either. Has anyone else faced something similar?

I also see this error in the logs, followed by the error above:

org.redisson.client.RedisTimeoutException: Redis server response timeout (3000 ms) occured for command: (HGET) with params: [packagesCache, PooledUnsafeDirectByteBuf(ridx: 0, widx: 3, cap: 256)] channel: [id: 0xdfd44ac3, L:/10.0.2.206:42857 - R:redis.kl3ise.0001.use1.cache.amazonaws.com/10.0.1.234:6379]
    at org.redisson.command.CommandAsyncService$11.run(CommandAsyncService.java:682)
    at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:663)
    at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:738)
    at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:466)
    at java.lang.Thread.run(Thread.java:748)

Additional logs:

Request for schedule for student with access key ab947-cf32-4965-ab06-36d4e904899d on date 2018-02-14
org.redisson.client.RedisTimeoutException: Redis server response timeout (60000 ms) occured for command: (HEXISTS) with params: [analyzedStudyPlanCache, PooledUnsafeDirectByteBuf(ridx: 0, widx: 9, cap: 256)] channel: [id: 0xe4dc90da, L:/10.0.2.206:56685 - R:redis.kl3ise.0001.use1.cache.amazonaws.com/10.0.1.234:6379]
Bhargav

1 Answer


The log already gives a hint: Unable to resolve redis.***********.cache.amazonaws.com.

First, check whether your instance can resolve the name (that is, find where the Redis server is). You can do this simply with:

$ host redis.****.cache.amazonaws.com

or a simple ping redis.****.cache.amazonaws.com. You may not get ping responses, but the name should at least resolve (you should get back some IP addresses).
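Since the application is Java, the same lookup can also be done from the JVM. A minimal sketch, assuming only the standard library; the class name and the `localhost` default are placeholders. One caveat: the stack trace shows the lookup failing inside Netty's DnsNameResolver, which, as far as I know, sends its own DNS queries instead of going through InetAddress, so a success here does not prove that Redisson's resolver sees the same answer.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Rough JVM-side equivalent of `host <name>`: asks the JVM's default resolver
// for every address of a name. The default host below is only a placeholder.
class ResolveCheck {

    // Returns all addresses for `host`; throws UnknownHostException if it cannot be resolved.
    static InetAddress[] resolve(String host) throws UnknownHostException {
        return InetAddress.getAllByName(host);
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost"; // pass your endpoint here
        try {
            for (InetAddress addr : resolve(host)) {
                System.out.println(host + " -> " + addr.getHostAddress());
            }
        } catch (UnknownHostException e) {
            System.err.println("Unable to resolve " + host + ": " + e.getMessage());
        }
    }
}
```

With Java 11+ you can run it directly from the application host, e.g. java ResolveCheck.java redis.****.cache.amazonaws.com.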

If you don't get anything back, it could be that the ElastiCache instance has just been created and its DNS record has not propagated yet, which would explain those entries in your logs. If the instance has been up and running for a while, check that your DNS resolvers are configured properly. As an extra test you could try:

$ dig @8.8.8.8 redis.***cache.amazonaws.com +short

This uses Google Public DNS. If you do get an answer from that query, the problem lies with your configured nameservers; check /etc/resolv.conf.
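To see what "query one specific DNS server" means at the protocol level, here is a minimal Java sketch of dig @8.8.8.8 <name> +short restricted to A records: it encodes a single RFC 1035 query, sends it over UDP to the chosen server, and prints the IPv4 addresses found in the answer section. It is an illustration only (no retries, no TCP fallback, no response-code handling), and the class name and the example.com default are placeholders.

```java
import java.io.ByteArrayOutputStream;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Minimal A-record-only sketch of `dig @<server> <name> +short`.
class DigCheck {

    // Encodes a standard recursive query for the A record of `name`.
    static byte[] buildQuery(int id, String name) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(id >> 8); out.write(id);          // transaction ID (low 8 bits per write)
        out.write(0x01); out.write(0x00);           // flags: RD (recursion desired)
        out.write(0); out.write(1);                 // QDCOUNT = 1
        for (int i = 0; i < 6; i++) out.write(0);   // ANCOUNT, NSCOUNT, ARCOUNT = 0
        for (String label : name.split("\\.")) {
            byte[] b = label.getBytes(StandardCharsets.US_ASCII);
            out.write(b.length);                    // label length prefix
            out.write(b, 0, b.length);
        }
        out.write(0);                               // root label ends QNAME
        out.write(0); out.write(1);                 // QTYPE = A
        out.write(0); out.write(1);                 // QCLASS = IN
        return out.toByteArray();
    }

    static int u16(byte[] msg, int pos) {
        return ((msg[pos] & 0xFF) << 8) | (msg[pos + 1] & 0xFF);
    }

    // Skips a (possibly compressed) domain name starting at `pos`; returns the next offset.
    static int skipName(byte[] msg, int pos) {
        while (true) {
            int len = msg[pos] & 0xFF;
            if (len == 0) return pos + 1;                 // root label ends the name
            if ((len & 0xC0) == 0xC0) return pos + 2;     // compression pointer ends the name
            pos += len + 1;
        }
    }

    // Collects IPv4 addresses from the answer section; other record types (e.g. CNAME) are skipped.
    static List<String> parseARecords(byte[] msg, int length) {
        List<String> result = new ArrayList<>();
        int qdCount = u16(msg, 4);
        int anCount = u16(msg, 6);
        int pos = 12;                                                    // end of fixed header
        for (int i = 0; i < qdCount; i++) pos = skipName(msg, pos) + 4;  // name + QTYPE + QCLASS
        for (int i = 0; i < anCount && pos < length; i++) {
            pos = skipName(msg, pos);
            int type = u16(msg, pos);
            int rdLength = u16(msg, pos + 8);                            // after TYPE, CLASS, TTL
            int rdata = pos + 10;
            if (type == 1 && rdLength == 4) {
                result.add((msg[rdata] & 0xFF) + "." + (msg[rdata + 1] & 0xFF) + "."
                        + (msg[rdata + 2] & 0xFF) + "." + (msg[rdata + 3] & 0xFF));
            }
            pos = rdata + rdLength;
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String name = args.length > 0 ? args[0] : "example.com";  // placeholder name
        String server = args.length > 1 ? args[1] : "8.8.8.8";
        byte[] query = buildQuery(0x1234, name);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.setSoTimeout(3000);
            socket.send(new DatagramPacket(query, query.length, InetAddress.getByName(server), 53));
            byte[] buf = new byte[512];
            DatagramPacket reply = new DatagramPacket(buf, buf.length);
            socket.receive(reply);
            if (u16(buf, 0) != 0x1234) throw new IllegalStateException("unexpected transaction ID");
            parseARecords(buf, reply.getLength()).forEach(System.out::println);
        }
    }
}
```

Running it once with 8.8.8.8 and once with the nameserver listed in /etc/resolv.conf gives the same comparison as the dig test above.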

If the name resolves, next check that the ElastiCache/Redis instance is up and running and that you can reach it from your instance. From a terminal you could do something like:

redis-cli -h redis.***.cache.amazonaws.com

If it is not using the default port 6379, use:

redis-cli -h redis.***.cache.amazonaws.com -p XXXX 

where XXXX is the configured port.
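If redis-cli is not installed on the application host, the same PING round trip can be sketched in plain Java: open a TCP connection, send PING as a RESP array and read the first reply line. This assumes the cluster has no AUTH and no in-transit encryption (TLS); with either enabled, this plain PING will not get +PONG back. The class name and defaults are placeholders.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Rough equivalent of `redis-cli -h <host> -p <port> PING` (no AUTH, no TLS assumed).
class RedisPing {

    // Sends PING and returns the first reply line without the trailing CRLF.
    static String ping(String host, int port, int timeoutMillis) throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            socket.setSoTimeout(timeoutMillis);                 // bound the read as well
            OutputStream out = socket.getOutputStream();
            out.write("*1\r\n$4\r\nPING\r\n".getBytes(StandardCharsets.US_ASCII));
            out.flush();
            InputStream in = socket.getInputStream();
            StringBuilder line = new StringBuilder();
            int c;
            while ((c = in.read()) != -1 && c != '\n') {
                if (c != '\r') line.append((char) c);
            }
            return line.toString();                             // "+PONG" when healthy
        }
    }

    public static void main(String[] args) throws IOException {
        String host = args.length > 0 ? args[0] : "localhost";  // placeholder endpoint
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 6379;
        System.out.println(ping(host, port, 3000));
    }
}
```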

If you don't have the redis-cli command, you could try telnet, for example:

telnet redis.***.cache.amazonaws.com 6379

If you are sure the instance is up and running and the name resolves, but you still can't connect, check the security groups on the AWS side; the port is probably blocked.
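The telnet test can also be scripted so that it tells the failure modes apart. A minimal sketch, assuming the usual behaviour that AWS security groups drop disallowed traffic silently: a blocked port then typically shows up as a timeout, while "connection refused" usually means the host answered but nothing is listening on that port. The class name, enum and defaults are hypothetical.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Rough equivalent of `telnet <host> 6379` that also classifies the failure.
class PortCheck {

    enum Result { OPEN, REFUSED, TIMEOUT, UNRESOLVED, OTHER }

    static Result check(String host, int port, int timeoutMillis) {
        InetSocketAddress addr = new InetSocketAddress(host, port); // resolves the name here
        if (addr.isUnresolved()) return Result.UNRESOLVED;         // DNS problem, not network
        try (Socket socket = new Socket()) {
            socket.connect(addr, timeoutMillis);
            return Result.OPEN;
        } catch (SocketTimeoutException e) {
            return Result.TIMEOUT;   // no answer at all: packets likely dropped (e.g. security group)
        } catch (ConnectException e) {
            return Result.REFUSED;   // usually: host answered, but nothing listens on the port
        } catch (IOException e) {
            return Result.OTHER;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";  // placeholder endpoint
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 6379;
        System.out.println(host + ":" + port + " -> " + check(host, port, 3000));
    }
}
```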

nbari
  • This is an old Redis cluster that has been running for a while, so the instance going down is unlikely. I couldn't quite understand what you mentioned about resolvers; isn't that done by default via the API? – Bhargav Mar 05 '18 at 09:44
  • With resolvers I mean whether the instance/server running your code can resolve the DNS name of your Redis cluster; from the logs you posted, it seems it can't. It could also be due to some security groups. – nbari Mar 05 '18 at 09:52
  • That can't be it, though: it works well for, say, 5-6 days, then suddenly these errors show up, and things are back to normal once the API is restarted. – Bhargav Mar 06 '18 at 05:16
  • I would suggest logging in to the instance where your code lives and checking that you can reach/connect to the Redis server. Once you have validated that it works, monitor your logs and check whether you still get the error. What is certain is that, for some reason, your code wasn't able to connect to Redis; it could also be due to transient network issues. If you still get the error, check the server name: maybe you are trying to connect to a nonexistent server, so check for typos, etc. – nbari Mar 06 '18 at 08:18
  • Thanks so much for your time, but unfortunately none of it worked. A typo is ruled out since the host is configured in a properties file, and, as I said earlier, it works well for some days. For instance, there has been no occurrence of this error since I posted this question. – Bhargav Mar 06 '18 at 13:35
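Because the failures only appear after several days, one way to follow the "monitor your logs" suggestion is a small probe that resolves the name once a minute and timestamps every failure, so it can be lined up with the Redisson errors. A minimal sketch; the class name, the host default and the one-minute interval are arbitrary placeholders. Note that the JVM caches InetAddress results (lifetimes are governed by the networkaddress.cache.ttl and networkaddress.cache.negative.ttl security properties) and Netty's resolver keeps a cache of its own, so this probe exercises the JVM/OS lookup path rather than exactly what Redisson does.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.time.Instant;
import java.util.Arrays;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

// Periodic DNS probe: one timestamped log line per lookup, success or failure.
class DnsProbe {

    static String probeOnce(String host) {
        try {
            String ips = Arrays.stream(InetAddress.getAllByName(host))
                    .map(InetAddress::getHostAddress)
                    .collect(Collectors.joining(","));
            return Instant.now() + " OK " + host + " -> " + ips;
        } catch (UnknownHostException e) {
            return Instant.now() + " FAIL " + host + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost"; // placeholder endpoint
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Runs until the process is stopped; the interval is an arbitrary choice.
        scheduler.scheduleAtFixedRate(() -> System.out.println(probeOnce(host)), 0, 60, TimeUnit.SECONDS);
    }
}
```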