
I have a question regarding Spring Cloud Stream and RabbitMQ. Every so often we lose connectivity to Rabbit for a few seconds and then re-establish the connection, and I can't figure out why.

Here is the Stack Trace:

[container-1] WARN org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer - Consumer raised exception, processing can restart if the connection factory supports it
com.rabbitmq.client.ShutdownSignalException: connection error
at com.rabbitmq.client.impl.AMQConnection.startShutdown(AMQConnection.java:743)
at com.rabbitmq.client.impl.AMQConnection.shutdown(AMQConnection.java:733)
at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:573)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException
at java.io.DataInputStream.readUnsignedByte(DataInputStream.java:290)
at com.rabbitmq.client.impl.Frame.readFrom(Frame.java:95)
at com.rabbitmq.client.impl.SocketFrameHandler.readFrame(SocketFrameHandler.java:139)
at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:542)
... 1 more

Here are the application properties we are using:

spring.rabbitmq.host=rabbit_host
spring.rabbitmq.port=rabbit_port
spring.rabbitmq.username=rabbit_user
spring.rabbitmq.password=rabbit_password
spring.rabbitmq.ssl.enabled=true
rabbit.queues=someQueue
spring.cloud.stream.bindings.output.content-type=application/json
rabbit.enable-retry=true
spring.cloud.stream.defaultBinder=rabbit
spring.cloud.stream.bindings.input.binder=kafka
spring.cloud.stream.bindings.output.binder=rabbit
spring.rabbitmq.requested-heartbeat=60

And we are using the AggregateApplicationBuilder to orchestrate the stream:

public static void main(String[] args) {
    new AggregateApplicationBuilder(SpringApplication.run(OurApplication.class))
            .from(RabbitSourceConfiguration.class)
            .via(OurTransformer.class)
            .to(OurSink.class)
            .run(args);
}

Normally we would be OK with some intermittent connectivity issues as long as they resolve themselves, but here is the major problem for us: in our production environment we eventually lose the connection permanently. Here is the stack trace from when that happens:

Caused by: com.rabbitmq.client.PossibleAuthenticationFailureException: Possibly caused by authentication failure
at com.rabbitmq.client.impl.AMQConnection.start(AMQConnection.java:342) ~[amqp-client-3.6.3.jar!/:na]
at com.rabbitmq.client.ConnectionFactory.newConnection(ConnectionFactory.java:813) ~[amqp-client-3.6.3.jar!/:na]
at com.rabbitmq.client.ConnectionFactory.newConnection(ConnectionFactory.java:725) ~[amqp-client-3.6.3.jar!/:na]
at org.springframework.amqp.rabbit.connection.AbstractConnectionFactory.createBareConnection(AbstractConnectionFactory.java:296) ~[spring-rabbit-1.6.1.RELEASE.jar!/:na]
... 12 common frames omitted
Caused by: com.rabbitmq.client.ShutdownSignalException: connection error
at com.rabbitmq.utility.ValueOrException.getValue(ValueOrException.java:67) ~[amqp-client-3.6.3.jar!/:na]
at com.rabbitmq.utility.BlockingValueOrException.uninterruptibleGetValue(BlockingValueOrException.java:37) ~[amqp-client-3.6.3.jar!/:na]
at com.rabbitmq.client.impl.AMQChannel$BlockingRpcContinuation.getReply(AMQChannel.java:367) ~[amqp-client-3.6.3.jar!/:na]
at com.rabbitmq.client.impl.AMQChannel.privateRpc(AMQChannel.java:234) ~[amqp-client-3.6.3.jar!/:na]
at com.rabbitmq.client.impl.AMQChannel.rpc(AMQChannel.java:212) ~[amqp-client-3.6.3.jar!/:na]
at com.rabbitmq.client.impl.AMQConnection.start(AMQConnection.java:327) ~[amqp-client-3.6.3.jar!/:na]
... 15 common frames omitted
Caused by: java.io.EOFException: null
at java.io.DataInputStream.readUnsignedByte(DataInputStream.java:290) ~[na:1.8.0_66]
at com.rabbitmq.client.impl.Frame.readFrom(Frame.java:95) ~[amqp-client-3.6.3.jar!/:na]
at com.rabbitmq.client.impl.SocketFrameHandler.readFrame(SocketFrameHandler.java:139) ~[amqp-client-3.6.3.jar!/:na]
at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:542) ~[amqp-client-3.6.3.jar!/:na]
... 1 common frames omitted

We thought this could be because the Rabbit server or HAProxy was dropping our connection after it sat idle for some time, so I added the "requested-heartbeat" property, but we are still seeing the issue.
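
For completeness, here is a minimal sketch of how we could apply the same settings programmatically, assuming we override Spring Boot's auto-configured connection factory with our own bean (the class name, hard-coded values, and cache size below are only illustrative, and the SSL setup we actually use is omitted):

import org.springframework.amqp.rabbit.connection.CachingConnectionFactory;
import org.springframework.amqp.rabbit.connection.ConnectionFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class RabbitConnectionConfig {

    // Illustrative override of Boot's auto-configured connection factory;
    // host, port and credentials are hard-coded here only for the sketch.
    @Bean
    public ConnectionFactory rabbitConnectionFactory() {
        CachingConnectionFactory factory = new CachingConnectionFactory("rabbit_host", 5672);
        factory.setUsername("rabbit_user");
        factory.setPassword("rabbit_password");
        factory.setRequestedHeartBeat(60);    // AMQP heartbeat interval, in seconds
        factory.setConnectionTimeout(30000);  // TCP connect timeout, in milliseconds
        factory.setChannelCacheSize(25);      // spring-rabbit 1.6.x default
        return factory;
    }
}

The intent is the same as the requested-heartbeat property: keep heartbeat traffic flowing so that neither the broker nor the proxy sees the connection as idle.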

Does anyone have any ideas or clues as to what is causing this issue? Any help or direction would be greatly appreciated.

Thanks!

Matt

  • Look in the server log to see if it provides any more information as to why the connection is closed. – Gary Russell Apr 20 '17 at 15:10
  • `=INFO REPORT==== 17-Apr-2017::05:07:22 === accepting AMQP connection <0.2257.672> (**ips here**)` `=WARNING REPORT==== 17-Apr-2017::05:08:07 === closing AMQP connection <0.12729.672> (**ips here**): client unexpectedly closed TCP connection` All of our rabbit server logs are filled with only those two messages from around the time the disconnection was reported, so I can't really gather anything meaningful from that. – Mattnv92 Apr 20 '17 at 15:48
  • @GaryRussell, could we be facing the same issue as this? http://stackoverflow.com/questions/37009897/rabbitmq-channel-shutdown-connection-error-springxd-closes-rabbitmq-connecti/37010665 – Mattnv92 Apr 20 '17 at 15:53
  • I don't think so; you may need to run a network trace. I don't know how you can get into a situation where the connection is never recovered, unless the listener thread is stuck somewhere. A thread dump should help determine that. – Gary Russell Apr 20 '17 at 17:07
  • @GaryRussell Darn ok. We'll try to take a thread dump next time this happens (see the sketch after these comments). Just to rule this out, I know our production rabbit is sitting on v3.6.0 and our amqp-client jar is on 3.6.3 under Spring-Rabbit 1.6.1. Would that cause any problems for us? Thanks so much for the help Gary, really appreciate it! – Mattnv92 Apr 20 '17 at 17:31
  • The current 1.6.x version is 1.6.9. However, 1.6.1 does include the increase to the cache size (to avoid that other problem you referenced) to 25. If you have more than 25 threads concurrently publishing you might want to increase that some more. There have also been some fixes that might apply if you are dynamically adding/removing queues to running containers. But I don't think any of these would cause the connection close you are seeing. It definitely smells like the proxy dropping the connection. – Gary Russell Apr 20 '17 at 18:03
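
Update: per the comments, the next time the connection stops recovering we will capture a thread dump to check whether the listener thread is stuck. Here is a minimal sketch of how we could grab one from inside the running application (the class name is ours; running jstack against the JVM's pid gives the same information):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDumper {

    // Prints every live thread's stack so we can check whether the
    // SimpleMessageListenerContainer consumer thread is stuck somewhere.
    public static void dumpAllThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // true, true: also report owned monitors and ownable synchronizers
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            System.out.print(info); // ThreadInfo.toString() truncates very deep stacks
        }
    }

    public static void main(String[] args) {
        // Only demonstrates the output; to be useful this has to run
        // inside the JVM that is showing the problem.
        dumpAllThreads();
    }
}

We would call dumpAllThreads() from inside the affected JVM (for example from a temporary endpoint); the main method is only there to show the output format.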
