I have the following architecture for integration in my system:
2 Mule instances (not in a Mule cluster, because of CE version) and 2 activemq instances configured in master-slave using shared database lock.
There is a flow in Mule which has as inbound a queue where another flow writes a message. After reading this message from that queue it makes some processing and calls another system, which could return some controlled exceptions and in that case message should be redelivered.
I´ve configured this redelivery policy:
<spring:bean id="redeliveryPolicy" class="org.apache.activemq.RedeliveryPolicy">
<spring:property name="maximumRedeliveries" value="5" />
<spring:property name="initialRedeliveryDelay" value="30000" />
<spring:property name="maximumRedeliveryDelay" value="7200000" />
<spring:property name="useExponentialBackOff" value="true" />
<spring:property name="backOffMultiplier" value="4" />
<spring:property name="useCollisionAvoidance" value="true"/>
</spring:bean>
This is a redelivery delay of 2h in last retry. As this is a long interval we don´t want to block processing of new incoming messages so we algo configured 'nonBlockingRedelivery' = true.
Each instance of Mule starts 4 consumers for this queue, so 8 consumers in total.
The problem we are facing is that in some load tests where we send like 500 messages that generate a controlled exception in target system (which forces redelivery) at some point, processing performance decreases a lot. We know that as redelivery delay increases, there are intervals where no processing at all is normal, but when redeliveries should happen it only processes one redelivery from time to time.
We noticed that if we stop one of the instances of Mule, the other one starts to process a reasonable number of redeliveries (like if they were blocked until that moment) and after a while same scenario of low performance repeats. When we start that instance recently stopped, it doesn´t start processing any redelivery, but if we stop the other one (the one who didnt stop at first time), it processes "enqueued" redeliveries. Although I´m not sure I think this is because of redelivey flag is in memory, so when one of the nodes stops, it "releases" its messages and they are processed by the other instance like "fresh" new messages.
In one of our tests we wait for some hours and all messages were finally processed (they all reached maximum retries limit).
So it seems a matter of blocking but we are kinda stuck.
Some configurations I´ve tried without success:
useCollisionAvoidance = true, in order to avoid redelivery collusion (maybe I should increase default collision factor).
I´ve changed prefetch max pending = 1 (1000 is default value).
We know this is a extreme scenario (all calls to target system fail) but it could happen if that system is down so we want to be prepared against it.
Any help?
I attach a dump of mule threads related with ActiveMQ of one of the instances when this "blocking scenario" happened:
"ActiveMQ InactivityMonitor Worker" daemon prio=10 tid=0x00007fb79863c800 nid=0x1520 waiting on condition [0x00007fb77aceb000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000078dc457c0> (a java.util.concurrent.SynchronousQueue$TransferStack)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:942)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
"ActiveMQConnection[ID:HSLPRELABMUL801-45728-1456487568559-1:1] Scheduler" daemon prio=10 tid=0x00007fb7c0226000 nid=0x1479 in Object.wait() [0x00007fb7a13a9000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.util.TimerThread.mainLoop(Timer.java:552)
- locked <0x000000078d39f060> (a java.util.TaskQueue)
at java.util.TimerThread.run(Timer.java:505)
"ActiveMQ InactivityMonitor WriteCheckTimer" daemon prio=10 tid=0x00007fb790002800 nid=0x1466 in Object.wait() [0x00007fb77abea000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.util.TimerThread.mainLoop(Timer.java:552)
- locked <0x000000078d7aaa78> (a java.util.TaskQueue)
at java.util.TimerThread.run(Timer.java:505)
"ActiveMQ Transport: tcp:///10.237.89.167:61616@45020" prio=10 tid=0x00007fb78801d000 nid=0x1464 runnable [0x00007fb77adec000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at org.apache.activemq.transport.tcp.TcpBufferedInputStream.fill(TcpBufferedInputStream.java:50)
at org.apache.activemq.transport.tcp.TcpTransport$2.fill(TcpTransport.java:609)
at org.apache.activemq.transport.tcp.TcpBufferedInputStream.read(TcpBufferedInputStream.java:58)
at org.apache.activemq.transport.tcp.TcpTransport$2.read(TcpTransport.java:594)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.apache.activemq.openwire.OpenWireFormat.unmarshal(OpenWireFormat.java:267)
at org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:221)
at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:213)
at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:196)
at java.lang.Thread.run(Thread.java:745)
"ActiveMQ InactivityMonitor ReadCheckTimer" daemon prio=10 tid=0x00007fb78801b800 nid=0x1462 in Object.wait() [0x00007fb77aeed000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.util.TimerThread.mainLoop(Timer.java:552)
- locked <0x000000078d7aad88> (a java.util.TaskQueue)
at java.util.TimerThread.run(Timer.java:505)
Thank you