
We have 2 servers: server1 and server2. Both servers run WildFly 11 in domain mode. Below is how we have configured them.

In our setup, server1 is the domain controller node.

Issue:

If two messages with the same group ID arrive simultaneously on server1 and server2, the nodes do not know which consumer the messages should be routed to. As a result, the messages end up being processed by different consumers, and sometimes the message that arrived first is processed later, which is not desirable. We would like to configure the system so that both nodes agree on which consumer should process a given group.

Solution we tried:

We configured server1 with a LOCAL grouping-handler and server2 with a REMOTE one. Now, whenever a message arrives, the LOCAL grouping-handler determines which node hosts the consumer for that group ID and the message is routed accordingly.
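
For reference, the grouping-handlers were added roughly as follows (a sketch: the handler name and address here are illustrative, and the attribute names should be checked against your WildFly version with :read-resource-description):

/profile=abc/subsystem=messaging-activemq/server=default/grouping-handler=my-grouping-handler:add(grouping-handler-address=jms,type=LOCAL)

The analogous command with type=REMOTE was used for server2.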

This solution works as long as server1 is up. However, if server1 goes down, messages are not processed at all. To fix this we added a backup server to the messaging-activemq subsystem so that server2 backs up server1, and similarly added a backup for server2 on server1.

/profile=abc/subsystem=messaging-activemq/server=backup:add

(because server1 is the domain controller and the change is made at the profile level, the backup server is added on both nodes)

We also added the same discovery-group, http-connector, and broadcast-group to this backup server, and established a cluster-connection so that the backup and live servers on server1 and server2 join the same cluster.
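
For completeness, mirroring those resources onto the backup server looked roughly like this (a sketch: the resource names and jgroups discovery settings shown are the full-ha profile defaults and should be copied from whatever the live server actually uses; verify the attribute names for your WildFly version with :read-resource-description):

/profile=abc/subsystem=messaging-activemq/server=backup/http-connector=http-connector:add(socket-binding=http,endpoint=http-acceptor)
/profile=abc/subsystem=messaging-activemq/server=backup/broadcast-group=bg-group1:add(jgroups-channel=activemq-cluster,connectors=[http-connector])
/profile=abc/subsystem=messaging-activemq/server=backup/discovery-group=dg-group1:add(jgroups-channel=activemq-cluster)
/profile=abc/subsystem=messaging-activemq/server=backup/cluster-connection=my-cluster:add(cluster-connection-address=jms,connector-name=http-connector,discovery-group=dg-group1)

The live/backup group pairing itself is configured through the ha-policy resources: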

/profile=abc/subsystem=messaging-activemq/server=default/ha-policy=replication-master:add(cluster-name=my-cluster,group-name=${livegroup},check-for-live-server=true)
/profile=abc/subsystem=messaging-activemq/server=backup/ha-policy=replication-slave:add(cluster-name=my-cluster,group-name=${backupgroup})

server1 is configured to read the following properties:

livegroup=group1
backupgroup=group2

server2 is configured to read the following properties:

livegroup=group2
backupgroup=group1
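
One way to supply these per node in domain mode is with host-level system properties (the host names here are illustrative; the properties could equally be set on the individual server-configs):

/host=server1/system-property=livegroup:add(value=group1)
/host=server1/system-property=backupgroup:add(value=group2)
/host=server2/system-property=livegroup:add(value=group2)
/host=server2/system-property=backupgroup:add(value=group1)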

However, this solution does not seem to handle the failover case: when the live node with the LOCAL grouping-handler went down, messages were not processed on the other node. We get the error below on server2 when server1 shuts down:

[org.apache.activemq.artemis.core.server] (default I/O-3) AMQ222092: Connection to the backup node failed, removing replication now: ActiveMQRemoteDisconnectException[errorType=REMOTE_DISCONNECT message=null]
        at org.apache.activemq.artemis.core.remoting.server.impl.RemotingServiceImpl.connectionDestroyed(RemotingServiceImpl.java:533)
        at org.apache.activemq.artemis.core.remoting.impl.netty.NettyAcceptor$Listener.connectionDestroyed(NettyAcceptor.java:682)
        at org.apache.activemq.artemis.core.remoting.impl.netty.ActiveMQChannelHandler.channelInactive(ActiveMQChannelHandler.java:79)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
        at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:360)
        at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:325)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1329)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
        at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:908)
        at io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:744)
        at org.xnio.nio.WorkerThread.safeRun(WorkerThread.java:612)
        at org.xnio.nio.WorkerThread.run(WorkerThread.java:479)

Please suggest a different approach to handle this issue altogether, or a way to configure for the scenario where the server with the LOCAL grouping-handler shuts down.


1 Answer


The recommended solution for clustered grouping is what you have configured - a backup for the node with the LOCAL grouping-handler. The bottom line here is if there isn't an active node in the cluster with a LOCAL grouping-handler then a decision about what consumer should handle which group simply can't be made. It sounds to me like your backup broker simply isn't working as expected (which is probably a subject for a different question).

Aside from having a backup you might consider eliminating the cluster altogether so that you just have 1 broker or perhaps just a live/backup pair rather than 2 active brokers. Clusters are a way to improve overall message throughput using horizontal scaling. However, message grouping naturally serializes message consumption for each group which then decreases overall message throughput (perhaps severely depending on the use-case). It may be that you don't need the performance scalability of a cluster since you're grouping messages. Have you performed any benchmarking to determine your performance bottlenecks? If so, was clustering the proven solution to these bottlenecks?
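
For illustration, a plain live/backup pair could reuse the ha-policy resources from the question with a single group name and only one active broker, roughly as follows (a sketch based on the commands shown above; note that a replicated pair still discovers its partner over a cluster-connection, so dropping the cluster here means dropping the second active broker rather than the cluster-connection itself):

/profile=abc/subsystem=messaging-activemq/server=default/ha-policy=replication-master:add(cluster-name=my-cluster,group-name=group1,check-for-live-server=true)
/profile=abc/subsystem=messaging-activemq/server=backup/ha-policy=replication-slave:add(cluster-name=my-cluster,group-name=group1)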
