I have a test application (Spring Boot 2.7.8) that uses ActiveMQ Artemis 2.27.1 as its messaging system. It runs against a 6-node cluster split into 3 live/backup pairs, load balanced using ON_DEMAND with a redistribution delay of 2000 ms.
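
For reference, the redistribution delay comes from the brokers' address-settings, roughly like this (simplified, the catch-all match here is illustrative):

      <address-settings>
          <address-setting match="#">
              <redistribution-delay>2000</redistribution-delay>
          </address-setting>
      </address-settings>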

The application builds its connection factory with all 3 live nodes, using createConnectionFactoryWithHA.

I have a generator class that publishes messages to a single queue. There is a consumer of that queue which replicates this message to 3 different queues. I am aware of topics and wish to move there eventually, but I am modeling an existing solution which does this sort of thing now.

Testing shows that when I publish a message it is consumed and republished to the 3 downstream queues, but messages are only consumed from 2 of them despite all 3 having listeners. Checking the queues after execution confirms that messages were sent to the third queue. This is consistent over several runs: the same queue is never consumed whilst I am generating 'new' events.

If I disable the initial generation of new messages and just rerun, the missing queue is then drained by its listener.

It feels as though, when the connections are made, this queue ends up with its publisher on one node and its consumer on another, and redistribution is not happening. I am not sure how to prove this, or why the publishing node is not redistributing messages to the node with the consumer when it has no consumers of its own.
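
If it helps, this is roughly how I plan to check where the consumers actually are: ask each live node for the consumer count on the suspect queue via the management address. A sketch (it assumes the default activemq.management address, the "queue.<name>" resource naming, and a user with the manage permission):

    // Uses javax.jms.QueueRequestor, org.apache.activemq.artemis.api.jms.ActiveMQJMSClient
    // and org.apache.activemq.artemis.api.jms.management.JMSManagementHelper.
    // Ask one broker how many consumers it currently sees on a given queue.
    public static int consumerCount(QueueConnection connection, String queueName) throws Exception {
        connection.start();
        try (QueueSession session = connection.createQueueSession(false, Session.AUTO_ACKNOWLEDGE)) {
            QueueRequestor requestor = new QueueRequestor(session, ActiveMQJMSClient.createQueue("activemq.management"));
            Message request = session.createMessage();
            JMSManagementHelper.putAttribute(request, "queue." + queueName, "consumerCount");
            Message reply = requestor.request(request);
            return ((Number) JMSManagementHelper.getResult(reply)).intValue();
        }
    }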

Connection factory bean

    @Bean
    public ActiveMQConnectionFactory jmsConnectionFactory() throws Exception {
        HashMap<String, Object> map1 = new HashMap<>();
        map1.put("host", "192.168.0.10");
        map1.put("port", "61616");
        HashMap<String, Object> map2 = new HashMap<>();
        map2.put("host", "192.168.0.11");
        map2.put("port", "61617");
        HashMap<String, Object> map3 = new HashMap<>();
        map3.put(TransportConstants.HOST_PROP_NAME, "192.168.0.12");
        map3.put(TransportConstants.PORT_PROP_NAME, "61618");

        TransportConfiguration server1 = new TransportConfiguration(NettyConnectorFactory.class.getName(), map1);
        TransportConfiguration server2 = new TransportConfiguration(NettyConnectorFactory.class.getName(), map2);
        TransportConfiguration server3 = new TransportConfiguration(NettyConnectorFactory.class.getName(), map3);
        ActiveMQConnectionFactory connectionFactory = ActiveMQJMSClient.createConnectionFactoryWithHA(JMSFactoryType.CF, server1, server2, server3);

        connectionFactory.setPassword(brokerPassword);
        connectionFactory.setUser(brokerUsername);

        return connectionFactory;
    }

Listener factory bean

    @Bean
    public DefaultJmsListenerContainerFactory jmsQueueListenerContainerFactory() throws Exception {
        DefaultJmsListenerContainerFactory factory = new DefaultJmsListenerContainerFactory();
        factory.setConnectionFactory(jmsConnectionFactory());
        //factory.setConcurrency("4-10");
        factory.setSessionAcknowledgeMode(Session.CLIENT_ACKNOWLEDGE);
        factory.setSessionTransacted(true);

        return factory;
    }

This handler listens on the initial published queue and splits each message out to the three downstream queues:

@Slf4j
@Component
@RequiredArgsConstructor
public class TransactionManagerListener {

    private final JmsTemplate jmsTemplate;

    /**
     *
     * Handle the ItemStatsUpdate event
     *
     * @param data - Event details wrapper object
     * @throws RuntimeException that triggers a retry for that item following the backoff rules in the retryable
     */
    @JmsListener(destination = "NewItem", containerFactory = "jmsQueueListenerContainerFactory")
    public void podA(Session session, Message message, String data) throws RuntimeException {
        log.info("TML {}!", data);
        sendItemOn(data);
    }

    private void sendItemOn(String data) {
        jmsTemplate.convertAndSend("Stash", data);
        jmsTemplate.convertAndSend("PCE", data);
        jmsTemplate.convertAndSend("ACD", data);
    }
}

Extract from broker.xml. Each node is slightly different in order to hook up the different live servers and their backups.

      <connectors>
          <connector name="live1-connector">tcp://192.168.0.10:61616</connector>
          <connector name="live2-connector">tcp://192.168.0.11:61617</connector>
          <connector name="live3-connector">tcp://192.168.0.12:61618</connector>
          <connector name="back1-connector">tcp://192.168.0.13:61619</connector>
          <connector name="back2-connector">tcp://192.168.0.10:61620</connector>
          <connector name="back3-connector">tcp://192.168.0.11:61621</connector>
      </connectors>
      
      <cluster-user>my-cluster-user</cluster-user>
      <cluster-password>my-cluster-password</cluster-password>
      <cluster-connections>
          <cluster-connection name="my-cluster">
              <connector-ref>live2-connector</connector-ref>
              <message-load-balancing>ON_DEMAND</message-load-balancing>
              <static-connectors>
                  <connector-ref>live1-connector</connector-ref>
                  <connector-ref>live3-connector</connector-ref>
                  <connector-ref>back2-connector</connector-ref>
                  <!--
                  <connector-ref>back1-connector</connector-ref>
                  <connector-ref>back3-connector</connector-ref>
                  -->
              </static-connectors>
          </cluster-connection>
      </cluster-connections>

      <ha-policy>
          <replication>
              <master>
                <group-name>gloucester</group-name>
                <check-for-live-server>true</check-for-live-server>
              </master>
          </replication>
      </ha-policy>
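
For reference, the backup in each pair uses the matching slave policy with the same group name, roughly like this (simplified, other backup settings omitted):

      <ha-policy>
          <replication>
              <slave>
                  <group-name>gloucester</group-name>
                  <allow-failback>true</allow-failback>
              </slave>
          </replication>
      </ha-policy>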

As you can see from the commented-out concurrency setting, I have tried tweaking the threads and consumers available in the listener factory, but it made no difference.

  • I feel like you need to simplify this down a bit to get a better understanding of where the problem might lie. For example, does this same problem happen if you just run against a _single_ broker (i.e. no cluster, no backups)? If not, what about just a cluster of 2 nodes? As you verify simpler use-cases you can keep adding complexity until you reproduce the problem and then you know the problem is somehow related to the last bit of complexity you added. – Justin Bertram Feb 03 '23 at 16:33
  • FWIW, you can simplify your `jmsConnectionFactory()` method a fair bit by simply using a URL, e.g. `return new ActiveMQConnectionFactory("(tcp://192.168.0.10:61616,tcp://192.168.0.11:61617,tcp://192.168.0.12:61618)?ha=true&reconnectAttempts=-1&user=" + brokerUsername + "&password=" + brokerPassword);` – Justin Bertram Feb 03 '23 at 16:40
  • Thanks for the advice and info on the factory bean. I just did a further test and swapped the order of the 3 convertAndSend calls. It's always the 'middle' one that shows the symptoms: on the first run it was PCE that failed; after swapping it with ACD, PCE comes out but ACD is missing. I will reduce the test env down. I have a local instance so will try that first with just a single publish on the first queue. – Ian Cox Feb 03 '23 at 19:01
  • So, I stripped back the configuration and built it up again. 1 pair of live/backup worked well. Created another pair and tested it separately; again it worked well. Joined the 4 nodes using static cluster connections and all was well. Adding a 3rd pair with all nodes statically linked, and it failed. Removing the HA settings so I had a 6-node symmetric cluster, and all is well again. I have read that taking the broker.xml from one node and copying it around the other nodes works, but with the static cluster specification I cannot see how that works unless I have missed something. – Ian Cox Feb 05 '23 at 11:04
  • Is there any way to have 3 HA pairs configured as a load balanced cluster? I am trying to model how this would look on a 3 DC setup where UDP discovery could not be used. – Ian Cox Feb 05 '23 at 11:05
  • As far as I'm aware 3 HA pairs should load-balance just like 2 HA pairs or any other number of pairs. If you suspect there's a bug please [report it](https://activemq.apache.org/issues) and include a way to reproduce the behavior you're observing. – Justin Bertram Feb 07 '23 at 19:58
  • Logged as ARTEMIS-4166 – Ian Cox Feb 10 '23 at 15:33

1 Answer

This came down to network configuration. The cluster connections need to run on the local swarm network, while ingress (published) addresses are provided for the JMS client to connect through.

      <connectors>
          <!-- published ingress addresses, used by the JMS client rather than by the cluster -->
          <!--
          <connector name="live1-connector">tcp://192.168.0.10:61616</connector>
          <connector name="live2-connector">tcp://192.168.0.11:61617</connector>
          <connector name="live3-connector">tcp://192.168.0.12:61618</connector>
          <connector name="back1-connector">tcp://192.168.0.13:61619</connector>
          <connector name="back2-connector">tcp://192.168.0.10:61620</connector>
          <connector name="back3-connector">tcp://192.168.0.11:61621</connector>
          -->

          <!-- swarm-internal service names used by the cluster connections -->
          <connector name="live1-connector">tcp://live1:61616</connector>
          <connector name="live2-connector">tcp://live2:61616</connector>
          <connector name="live3-connector">tcp://live3:61616</connector>
          <connector name="back1-connector">tcp://back1:61616</connector>
          <connector name="back2-connector">tcp://back2:61616</connector>
          <connector name="back3-connector">tcp://back3:61616</connector>
      </connectors>
      
      <cluster-user>my-cluster-user</cluster-user>
      <cluster-password>my-cluster-password</cluster-password>
      <cluster-connections>
          <cluster-connection name="my-cluster">
              <connector-ref>live3-connector</connector-ref>
              <static-connectors>
                  <connector-ref>back3-connector</connector-ref>
                  <connector-ref>live1-connector</connector-ref>
                  <connector-ref>back1-connector</connector-ref>
                  <connector-ref>live2-connector</connector-ref>
                  <connector-ref>back2-connector</connector-ref>
              </static-connectors>
          </cluster-connection>
      </cluster-connections>
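
The Spring client then connects through the published (ingress) addresses rather than the swarm service names. The URL form suggested in the comments is enough for that, e.g. (a sketch):

    @Bean
    public ActiveMQConnectionFactory jmsConnectionFactory() {
        // Ingress/published addresses, reachable from outside the swarm.
        return new ActiveMQConnectionFactory(
                "(tcp://192.168.0.10:61616,tcp://192.168.0.11:61617,tcp://192.168.0.12:61618)?ha=true&reconnectAttempts=-1&user="
                        + brokerUsername + "&password=" + brokerPassword);
    }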