
As discussed in this question, I set up a cluster with embedded ActiveMQ on WildFly 25 in Kubernetes. The configuration for the ActiveMQ subsystem is as follows:

<subsystem xmlns="urn:jboss:domain:messaging-activemq:13.0">
  <server name="default">
    <security elytron-domain="ApplicationDomain"/>
    <cluster password="${jboss.messaging.cluster.password:xxxxxxx}"/>
    <statistics enabled="${wildfly.messaging-activemq.statistics-enabled:${wildfly.statistics-enabled:false}}"/>
    <security-setting name="#">
        <role name="guest" send="true" consume="true" create-non-durable-queue="true" delete-non-durable-queue="true"/>
    </security-setting>
    <address-setting name="#" dead-letter-address="jms.queue.DLQ" expiry-address="jms.queue.ExpiryQueue" max-size-bytes="10485760" 
                     page-size-bytes="2097152" message-counter-history-day-limit="10" redistribution-delay="1000"/>
    <http-connector name="http-connector" socket-binding="http" endpoint="http-acceptor"/>
    <http-connector name="http-connector-throughput" socket-binding="http" endpoint="http-acceptor-throughput">
        <param name="batch-delay" value="50"/>
    </http-connector>
    <in-vm-connector name="in-vm" server-id="0">
        <param name="buffer-pooling" value="false"/>
    </in-vm-connector>
    <http-acceptor name="http-acceptor" http-listener="default"/>
    <http-acceptor name="http-acceptor-throughput" http-listener="default">
        <param name="batch-delay" value="50"/>
        <param name="direct-deliver" value="false"/>
    </http-acceptor>
    <in-vm-acceptor name="in-vm" server-id="0">
        <param name="buffer-pooling" value="false"/>
    </in-vm-acceptor>
    <jgroups-broadcast-group name="bg-group1" jgroups-cluster="activemq-cluster" connectors="http-connector"/>
    <jgroups-discovery-group name="dg-group1" jgroups-cluster="activemq-cluster"/>
    <cluster-connection name="my-cluster" address="jms" connector-name="http-connector" discovery-group="dg-group1" reconnect-attempts="10"  />
    <jms-queue name="ExpiryQueue" entries="java:/jms/queue/ExpiryQueue"/>
    <jms-queue name="DLQ" entries="java:/jms/queue/DLQ"/>
    <!-- our queues below here -->
    
    <connection-factory name="InVmConnectionFactory" entries="java:/ConnectionFactory" connectors="in-vm"/>
    <connection-factory name="RemoteConnectionFactory" entries="java:jboss/exported/jms/RemoteConnectionFactory" 
                        connectors="http-connector" ha="true" block-on-acknowledge="true" reconnect-attempts="10"/>
    <pooled-connection-factory name="activemq-ra" entries="java:/JmsXA java:jboss/DefaultJMSConnectionFactory" 
                        connectors="in-vm" transaction="xa"/>
  </server>
</subsystem>

It seems to start up fine. I query the pods for their IPs and cluster members with this simple PowerShell script:

    $x = 0
    $max = 3
    do {
        $pod = "pod/myapp-$x"
        Write-Host $pod
        # show the pod's current IP address
        oc exec -it $pod -c myapp -- bash -c "ip addr show dev eth0 | grep global"
        # list the cluster members this node currently knows about
        oc exec -it $pod -c myapp -- bash -c "/myapp/wildfly/bin/jboss-cli.sh --connect --command='/subsystem=messaging-activemq/server=default/cluster-connection=my-cluster:get-nodes()' "
        $x++
    } while ($x -lt $max)

I see the following output: the name of the pod, its IP, and the cluster members it reports:

pod/myapp-0
    inet 172.22.14.63/24 brd 172.22.14.255 scope global eth0
{
    "outcome" => "success",
    "result" => {
        "138ef848-1c1d-11ee-b9ab-0a58ac160a3e" => "172.22.10.62/172.22.10.62:8080",
        "0a15e3c0-1c1d-11ee-9e42-0a58ac16083e" => "172.22.8.62/172.22.8.62:8080"
    }
}

pod/myapp-1
    inet 172.22.10.62/24 brd 172.22.10.255 scope global eth0
{
    "outcome" => "success",
    "result" => {
        "1c1f339a-1c1d-11ee-8e9e-0a58ac160e3f" => "172.22.14.63/172.22.14.63:8080",
        "0a15e3c0-1c1d-11ee-9e42-0a58ac16083e" => "172.22.8.62/172.22.8.62:8080"
    }
}

pod/myapp-2
    inet 172.22.8.62/24 brd 172.22.8.255 scope global eth0
{
    "outcome" => "success",
    "result" => {
        "1c1f339a-1c1d-11ee-8e9e-0a58ac160e3f" => "172.22.14.63/172.22.14.63:8080",
        "138ef848-1c1d-11ee-b9ab-0a58ac160a3e" => "172.22.10.62/172.22.10.62:8080"
    }
}
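
To compare these member lists against the pod IPs without reading them by eye, the addresses can be pulled out of the `get-nodes()` output. This is a minimal shell sketch, assuming the output format shown above; the helper name `member_ips` is hypothetical and not part of WildFly or `oc`.

```shell
#!/bin/sh
# Hypothetical helper: read jboss-cli get-nodes() output on stdin and
# print one member IP per line. It assumes entries of the form
#   "<node-uuid>" => "<ip>/<ip>:<port>",
# as in the output shown in this question.
member_ips() {
    sed -n 's/.*=> "\([0-9.]*\)\/.*/\1/p'
}

# Sample input copied from the pod/myapp-0 output above.
member_ips <<'EOF'
{
    "outcome" => "success",
    "result" => {
        "138ef848-1c1d-11ee-b9ab-0a58ac160a3e" => "172.22.10.62/172.22.10.62:8080",
        "0a15e3c0-1c1d-11ee-9e42-0a58ac16083e" => "172.22.8.62/172.22.8.62:8080"
    }
}
EOF
```

For the sample input this prints `172.22.10.62` and `172.22.8.62`, the two other members as seen from myapp-0.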

But after the first restart of the cluster, I see the following messages appear once:

18:50:53,752 WARN  [org.apache.activemq.artemis.core.server] (Thread-13 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@2fff22c0)) AMQ222101: Bridge $.artemis.internal.sf.my-cluster.dcf19ee6-1c14-11ee-b0c7-0a58ac160e3e achieved 11 maxattempts=10 it will stop retrying to reconnect myapp-2
18:50:17,366 WARN  [org.apache.activemq.artemis.core.server] (Thread-4 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@2fff22c0)) AMQ222107: Cleared up resources for session 0abdbca7-1c1d-11ee-b0c7-0a58ac160e3e   myapp-2
18:50:17,363 WARN  [org.apache.activemq.artemis.core.server] (Thread-4 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@2fff22c0)) AMQ222107: Cleared up resources for session 0ab88c85-1c1d-11ee-b0c7-0a58ac160e3e   myapp-2
18:50:17,363 WARN  [org.apache.activemq.artemis.core.server] (Thread-4 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@2fff22c0)) AMQ222061: Client connection failed, clearing up resources for session 0abdbca7-1c1d-11ee-b0c7-0a58ac160e3e    myapp-2
18:50:17,361 WARN  [org.apache.activemq.artemis.core.client] (Thread-4 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@2fff22c0)) AMQ212037: Connection failure to /172.22.14.62:39870 has been detected: AMQ229014: Did not receive data from /172.22.14.62:39870 within the 60,000ms connection TTL. The connection will now be closed. [code=CONNECTION_TIMEDOUT] myapp-2
18:50:17,362 WARN  [org.apache.activemq.artemis.core.server] (Thread-4 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@2fff22c0)) AMQ222061: Client connection failed, clearing up resources for session 0ab88c85-1c1d-11ee-b0c7-0a58ac160e3e    myapp-2
18:49:23,579 WARN  [org.apache.activemq.artemis.core.server] (Thread-2 (ActiveMQ-client-global-threads)) AMQ222095: Connection failed with failedOver=false myapp-2
17:51:27,544 WARN  [org.apache.activemq.artemis.core.server] (Thread-3 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@6f7945b6)) AMQ222101: Bridge $.artemis.internal.sf.my-cluster.31762214-1bff-11ee-bfbb-0a58ac160e3d achieved 11 maxattempts=10 it will stop retrying to reconnect  myapp-2
17:51:14,393 WARN  [org.apache.activemq.artemis.core.server] (Thread-2 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@6f7945b6)) AMQ222107: Cleared up resources for session cb4f9d0d-1c14-11ee-bfbb-0a58ac160e3d   myapp-2
17:51:14,390 WARN  [org.apache.activemq.artemis.core.server] (Thread-2 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@6f7945b6)) AMQ222107: Cleared up resources for session cb4c68bc-1c14-11ee-bfbb-0a58ac160e3d   myapp-2
17:51:14,390 WARN  [org.apache.activemq.artemis.core.server] (Thread-2 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@6f7945b6)) AMQ222061: Client connection failed, clearing up resources for session cb4f9d0d-1c14-11ee-bfbb-0a58ac160e3d    myapp-2
17:51:14,388 WARN  [org.apache.activemq.artemis.core.client] (Thread-2 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@6f7945b6)) AMQ212037: Connection failure to /172.22.14.61:40688 has been detected: AMQ229014: Did not receive data from /172.22.14.61:40688 within the 60,000ms connection TTL. The connection will now be closed. [code=CONNECTION_TIMEDOUT] myapp-2
17:51:14,388 WARN  [org.apache.activemq.artemis.core.server] (Thread-2 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@6f7945b6)) AMQ222061: Client connection failed, clearing up resources for session cb4c68bc-1c14-11ee-bfbb-0a58ac160e3d    myapp-2
17:50:21,620 WARN  [org.apache.activemq.artemis.core.server] (Thread-1 (ActiveMQ-client-global-threads)) AMQ222095: Connection failed with failedOver=false myapp-2
17:50:07,864 WARN  [org.apache.activemq.artemis.core.server] (Thread-4 (ActiveMQ-client-global-threads)) AMQ222095: Connection failed with failedOver=false myapp-0
17:49:49,217 WARN  [org.apache.activemq.artemis.core.server] (Thread-5 (ActiveMQ-client-global-threads)) AMQ222095: Connection failed with failedOver=false myapp-1
17:49:49,194 WARN  [org.apache.activemq.artemis.core.server] (Thread-4 (ActiveMQ-client-global-threads)) AMQ222095: Connection failed with failedOver=false myapp-0

After that, the logs are full of these messages, retrying connections to host identifiers (e.g. 172-22-8-61) that seem to correspond to IP addresses that no longer exist:

18:58:47,982 WARN  [org.apache.activemq.artemis.core.server] (Thread-6 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@c93158f)) AMQ224091: Bridge ClusterConnectionBridge@407bbd0d [name=$.artemis.internal.sf.my-cluster.ca6febdd-1c14-11ee-9dbc-0a58ac16083d, queue=QueueImpl[name=$.artemis.internal.sf.my-cluster.ca6febdd-1c14-11ee-9dbc-0a58ac16083d, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=1c1f339a-1c1d-11ee-8e9e-0a58ac160e3f], temp=false]@101d7099 targetConnector=ServerLocatorImpl (identity=(Cluster-connection-bridge::ClusterConnectionBridge@407bbd0d [name=$.artemis.internal.sf.my-cluster.ca6febdd-1c14-11ee-9dbc-0a58ac16083d, queue=QueueImpl[name=$.artemis.internal.sf.my-cluster.ca6febdd-1c14-11ee-9dbc-0a58ac16083d, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=1c1f339a-1c1d-11ee-8e9e-0a58ac160e3f], temp=false]@101d7099 targetConnector=ServerLocatorImpl [initialConnectors=[TransportConfiguration(name=http-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?httpUpgradeEndpoint=http-acceptor&activemqServerName=default&httpUpgradeEnabled=true&port=8080&host=172-22-8-61], discoveryGroupConfiguration=null]]::ClusterConnectionImpl@1659082611[nodeUUID=1c1f339a-1c1d-11ee-8e9e-0a58ac160e3f, connector=TransportConfiguration(name=http-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?httpUpgradeEndpoint=http-acceptor&activemqServerName=default&httpUpgradeEnabled=true&port=8080&host=172-22-14-63, address=jms, server=ActiveMQServerImpl::serverUUID=1c1f339a-1c1d-11ee-8e9e-0a58ac160e3f])) [initialConnectors=[TransportConfiguration(name=http-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?httpUpgradeEndpoint=http-acceptor&activemqServerName=default&httpUpgradeEnabled=true&port=8080&host=172-22-8-61], discoveryGroupConfiguration=null]] is unable to connect to destination. 
Retrying    myapp-0
18:58:47,862 WARN  [org.apache.activemq.artemis.core.server] (Thread-5 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@6a925b6c)) AMQ224091: Bridge ClusterConnectionBridge@555aeef4 [name=$.artemis.internal.sf.my-cluster.ca6febdd-1c14-11ee-9dbc-0a58ac16083d, queue=QueueImpl[name=$.artemis.internal.sf.my-cluster.ca6febdd-1c14-11ee-9dbc-0a58ac16083d, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=138ef848-1c1d-11ee-b9ab-0a58ac160a3e], temp=false]@f5bed5e targetConnector=ServerLocatorImpl (identity=(Cluster-connection-bridge::ClusterConnectionBridge@555aeef4 [name=$.artemis.internal.sf.my-cluster.ca6febdd-1c14-11ee-9dbc-0a58ac16083d, queue=QueueImpl[name=$.artemis.internal.sf.my-cluster.ca6febdd-1c14-11ee-9dbc-0a58ac16083d, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=138ef848-1c1d-11ee-b9ab-0a58ac160a3e], temp=false]@f5bed5e targetConnector=ServerLocatorImpl [initialConnectors=[TransportConfiguration(name=http-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?httpUpgradeEndpoint=http-acceptor&activemqServerName=default&httpUpgradeEnabled=true&port=8080&host=172-22-8-61], discoveryGroupConfiguration=null]]::ClusterConnectionImpl@871930775[nodeUUID=138ef848-1c1d-11ee-b9ab-0a58ac160a3e, connector=TransportConfiguration(name=http-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?httpUpgradeEndpoint=http-acceptor&activemqServerName=default&httpUpgradeEnabled=true&port=8080&host=172-22-10-62, address=jms, server=ActiveMQServerImpl::serverUUID=138ef848-1c1d-11ee-b9ab-0a58ac160a3e])) [initialConnectors=[TransportConfiguration(name=http-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?httpUpgradeEndpoint=http-acceptor&activemqServerName=default&httpUpgradeEnabled=true&port=8080&host=172-22-8-61], discoveryGroupConfiguration=null]] is unable to connect to destination. 
Retrying  myapp-1
18:58:46,841 WARN  [org.apache.activemq.artemis.core.server] (Thread-10 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@2fff22c0)) AMQ224091: Bridge ClusterConnectionBridge@14708497 [name=$.artemis.internal.sf.my-cluster.ca6febdd-1c14-11ee-9dbc-0a58ac16083d, queue=QueueImpl[name=$.artemis.internal.sf.my-cluster.ca6febdd-1c14-11ee-9dbc-0a58ac16083d, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=0a15e3c0-1c1d-11ee-9e42-0a58ac16083e], temp=false]@1c2f52b7 targetConnector=ServerLocatorImpl (identity=(Cluster-connection-bridge::ClusterConnectionBridge@14708497 [name=$.artemis.internal.sf.my-cluster.ca6febdd-1c14-11ee-9dbc-0a58ac16083d, queue=QueueImpl[name=$.artemis.internal.sf.my-cluster.ca6febdd-1c14-11ee-9dbc-0a58ac16083d, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=0a15e3c0-1c1d-11ee-9e42-0a58ac16083e], temp=false]@1c2f52b7 targetConnector=ServerLocatorImpl [initialConnectors=[TransportConfiguration(name=http-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?httpUpgradeEndpoint=http-acceptor&activemqServerName=default&httpUpgradeEnabled=true&port=8080&host=172-22-8-61], discoveryGroupConfiguration=null]]::ClusterConnectionImpl@2015312406[nodeUUID=0a15e3c0-1c1d-11ee-9e42-0a58ac16083e, connector=TransportConfiguration(name=http-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?httpUpgradeEndpoint=http-acceptor&activemqServerName=default&httpUpgradeEnabled=true&port=8080&host=172-22-8-62, address=jms, server=ActiveMQServerImpl::serverUUID=0a15e3c0-1c1d-11ee-9e42-0a58ac16083e])) [initialConnectors=[TransportConfiguration(name=http-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?httpUpgradeEndpoint=http-acceptor&activemqServerName=default&httpUpgradeEnabled=true&port=8080&host=172-22-8-61], discoveryGroupConfiguration=null]] is unable to connect to 
destination. Retrying   myapp-2

I found this Red Hat guidance, which seems to indicate that an unclean shutdown of EAP/WildFly is the cause. But our container traps SIGTERM and runs a shutdown via jboss-cli.sh using the pattern described here:

shutdown() {
    echo "Shutting down wildfly cleanly using cli..."
    ipv4=$(ifconfig eth0 | grep 'inet ' | awk '{print $2}')
    ${WFLY_DIR}/bin/jboss-cli.sh --connect --controller="${ipv4}:9990" --command=:shutdown
    exit
}
trap 'shutdown' SIGTERM
# get ip to configure jboss (jgroups does not like 0.0.0.0)
IP=$(ifconfig eth0 | grep 'inet ' | awk '{print $2}')
PARAMS=" -Djboss.bind.address=${IP} -Djboss.bind.address.public=${IP} -Djboss.bind.address.management=${IP} $@ "
# now delegate to wildfly
echo Wildfly started, use SIGTERM to shutdown ...
${WFLY_DIR}/bin/standalone.sh ${PARAMS} &
WFLYPID=$!
#Wait for shutdown
wait $WFLYPID

The log output ("Shutting down wildfly cleanly using cli...") indicates that the trap is working, so WildFly should be shutting down gracefully. Furthermore, we set "reconnect-attempts" to 10, so I really don't understand why the logs are full of this message.

DEV-myapp@myapp-2:~/wildfly/standalone/configuration$ grep reconnect my-standalone.xml
<cluster-connection name="my-cluster" address="jms" connector-name="http-connector" discovery-group="dg-group1" reconnect-attempts="10"  />
<connection-factory name="RemoteConnectionFactory" entries="java:jboss/exported/jms/RemoteConnectionFactory" connectors="http-connector" ha="true" block-on-acknowledge="true" reconnect-attempts="20"/>

It looks to me like the host query parameter is being set to the IP of the container (host=172-22-8-61), but this IP changes with each redeployment of the application (a StatefulSet).
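
The mismatch can be checked mechanically. This is a small shell sketch, assuming the dashed `host=` value is simply the IPv4 address with dots replaced by dashes, which is my reading of the log and not something confirmed. The function names `dashed_to_ip` and `is_current_member` are made up for illustration.

```shell
#!/bin/sh
# Convert a dashed host identifier from the bridge log (e.g. 172-22-8-61)
# into a dotted IPv4 address. Assumption: dashes stand in for dots.
dashed_to_ip() {
    printf '%s\n' "$1" | tr '-' '.'
}

# Return 0 if the first argument equals one of the remaining arguments.
is_current_member() {
    candidate="$1"
    shift
    for ip in "$@"; do
        [ "$ip" = "$candidate" ] && return 0
    done
    return 1
}

# Host taken from the AMQ224091 messages; pod IPs taken from the
# `ip addr` output earlier in this question.
stale=$(dashed_to_ip "172-22-8-61")
if is_current_member "$stale" 172.22.14.63 172.22.10.62 172.22.8.62; then
    echo "$stale is a current pod IP"
else
    echo "$stale is not a current pod IP"
fi
```

With the values above this prints `172.22.8.61 is not a current pod IP`, which matches the impression that the bridges keep retrying an address from before the restart.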

Update 10. July

Now I am also seeing

07:38:32,300 ERROR [org.apache.activemq.artemis.core.client] (Thread-1 (ActiveMQ-client-netty-threads)) AMQ214016: Failed to create netty connection: io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: No route to host
07:19:31,564 ERROR [org.apache.activemq.artemis.core.client] (Thread-16 (ActiveMQ-client-netty-threads)) AMQ214016: Failed to create netty connection: io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: No route to host

leading me to this Red Hat Knowledgebase article and making me wonder if this can work (stably) at all.

Can I resolve this situation? The IP will always vary in the Kubernetes cluster. I read here that Kubernetes DNS can help with IPs in WildFly clusters, but I don't understand how that would be the case, and the normal JGroups cluster and Infinispan seem to be working fine. Can I configure something so that the host parameter uses the (constant) hostname of the pod instead of the IP? And why is the log still full of these messages when I added the reconnect-attempts attribute?
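
For reference, the "constant hostname" meant here would be the stable DNS name that Kubernetes gives each StatefulSet pod through its headless service. The sketch below only shows what that name looks like; the service name `myapp-headless` and the namespace `dev` are hypothetical placeholders, and `cluster.local` is assumed as the default cluster domain. Whether WildFly would then advertise this name in the connector instead of the IP is an open assumption I have not tested.

```shell
#!/bin/sh
# Build the stable DNS name of a StatefulSet pod:
#   <pod-name>.<headless-service>.<namespace>.svc.cluster.local
# The service and namespace passed below are hypothetical placeholders.
stable_pod_fqdn() {
    printf '%s.%s.%s.svc.cluster.local\n' "$1" "$2" "$3"
}

stable_pod_fqdn "myapp-2" "myapp-headless" "dev"
```

Unlike the pod IP, this name stays the same when myapp-2 is rescheduled, because it is derived from the pod's ordinal name and not from its address.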

Update 17. July: I am currently just suppressing these log messages with the following addition to the standalone.xml:

 <!-- reduce logging for artemis due to AMQ224091 -->
            <logger category="org.apache.activemq.artemis.core.server">
                <level name="ERROR" />
            </logger>
sprockets
  • I should add, at least in my unit tests, the cluster appears to be working: messages sent from one node are consumed by my consumer on the other. – sprockets Jul 06 '23 at 19:03
  • How is `dg-group1` configured? – Justin Bertram Jul 10 '23 at 23:27
  • @JustinBertram I added the submodule configuration from the standalone.xml – sprockets Jul 11 '23 at 08:02
  • Since it does not seem to impact the processing in JMS, I have simply suppressed the log messages in the standalone.xml – sprockets Jul 17 '23 at 17:57
  • How is the `activemq-cluster` configured that `dg-group1` is pointing to? – Justin Bertram Jul 17 '23 at 18:02
  • @JustinBertram I have updated to include the entire `standalone.xml` configuration. we are using a somewhat reduced standalone-ha configuration with tcp jgroups in k8s (for a statefulset) and a headless service to do DNS_PING (https://medium.com/@nishada/keycloak-clustering-on-kubernetes-ec3d6a99fc33). other than that, I guess the `activemq-cluster` is using wildfly defaults (although I do not know what they are?) – sprockets Jul 19 '23 at 07:36
  • I don't see the updated configuration. – Justin Bertram Jul 19 '23 at 13:49
  • @JustinBertram I thought I was going crazy, I am sure I added it, then I did it again and noticed the overly subtle message: Body is limited to 30000 characters; you entered 49501. I have added it here: https://gist.github.com/turchinc/0d494d3ea7a02a7cf57acf632dc60679 – sprockets Jul 19 '23 at 18:17
