As discussed in this question, I set up a cluster with embedded ActiveMQ on WildFly 25 in Kubernetes. The configuration for the ActiveMQ subsystem is as follows:
<subsystem xmlns="urn:jboss:domain:messaging-activemq:13.0">
    <server name="default">
        <security elytron-domain="ApplicationDomain"/>
        <cluster password="${jboss.messaging.cluster.password:xxxxxxx}"/>
        <statistics enabled="${wildfly.messaging-activemq.statistics-enabled:${wildfly.statistics-enabled:false}}"/>
        <security-setting name="#">
            <role name="guest" send="true" consume="true" create-non-durable-queue="true" delete-non-durable-queue="true"/>
        </security-setting>
        <address-setting name="#" dead-letter-address="jms.queue.DLQ" expiry-address="jms.queue.ExpiryQueue" max-size-bytes="10485760"
                         page-size-bytes="2097152" message-counter-history-day-limit="10" redistribution-delay="1000"/>
        <http-connector name="http-connector" socket-binding="http" endpoint="http-acceptor"/>
        <http-connector name="http-connector-throughput" socket-binding="http" endpoint="http-acceptor-throughput">
            <param name="batch-delay" value="50"/>
        </http-connector>
        <in-vm-connector name="in-vm" server-id="0">
            <param name="buffer-pooling" value="false"/>
        </in-vm-connector>
        <http-acceptor name="http-acceptor" http-listener="default"/>
        <http-acceptor name="http-acceptor-throughput" http-listener="default">
            <param name="batch-delay" value="50"/>
            <param name="direct-deliver" value="false"/>
        </http-acceptor>
        <in-vm-acceptor name="in-vm" server-id="0">
            <param name="buffer-pooling" value="false"/>
        </in-vm-acceptor>
        <jgroups-broadcast-group name="bg-group1" jgroups-cluster="activemq-cluster" connectors="http-connector"/>
        <jgroups-discovery-group name="dg-group1" jgroups-cluster="activemq-cluster"/>
        <cluster-connection name="my-cluster" address="jms" connector-name="http-connector" discovery-group="dg-group1" reconnect-attempts="10" />
        <jms-queue name="ExpiryQueue" entries="java:/jms/queue/ExpiryQueue"/>
        <jms-queue name="DLQ" entries="java:/jms/queue/DLQ"/>
        <!-- our queues below here -->
        <connection-factory name="InVmConnectionFactory" entries="java:/ConnectionFactory" connectors="in-vm"/>
        <connection-factory name="RemoteConnectionFactory" entries="java:jboss/exported/jms/RemoteConnectionFactory"
                            connectors="http-connector" ha="true" block-on-acknowledge="true" reconnect-attempts="10"/>
        <pooled-connection-factory name="activemq-ra" entries="java:/JmsXA java:jboss/DefaultJMSConnectionFactory"
                                   connectors="in-vm" transaction="xa"/>
    </server>
</subsystem>
It seems to start up fine. When I query the pods for their IP and cluster members with this simple PowerShell script:
$x = 0
$max = 3
do {
    $pod = "pod/myapp-"
    $pod += $x
    Write-Host $pod
    oc exec -it $pod -c myapp -- bash -c "ip addr show dev eth0 | grep global"
    oc exec -it $pod -c myapp -- bash -c "/myapp/wildfly/bin/jboss-cli.sh --connect --command='/subsystem=messaging-activemq/server=default/cluster-connection=my-cluster:get-nodes()' "
    $x++
} while ($x -lt $max)
I see the following output, which shows the name of the pod/container, its IP, and the members of the cluster:
pod/myapp-0
inet 172.22.14.63/24 brd 172.22.14.255 scope global eth0
{
"outcome" => "success",
"result" => {
"138ef848-1c1d-11ee-b9ab-0a58ac160a3e" => "172.22.10.62/172.22.10.62:8080",
"0a15e3c0-1c1d-11ee-9e42-0a58ac16083e" => "172.22.8.62/172.22.8.62:8080"
}
}
pod/myapp-1
inet 172.22.10.62/24 brd 172.22.10.255 scope global eth0
{
"outcome" => "success",
"result" => {
"1c1f339a-1c1d-11ee-8e9e-0a58ac160e3f" => "172.22.14.63/172.22.14.63:8080",
"0a15e3c0-1c1d-11ee-9e42-0a58ac16083e" => "172.22.8.62/172.22.8.62:8080"
}
}
pod/myapp-2
inet 172.22.8.62/24 brd 172.22.8.255 scope global eth0
{
"outcome" => "success",
"result" => {
"1c1f339a-1c1d-11ee-8e9e-0a58ac160e3f" => "172.22.14.63/172.22.14.63:8080",
"138ef848-1c1d-11ee-b9ab-0a58ac160a3e" => "172.22.10.62/172.22.10.62:8080"
}
}
But after the first restart of the cluster, I see the following messages appear once:
18:50:53,752 WARN [org.apache.activemq.artemis.core.server] (Thread-13 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@2fff22c0)) AMQ222101: Bridge $.artemis.internal.sf.my-cluster.dcf19ee6-1c14-11ee-b0c7-0a58ac160e3e achieved 11 maxattempts=10 it will stop retrying to reconnect myapp-2
18:50:17,366 WARN [org.apache.activemq.artemis.core.server] (Thread-4 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@2fff22c0)) AMQ222107: Cleared up resources for session 0abdbca7-1c1d-11ee-b0c7-0a58ac160e3e myapp-2
18:50:17,363 WARN [org.apache.activemq.artemis.core.server] (Thread-4 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@2fff22c0)) AMQ222107: Cleared up resources for session 0ab88c85-1c1d-11ee-b0c7-0a58ac160e3e myapp-2
18:50:17,363 WARN [org.apache.activemq.artemis.core.server] (Thread-4 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@2fff22c0)) AMQ222061: Client connection failed, clearing up resources for session 0abdbca7-1c1d-11ee-b0c7-0a58ac160e3e myapp-2
18:50:17,361 WARN [org.apache.activemq.artemis.core.client] (Thread-4 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@2fff22c0)) AMQ212037: Connection failure to /172.22.14.62:39870 has been detected: AMQ229014: Did not receive data from /172.22.14.62:39870 within the 60,000ms connection TTL. The connection will now be closed. [code=CONNECTION_TIMEDOUT] myapp-2
18:50:17,362 WARN [org.apache.activemq.artemis.core.server] (Thread-4 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@2fff22c0)) AMQ222061: Client connection failed, clearing up resources for session 0ab88c85-1c1d-11ee-b0c7-0a58ac160e3e myapp-2
18:49:23,579 WARN [org.apache.activemq.artemis.core.server] (Thread-2 (ActiveMQ-client-global-threads)) AMQ222095: Connection failed with failedOver=false myapp-2
17:51:27,544 WARN [org.apache.activemq.artemis.core.server] (Thread-3 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@6f7945b6)) AMQ222101: Bridge $.artemis.internal.sf.my-cluster.31762214-1bff-11ee-bfbb-0a58ac160e3d achieved 11 maxattempts=10 it will stop retrying to reconnect myapp-2
17:51:14,393 WARN [org.apache.activemq.artemis.core.server] (Thread-2 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@6f7945b6)) AMQ222107: Cleared up resources for session cb4f9d0d-1c14-11ee-bfbb-0a58ac160e3d myapp-2
17:51:14,390 WARN [org.apache.activemq.artemis.core.server] (Thread-2 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@6f7945b6)) AMQ222107: Cleared up resources for session cb4c68bc-1c14-11ee-bfbb-0a58ac160e3d myapp-2
17:51:14,390 WARN [org.apache.activemq.artemis.core.server] (Thread-2 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@6f7945b6)) AMQ222061: Client connection failed, clearing up resources for session cb4f9d0d-1c14-11ee-bfbb-0a58ac160e3d myapp-2
17:51:14,388 WARN [org.apache.activemq.artemis.core.client] (Thread-2 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@6f7945b6)) AMQ212037: Connection failure to /172.22.14.61:40688 has been detected: AMQ229014: Did not receive data from /172.22.14.61:40688 within the 60,000ms connection TTL. The connection will now be closed. [code=CONNECTION_TIMEDOUT] myapp-2
17:51:14,388 WARN [org.apache.activemq.artemis.core.server] (Thread-2 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@6f7945b6)) AMQ222061: Client connection failed, clearing up resources for session cb4c68bc-1c14-11ee-bfbb-0a58ac160e3d myapp-2
17:50:21,620 WARN [org.apache.activemq.artemis.core.server] (Thread-1 (ActiveMQ-client-global-threads)) AMQ222095: Connection failed with failedOver=false myapp-2
17:50:07,864 WARN [org.apache.activemq.artemis.core.server] (Thread-4 (ActiveMQ-client-global-threads)) AMQ222095: Connection failed with failedOver=false myapp-0
17:49:49,217 WARN [org.apache.activemq.artemis.core.server] (Thread-5 (ActiveMQ-client-global-threads)) AMQ222095: Connection failed with failedOver=false myapp-1
17:49:49,194 WARN [org.apache.activemq.artemis.core.server] (Thread-4 (ActiveMQ-client-global-threads)) AMQ222095: Connection failed with failedOver=false myapp-0
After that, the logs are full of these messages, retrying connections to host identifiers (e.g. 172-22-8-61) that would seem to correspond to non-existent IP addresses:
18:58:47,982 WARN [org.apache.activemq.artemis.core.server] (Thread-6 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@c93158f)) AMQ224091: Bridge ClusterConnectionBridge@407bbd0d [name=$.artemis.internal.sf.my-cluster.ca6febdd-1c14-11ee-9dbc-0a58ac16083d, queue=QueueImpl[name=$.artemis.internal.sf.my-cluster.ca6febdd-1c14-11ee-9dbc-0a58ac16083d, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=1c1f339a-1c1d-11ee-8e9e-0a58ac160e3f], temp=false]@101d7099 targetConnector=ServerLocatorImpl (identity=(Cluster-connection-bridge::ClusterConnectionBridge@407bbd0d [name=$.artemis.internal.sf.my-cluster.ca6febdd-1c14-11ee-9dbc-0a58ac16083d, queue=QueueImpl[name=$.artemis.internal.sf.my-cluster.ca6febdd-1c14-11ee-9dbc-0a58ac16083d, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=1c1f339a-1c1d-11ee-8e9e-0a58ac160e3f], temp=false]@101d7099 targetConnector=ServerLocatorImpl [initialConnectors=[TransportConfiguration(name=http-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?httpUpgradeEndpoint=http-acceptor&activemqServerName=default&httpUpgradeEnabled=true&port=8080&host=172-22-8-61], discoveryGroupConfiguration=null]]::ClusterConnectionImpl@1659082611[nodeUUID=1c1f339a-1c1d-11ee-8e9e-0a58ac160e3f, connector=TransportConfiguration(name=http-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?httpUpgradeEndpoint=http-acceptor&activemqServerName=default&httpUpgradeEnabled=true&port=8080&host=172-22-14-63, address=jms, server=ActiveMQServerImpl::serverUUID=1c1f339a-1c1d-11ee-8e9e-0a58ac160e3f])) [initialConnectors=[TransportConfiguration(name=http-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?httpUpgradeEndpoint=http-acceptor&activemqServerName=default&httpUpgradeEnabled=true&port=8080&host=172-22-8-61], discoveryGroupConfiguration=null]] is unable to connect to destination. 
Retrying myapp-0
18:58:47,862 WARN [org.apache.activemq.artemis.core.server] (Thread-5 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@6a925b6c)) AMQ224091: Bridge ClusterConnectionBridge@555aeef4 [name=$.artemis.internal.sf.my-cluster.ca6febdd-1c14-11ee-9dbc-0a58ac16083d, queue=QueueImpl[name=$.artemis.internal.sf.my-cluster.ca6febdd-1c14-11ee-9dbc-0a58ac16083d, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=138ef848-1c1d-11ee-b9ab-0a58ac160a3e], temp=false]@f5bed5e targetConnector=ServerLocatorImpl (identity=(Cluster-connection-bridge::ClusterConnectionBridge@555aeef4 [name=$.artemis.internal.sf.my-cluster.ca6febdd-1c14-11ee-9dbc-0a58ac16083d, queue=QueueImpl[name=$.artemis.internal.sf.my-cluster.ca6febdd-1c14-11ee-9dbc-0a58ac16083d, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=138ef848-1c1d-11ee-b9ab-0a58ac160a3e], temp=false]@f5bed5e targetConnector=ServerLocatorImpl [initialConnectors=[TransportConfiguration(name=http-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?httpUpgradeEndpoint=http-acceptor&activemqServerName=default&httpUpgradeEnabled=true&port=8080&host=172-22-8-61], discoveryGroupConfiguration=null]]::ClusterConnectionImpl@871930775[nodeUUID=138ef848-1c1d-11ee-b9ab-0a58ac160a3e, connector=TransportConfiguration(name=http-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?httpUpgradeEndpoint=http-acceptor&activemqServerName=default&httpUpgradeEnabled=true&port=8080&host=172-22-10-62, address=jms, server=ActiveMQServerImpl::serverUUID=138ef848-1c1d-11ee-b9ab-0a58ac160a3e])) [initialConnectors=[TransportConfiguration(name=http-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?httpUpgradeEndpoint=http-acceptor&activemqServerName=default&httpUpgradeEnabled=true&port=8080&host=172-22-8-61], discoveryGroupConfiguration=null]] is unable to connect to destination. 
Retrying myapp-1
18:58:46,841 WARN [org.apache.activemq.artemis.core.server] (Thread-10 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@2fff22c0)) AMQ224091: Bridge ClusterConnectionBridge@14708497 [name=$.artemis.internal.sf.my-cluster.ca6febdd-1c14-11ee-9dbc-0a58ac16083d, queue=QueueImpl[name=$.artemis.internal.sf.my-cluster.ca6febdd-1c14-11ee-9dbc-0a58ac16083d, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=0a15e3c0-1c1d-11ee-9e42-0a58ac16083e], temp=false]@1c2f52b7 targetConnector=ServerLocatorImpl (identity=(Cluster-connection-bridge::ClusterConnectionBridge@14708497 [name=$.artemis.internal.sf.my-cluster.ca6febdd-1c14-11ee-9dbc-0a58ac16083d, queue=QueueImpl[name=$.artemis.internal.sf.my-cluster.ca6febdd-1c14-11ee-9dbc-0a58ac16083d, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=0a15e3c0-1c1d-11ee-9e42-0a58ac16083e], temp=false]@1c2f52b7 targetConnector=ServerLocatorImpl [initialConnectors=[TransportConfiguration(name=http-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?httpUpgradeEndpoint=http-acceptor&activemqServerName=default&httpUpgradeEnabled=true&port=8080&host=172-22-8-61], discoveryGroupConfiguration=null]]::ClusterConnectionImpl@2015312406[nodeUUID=0a15e3c0-1c1d-11ee-9e42-0a58ac16083e, connector=TransportConfiguration(name=http-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?httpUpgradeEndpoint=http-acceptor&activemqServerName=default&httpUpgradeEnabled=true&port=8080&host=172-22-8-62, address=jms, server=ActiveMQServerImpl::serverUUID=0a15e3c0-1c1d-11ee-9e42-0a58ac16083e])) [initialConnectors=[TransportConfiguration(name=http-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?httpUpgradeEndpoint=http-acceptor&activemqServerName=default&httpUpgradeEnabled=true&port=8080&host=172-22-8-61], discoveryGroupConfiguration=null]] is unable to connect to destination. 
Retrying myapp-2
I found this Red Hat guidance, which seems to indicate that an unclean shutdown of EAP/WildFly is the cause. But our container traps SIGTERM and runs the shutdown through jboss-cli.sh, using the pattern described here:
shutdown() {
    echo "Shutting down wildfly cleanly using cli..."
    ipv4=$(ifconfig eth0 | grep 'inet ' | awk '{print $2}')
    ${WFLY_DIR}/bin/jboss-cli.sh --connect --controller="${ipv4}:9990" --command=:shutdown
    exit
}
trap 'shutdown' SIGTERM
# get ip to configure jboss (jgroups does not like 0.0.0.0)
IP=$(ifconfig eth0 | grep 'inet ' | awk '{print $2}')
PARAMS=" -Djboss.bind.address=${IP} -Djboss.bind.address.public=${IP} -Djboss.bind.address.management=${IP} $@ "
# now delegate to wildfly
echo Wildfly started, use SIGTERM to shutdown ...
${WFLY_DIR}/bin/standalone.sh ${PARAMS} &
WFLYPID=$!
# Wait for shutdown
wait $WFLYPID
The log output ("Shutting down wildfly cleanly using cli...") indicates that the trap is firing, so WildFly should be shutting down gracefully. Furthermore, we set "reconnect-attempts" to 10 on the cluster connection, so I really don't understand why the logs are still full of the AMQ224091 retry message.
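The echo only shows that the handler started; it does not show that the server process had exited before the script did. A minimal sketch of one way to check this, assuming the handler should block until the background process is really gone. `graceful_stop` is a hypothetical helper name, not part of WildFly, and the shutdown request is passed in as arguments so the jboss-cli.sh call from the script above can be substituted unchanged:

```shell
#!/bin/sh
# Hypothetical helper: request a shutdown, then block until the process has exited.
#   $1    PID of the background server process (WFLYPID in the script above)
#   $2..  command that requests the shutdown (e.g. the jboss-cli.sh call)
graceful_stop() {
  pid=$1
  shift
  # Send the shutdown request; report a failure but still wait for the process.
  "$@" || echo "shutdown request failed" >&2
  # wait returns the exit status of the child once it has terminated.
  status=0
  wait "$pid" || status=$?
  echo "process $pid exited with status $status"
}
```

Inside the trap handler this would be called as `graceful_stop "$WFLYPID" "${WFLY_DIR}/bin/jboss-cli.sh" --connect --controller="${ipv4}:9990" --command=:shutdown` before `exit`. I have not verified whether the CLI call returns before or after the server has fully stopped, so this may turn out to change nothing.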
DEV-myapp@myapp-2:~/wildfly/standalone/configuration$ grep reconnect my-standalone.xml
<cluster-connection name="my-cluster" address="jms" connector-name="http-connector" discovery-group="dg-group1" reconnect-attempts="10" />
<connection-factory name="RemoteConnectionFactory" entries="java:jboss/exported/jms/RemoteConnectionFactory" connectors="http-connector" ha="true" block-on-acknowledge="true" reconnect-attempts="20"/>
It looks to me like the host query parameter is being set to correspond to the IP of the container (host=172-22-8-61), but this IP changes with each redeployment of the application (stateful set).
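To check this assumption against the logs, here is a small sketch that pulls the host= tokens out of the log text, converts them back to dotted notation, and compares them with the pod IPs currently reported by `ip addr`. The function names are hypothetical helpers, and the sketch assumes the token is simply the IPv4 address with dots replaced by dashes:

```shell
#!/bin/sh
# Convert a dash-separated host token (e.g. 172-22-8-61) to dotted IPv4 notation.
host_token_to_ip() {
  printf '%s\n' "$1" | sed 's/-/./g'
}

# Read log text on stdin and print the unique host= tokens, one per line.
extract_host_tokens() {
  grep -oE 'host=[0-9]+(-[0-9]+){3}' | cut -d= -f2 | sort -u
}

# Read host tokens on stdin; the arguments are the pod IPs that currently exist.
# Prints each address followed by LIVE (matches a current pod) or STALE.
check_hosts() {
  while read -r token; do
    ip=$(host_token_to_ip "$token")
    state="STALE"
    for known in "$@"; do
      if [ "$ip" = "$known" ]; then
        state="LIVE"
      fi
    done
    printf '%s %s\n' "$ip" "$state"
  done
}
```

For example, `extract_host_tokens < server.log | check_hosts 172.22.14.63 172.22.10.62 172.22.8.62` (with server.log standing in for wherever the log is written) would flag 172.22.8.61 as STALE if it no longer belongs to any pod.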
Update 10. July
Now I am also seeing
07:38:32,300 ERROR [org.apache.activemq.artemis.core.client] (Thread-1 (ActiveMQ-client-netty-threads)) AMQ214016: Failed to create netty connection: io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: No route to host
07:19:31,564 ERROR [org.apache.activemq.artemis.core.client] (Thread-16 (ActiveMQ-client-netty-threads)) AMQ214016: Failed to create netty connection: io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: No route to host
leading me to this Red Hat Knowledgebase article and making me wonder whether this can work (stably) at all.
Can I resolve this situation? The IP will always vary in the Kubernetes cluster. I read here that Kubernetes DNS can help with IPs in WildFly clusters, but I don't understand how that would be the case; besides, the normal JGroups cluster and Infinispan seem to be working fine. Can I configure something so that the host variable uses the (constant) hostname inside the pod instead of the IP? And why is the log still full of these messages when I added the reconnect-attempts attribute?
Update 17.07.: I am currently just suppressing the logs with the following addition to the standalone.xml:
<!-- reduce logging for artemis due to AMQ224091 -->
<logger category="org.apache.activemq.artemis.core.server">
    <level name="ERROR" />
</logger>