An Apache ActiveMQ Artemis cluster running in Kubernetes, with the Istio proxy injected into the Artemis pods, logs many errors indicating connection loss between the master (active) and slave (backup) instances. We also see intermittent connection loss from JBoss EAP to Artemis; the connection is restored on the subsequent call (e.g., when sending a JMS message).
The tcpdump for the active (master) instance shows many RST packets on the connections between the backup and active Artemis instances (compared with a tcpdump from an environment where the Artemis pods do not have the Istio proxy injected).
The protocols allowed on the Artemis acceptor are CORE and AMQP. The Artemis cluster uses the default communication ports.
The Artemis cluster is used for JMS messaging. JMS communication originates from JBoss EAP, which runs in another pod in the same Kubernetes namespace.
Static connectors are used to form the Artemis cluster, and replication is used for data exchange. Static connectors are used in the JBoss configuration as well.
The idleTimeout for TCP and HTTP connections was set to infinite (for both INBOUND and OUTBOUND) in the Istio proxy.
When the Istio proxy is not injected, the Artemis logs show no errors and we observe no issues with JMS messaging.
NOTE: There is no ping command available in the containers where Artemis and JBoss are installed (in case it matters for performing "alive checks").
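Since ping is not available, a plain TCP connect could serve as a rough reachability check from inside the containers. The following is only a minimal sketch: the class name TcpProbe, the method canConnect, and the 3-second timeout are made up for illustration, and 61616 is used as the default only because it is the port shown in the logs. With a sidecar proxy in the path, a successful connect may only show that the local proxy accepted the connection, not that the remote Artemis instance answered.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper (not part of Artemis or JBoss): checks whether a TCP
// connection to host:port can be opened within the given timeout.
final class TcpProbe {

    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            // connect() throws an IOException on refusal, timeout, or an unresolvable host
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Host and port are placeholders; pass the Artemis DNS name and port as arguments.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 61616;
        System.out.println(host + ":" + port + " reachable=" + canConnect(host, port, 3000));
    }
}
```

It can be run with the JDK already present in the container, for example `java TcpProbe.java <artemis-host> 61616` on Java 11 or later.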
The stack trace from the active (master) Artemis pod (multiple snippets):
2023-04-10 20:56:19,436 WARN [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure to <***artemis backup instance's DNS***>/<***artemis backup instance's IP***>:61616 has been detected: AMQ219011: Did not receive data from server for org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnection@6243c958[ID=fef52fc9, local= /<***artemis active instance's IP***>:55842, remote=<***artemis backup instance's DNS***>/<***artemis backup instance's IP***>:61616] [code=CONNECTION_TIMEDOUT]
2023-04-10 20:56:19,562 ERROR [org.apache.activemq.artemis.core.client] AMQ214016: Failed to create netty connection
java.lang.IllegalStateException: No ActiveMQChannelHandler has been found while connecting to <***artemis backup instance's DNS***>/<***artemis backup instance's IP***>:61616 from Channel with id = cf68f05a
at org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector.createConnection(NettyConnector.java:954) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector.createConnection(NettyConnector.java:840) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector.createConnection(NettyConnector.java:822) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.openTransportConnection(ClientSessionFactoryImpl.java:1105) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.createTransportConnection(ClientSessionFactoryImpl.java:1212) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.createTransportConnection(ClientSessionFactoryImpl.java:1146) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.establishNewConnection(ClientSessionFactoryImpl.java:1375) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.getConnection(ClientSessionFactoryImpl.java:967) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.getConnectionWithRetry(ClientSessionFactoryImpl.java:858) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.reconnectSessions(ClientSessionFactoryImpl.java:799) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.failoverOrReconnect(ClientSessionFactoryImpl.java:656) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.handleConnectionFailure(ClientSessionFactoryImpl.java:534) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.handleConnectionFailure(ClientSessionFactoryImpl.java:527) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl$1.run(ClientSessionFactoryImpl.java:390) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:58) ~[artemis-commons-2.27.0.jar:?]
at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:33) ~[artemis-commons-2.27.0.jar:?]
at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:69) ~[artemis-commons-2.27.0.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) ~[artemis-commons-2.27.0.jar:?]
2023-04-11 01:22:33,180 WARN [org.apache.activemq.artemis.core.server] AMQ222092: Connection to the backup node failed, removing replication now
org.apache.activemq.artemis.api.core.ActiveMQConnectionTimedOutException: AMQ229014: Did not receive data from /127.0.0.6:55653 within the 60000ms connection TTL. The connection will now be closed.
at org.apache.activemq.artemis.core.remoting.server.impl.RemotingServiceImpl$FailureCheckAndFlushThread$2.run(RemotingServiceImpl.java:781) ~[artemis-server-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:58) ~[artemis-commons-2.27.0.jar:?]
at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:33) ~[artemis-commons-2.27.0.jar:?]
at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:69) ~[artemis-commons-2.27.0.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) ~[artemis-commons-2.27.0.jar:?]
2023-04-11 01:22:33,405 ERROR [org.apache.activemq.artemis.core.client] AMQ214016: Failed to create netty connection
java.lang.IllegalStateException: No ActiveMQChannelHandler has been found while connecting to <***artemis backup instance's DNS***>/<***artemis backup instance's IP***>:61616 from Channel with id = f979331a
at org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector.createConnection(NettyConnector.java:954) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector.createConnection(NettyConnector.java:840) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector.createConnection(NettyConnector.java:822) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.openTransportConnection(ClientSessionFactoryImpl.java:1105) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.createTransportConnection(ClientSessionFactoryImpl.java:1212) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.createTransportConnection(ClientSessionFactoryImpl.java:1146) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.establishNewConnection(ClientSessionFactoryImpl.java:1375) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.getConnection(ClientSessionFactoryImpl.java:967) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.getConnectionWithRetry(ClientSessionFactoryImpl.java:858) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.reconnectSessions(ClientSessionFactoryImpl.java:799) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.failoverOrReconnect(ClientSessionFactoryImpl.java:656) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.handleConnectionFailure(ClientSessionFactoryImpl.java:534) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.handleConnectionFailure(ClientSessionFactoryImpl.java:527) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl$1.run(ClientSessionFactoryImpl.java:390) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:58) ~[artemis-commons-2.27.0.jar:?]
at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:33) ~[artemis-commons-2.27.0.jar:?]
at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:69) ~[artemis-commons-2.27.0.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) ~[artemis-commons-2.27.0.jar:?]
The stack trace from the JBoss pod:
2023-04-10 20:56:10,649 AMQ214016: Failed to create netty connection: java.lang.IllegalStateException: No ActiveMQChannelHandler has been found while connecting to <***artemis backup instance's DNS***>/<***artemis backup instance's IP***>:61616 from Channel with id = 98ac7598
at org.apache.activemq.artemis@2.16.0.redhat-00045//org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector.createConnection(NettyConnector.java:970)
at org.apache.activemq.artemis@2.16.0.redhat-00045//org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector.createConnection(NettyConnector.java:856)
at org.apache.activemq.artemis@2.16.0.redhat-00045//org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector.createConnection(NettyConnector.java:838)
at org.apache.activemq.artemis@2.16.0.redhat-00045//org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.openTransportConnection(ClientSessionFactoryImpl.java:1097)
at org.apache.activemq.artemis@2.16.0.redhat-00045//org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.createTransportConnection(ClientSessionFactoryImpl.java:1146)
at org.apache.activemq.artemis@2.16.0.redhat-00045//org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.establishNewConnection(ClientSessionFactoryImpl.java:1378)
at org.apache.activemq.artemis@2.16.0.redhat-00045//org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.getConnection(ClientSessionFactoryImpl.java:952)
at org.apache.activemq.artemis@2.16.0.redhat-00045//org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.getConnectionWithRetry(ClientSessionFactoryImpl.java:841)
at org.apache.activemq.artemis@2.16.0.redhat-00045//org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.reconnectSessions(ClientSessionFactoryImpl.java:779)
at org.apache.activemq.artemis@2.16.0.redhat-00045//org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.failoverOrReconnect(ClientSessionFactoryImpl.java:638)
at org.apache.activemq.artemis@2.16.0.redhat-00045//org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.handleConnectionFailure(ClientSessionFactoryImpl.java:525)
at org.apache.activemq.artemis@2.16.0.redhat-00045//org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.handleConnectionFailure(ClientSessionFactoryImpl.java:518)
at org.apache.activemq.artemis@2.16.0.redhat-00045//org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.access$100(ClientSessionFactoryImpl.java:74)
at org.apache.activemq.artemis@2.16.0.redhat-00045//org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl$1.run(ClientSessionFactoryImpl.java:381)
at org.apache.activemq.artemis.journal//org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42)
at org.apache.activemq.artemis.journal//org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31)
at org.apache.activemq.artemis.journal//org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at org.apache.activemq.artemis.journal//org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)
Actions performed: We tried setting idleTimeout to infinite for the INBOUND and OUTBOUND connections of the Artemis and JBoss Istio proxies using an EnvoyFilter (a configuration resource for the Istio proxy that customizes how it handles inbound/outbound traffic). We also set reconnect-attempts to "-1" and connection-ttl to "86400000" for the pooled-connection-factory in JBoss (standalone-full.xml file).
We expected the connections to Artemis to stay alive for at least 24 hours, but they did not.
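For reference, the two TTL values that appear in this post differ by a large factor: the connection-ttl we configured on the JBoss side is 86400000 ms, while the AMQ229014 warning in the master log reports a 60000 ms connection TTL for the connection from 127.0.0.6. The small sketch below only converts both values to hours, minutes, and seconds; the class name TtlValues and its method are made up for illustration. The mismatch may mean that the JBoss setting does not apply to the connection between the two brokers, but that is an assumption on our side.

```java
import java.time.Duration;

// Illustration only: converts the millisecond values quoted in this post.
final class TtlValues {

    // connection-ttl set on the pooled-connection-factory in JBoss
    static final long CONFIGURED_TTL_MS = 86_400_000L;
    // TTL reported by AMQ229014 in the master (active) Artemis log
    static final long LOGGED_TTL_MS = 60_000L;

    static String describe(long millis) {
        Duration d = Duration.ofMillis(millis);
        return d.toHours() + "h " + d.toMinutesPart() + "m " + d.toSecondsPart() + "s";
    }

    public static void main(String[] args) {
        System.out.println("configured: " + describe(CONFIGURED_TTL_MS)); // configured: 24h 0m 0s
        System.out.println("logged:     " + describe(LOGGED_TTL_MS));     // logged:     0h 1m 0s
    }
}
```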
I am wondering what the possible root cause is and what additional configuration needs to be applied to the Artemis and JBoss installations to keep the connections alive. Is this issue related to the "keep alive" checks?