I am starting to study how to implement an application that supports failover/fault tolerance on top of JMS, more precisely TIBCO EMS.

I configured two EMS servers, both with fault tolerance enabled:

For the EMS instance running on server1, I have:

  • in tibemsd.conf

      ft_active = tcp://server2:7232
    
  • in factories.conf

    [GenericConnectionFactory]
      type                  = generic
      url                   = tcp://server1:7232
    
    [FTTopicConnectionFactory]
      type                  = topic
      url                   = tcp://server1:7232,tcp://server2:7232
    
    [FTQueueConnectionFactory]
      type                  = queue
      url                   = tcp://server1:7232,tcp://server2:7232
    

And for the EMS instance running on server2, I have:

  • in tibemsd.conf

      ft_active = tcp://server1:7232
    
  • in factories.conf

    [GenericConnectionFactory]
      type                  = generic
      url                   = tcp://server2:7232
    
    [FTTopicConnectionFactory]
      type                  = topic
      url                   = tcp://server2:7232,tcp://server1:7232
    
    [FTQueueConnectionFactory]
      type                  = queue
      url                   = tcp://server2:7232,tcp://server1:7232
    

I am not a TIBCO EMS expert, but my configuration seems to be good. When I start EMS on server1, I get:

  $ tibemsd -config tibemsd.conf
  ...
  2022-07-20 23:04:58.566 Server is active.
  2022-07-20 23:05:18.563 Standby server 'SERVERNAME@server1' has connected.

Then, if I start EMS on server2, I get:

  $ tibemsd -config tibemsd.conf
  ...
  2022-07-20 23:05:18.564 Accepting connections on tcp://server2:7232.
  2022-07-20 23:05:18.564 Server is in standby state for 'tcp://server1:7232'

Moreover, if I kill the active EMS on server1, I immediately get the following messages on server2:

  2022-07-20 23:21:52.891 Connection to active server 'tcp://server1:7232' has been lost.
  2022-07-20 23:21:52.891 Server activating on failure of 'tcp://server1:7232'.
  ...
  2022-07-20 23:21:52.924 Server is now active.

Up to this point everything looks OK: the active/standby EMS servers seem to be correctly configured.

Things get more complicated when I write a piece of code that is supposed to connect to these EMS servers and periodically publish messages. Let's try with the following code sample:

    @Test
    public void testEmsFailover() throws JMSException, InterruptedException {
        int NB = 1000;

        TibjmsConnectionFactory factory = new TibjmsConnectionFactory();
        factory.setServerUrl("tcp://server1:7232,tcp://server2:7232");
        Connection connection = factory.createConnection();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        connection.start();

        for (int i = 0; i < NB; i++) {
            LOG.info("sending message");
            Queue queue = session.createQueue(QUEUE__CLIENT_TO_FRONTDOOR__CONNECTION_REQUEST);
            MessageProducer producer = session.createProducer(queue);
            MapMessage mapMessage = session.createMapMessage();
            mapMessage.setStringProperty(PROPERTY__CLIENT_KIND, USER.toString());
            mapMessage.setStringProperty(PROPERTY__CLIENT_NAME, "name");
            producer.send(mapMessage);
            LOG.info("done!");

            Thread.sleep(1000);
        }

    }

If I run this code while both the active and standby servers are up, everything looks good:

  23:26:32.431 [main] INFO JmsEndpointTest - sending message
  23:26:32.458 [main] INFO JmsEndpointTest - done!
  23:26:33.458 [main] INFO JmsEndpointTest - sending message
  23:26:33.482 [main] INFO JmsEndpointTest - done!

Now, if I kill the active EMS server, I would expect that:

  • the standby server would immediately become the active one
  • my code would continue to publish as if nothing had happened

However, my code gets the following error:

javax.jms.JMSException: Connection is closed

    at com.tibco.tibjms.TibjmsxLink.sendRequest(TibjmsxLink.java:307)
    at com.tibco.tibjms.TibjmsxLink.sendRequestMsg(TibjmsxLink.java:261)
    at com.tibco.tibjms.TibjmsxSessionImp._createProducer(TibjmsxSessionImp.java:1004)
    at com.tibco.tibjms.TibjmsxSessionImp.createProducer(TibjmsxSessionImp.java:4854)
    at JmsEndpointTest.testEmsFailover(JmsEndpointTest.java:103)
...

and in the logs of the server (the former standby server, which is supposed to be the active one now) I get:

2022-07-20 23:32:44.447 [anonymous@cersei]: connect failed: server not in active state
2022-07-20 23:33:02.969 Connection to active server 'tcp://server2:7232' has been lost.
2022-07-20 23:33:02.969 Server activating on failure of 'tcp://server2:7232'.
2022-07-20 23:33:02.969 Server rereading configuration.
2022-07-20 23:33:02.971 Recovering state, please wait.
2022-07-20 23:33:02.980 Recovered 46 messages.
2022-07-20 23:33:02.980 Server is now active.
2022-07-20 23:33:03.545 [anonymous@cersei]: reconnect failed: connection unknown for id=8
2022-07-20 23:33:04.187 [anonymous@cersei]: reconnect failed: connection unknown for id=8
2022-07-20 23:33:04.855 [anonymous@cersei]: reconnect failed: connection unknown for id=8
2022-07-20 23:33:05.531 [anonymous@cersei]: reconnect failed: connection unknown for id=8

I would appreciate any help improving my code.

Thank you

Philippe MESMEUR

1 Answer


I think I found the origin of my problem.

According to the page "Tibco-Ems Failover Issue", the error message

  reconnect failed: connection unknown for id=8

means: "the store (ems db) was'nt share between the active and the standby node, so when the active ems failed, the new active ems was'nt able to recover connections and messages."

I realized that configuring a shared store is painful. To avoid it, I configured two tibemsd instances on the same host, following the page "Step By Step How to Setup TIBCO EMS In Fault Tolerant Mode":

  • use two tibemsd.conf configuration files
  • configure a different listen port in each file
  • configure ft_active with the URL of the other server
  • configure factories.conf
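
As a rough sketch of what these steps can look like: the file names, the server name `EMS-SERVER`, ports 7222 and 7224, and the store path below are illustrative assumptions, not copied from my actual files. As far as I understand, both instances need the same `server` name and must point at the same store directory for the standby to recover state.

```
# tibemsd-1.conf (first instance, hypothetical file name)
server    = EMS-SERVER
listen    = tcp://localhost:7222
ft_active = tcp://localhost:7224
store     = /path/to/shared/datastore

# tibemsd-2.conf (second instance, hypothetical file name)
server    = EMS-SERVER
listen    = tcp://localhost:7224
ft_active = tcp://localhost:7222
store     = /path/to/shared/datastore

# factories.conf (used by both instances)
[FTQueueConnectionFactory]
  type = queue
  url  = tcp://localhost:7222,tcp://localhost:7224
```

With a layout like this, the client URL in the test would list both ports on the same host, in the same comma-separated form as before.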

By doing so, I can rerun my test and it works as expected.
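
Independently of the server setup, a client can also guard each publish with a small retry loop so that a short failover window does not abort the whole run. The following is only a minimal sketch of a fixed-delay retry, using plain JDK classes. `RetrySketch` and `withRetry` are hypothetical names of my own, not part of JMS or of the TIBCO client library.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper for illustration; not a JMS or TIBCO API. */
final class RetrySketch {

    private RetrySketch() {}

    /**
     * Runs the action and returns its result. If the action throws,
     * waits delayMillis and tries again, up to maxAttempts in total.
     * Once the attempts are exhausted, the last exception is rethrown.
     */
    static <T> T withRetry(Callable<T> action, int maxAttempts, long delayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (InterruptedException e) {
                throw e; // interruption is not a transient failure, so do not retry
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis); // fixed pause before the next attempt
                }
            }
        }
        throw last;
    }
}
```

In the test above, the loop body (create the producer, build the message, send it) would be wrapped in a `Callable` and passed to `withRetry`. Note that once the connection is reported as closed, retrying on the same session cannot succeed, so the action would also have to recreate the connection and session. Whether this is needed at all depends on the client library's own reconnect behaviour.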

General Grievance
Philippe MESMEUR
  • What you did is valid. In real life, the EMS datastore files and configuration files (except the tibemsd.conf file) should be on a shared file system and used by both EMS instances. – EmmanuelM Jul 21 '22 at 17:47