
I'm a maintainer of the JavaLite open source project. One part of it is called Async, a simplified front-end for the Apache ActiveMQ Artemis broker. It exists to make it easier to embed Artemis in the application's own process, and it adds a layer of convenience for processing "Commands". We have used it in production for many years with almost no issues.

However, the JavaLite project itself has tests that start/stop the broker multiple times and use different instances for different tests. Here's the source code of Async and the source code of tests.

As you can see, the test creates a new instance of the broker, uses it, then stops it.

Here are the start and stop methods.

Now to the question. The build that runs these tests has never had any issues on my laptop or on our previous CI servers. However, we are building a new CI server, and this test fails there with a random number of logical errors (failed test conditions). Sometimes it succeeds too. Everything is identical between the machines where it succeeds and where it fails, except the hardware: the box where it fails has only two CPU cores (my laptop has 12).

So, in the test AsyncSpec, we create, start and stop the broker and it seems that some data is randomly bleeding across different instances of the broker.

What is the best/cleanest way to create/start/stop/destroy the Artemis embedded server in the same VM without conflicts across multiple instances?

Justin Bertram
ipolevoy
  • @justin-bertram, I just subscribed to the mailing list, let's take the conversation there. – ipolevoy Nov 25 '20 at 18:35
  • @justin-bertram, sorry to ask this, but I have not used mailing lists much. How do I ask a question? Shall I just send an email to users at activemq.apache.org? – ipolevoy Nov 25 '20 at 18:40
  • @justin-bertram, I sent email to users at activemq.apache.org and added more context there. Appreciate any help! – ipolevoy Nov 30 '20 at 22:05
  • @justin-bertram, the advice on that email list did help, I resolved the issues and have a reproducible successful test now, appreciate your help! – ipolevoy Dec 03 '20 at 20:12

1 Answer


I loaded JavaLite into my IDE and reproduced some failures for the AsyncSpec test you linked. Here are my observations...

One thing that I noticed is that it's possible for some of your tests to leak brokers. Any test that has an assertion before the broker is stopped can leak if that assertion fails because the test will terminate without stopping the broker. This will negatively impact any tests that follow. You should stop your brokers in a finally block or perhaps in the @After method. In any case, you need to make absolutely sure that no matter what happens in the test the broker is stopped.
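A minimal sketch of the stop-in-finally pattern follows. `FakeBroker` is a hypothetical stand-in, not the real Async or Artemis API; it only counts how many instances are still running, so the effect of the `finally` block is visible when the test body throws.

```java
import java.util.concurrent.atomic.AtomicInteger;

class BrokerLeakDemo {
    /** Runs a test body and always stops the broker, even if the body throws. */
    static void runWithBroker(Runnable testBody) {
        FakeBroker broker = new FakeBroker();
        broker.start();
        try {
            testBody.run();
        } finally {
            broker.stop(); // executed on success and on assertion failure alike
        }
    }

    public static void main(String[] args) {
        try {
            runWithBroker(() -> {
                throw new AssertionError("simulated failed assertion");
            });
        } catch (AssertionError expected) {
            // The failure still propagates to the caller (the test runner).
        }
        System.out.println("brokers still running: " + FakeBroker.running.get());
    }
}

/** Hypothetical stand-in for an embedded broker wrapper (assumption, not the Async API). */
class FakeBroker {
    static final AtomicInteger running = new AtomicInteger();
    private boolean started;

    void start() {
        started = true;
        running.incrementAndGet();
    }

    void stop() {
        if (started) {
            started = false;
            running.decrementAndGet();
        }
    }
}
```

In a JUnit 4 test the same guarantee can be had by keeping the broker in a field and stopping it from a method annotated with `@After`, which runs whether or not the test method failed.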

Your tests also leak journals. You create the embedded broker's journal directory using this:

Files.createTempDirectory("async").toFile().getCanonicalPath();

However, you never clean that directory up. I ran thousands of test iterations in a loop and it filled up over 200GB of disk space.
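One way to clean up such a directory, sketched with the JDK only. The class and method names (`JournalCleanup`, `createSampleJournal`, `deleteRecursively`) and the file name are made up for illustration; the real test would call the delete step after the broker has been stopped.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

class JournalCleanup {
    /** Creates a temp directory with one dummy file, standing in for a broker journal. */
    static Path createSampleJournal() {
        try {
            Path dir = Files.createTempDirectory("async");
            Files.write(dir.resolve("dummy-journal-file"), new byte[] {1, 2, 3});
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Deletes a directory tree; reverse ordering removes children before their parents. */
    static void deleteRecursively(Path root) {
        if (!Files.exists(root)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(root)) {
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path journal = createSampleJournal();
        try {
            // ... start the broker against 'journal', run the test, stop the broker ...
        } finally {
            deleteRecursively(journal);
        }
        System.out.println("journal exists: " + Files.exists(journal));
    }
}
```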

I think the most important thing is that when you send messages in your tests you send them as non-persistent which means they will be sent asynchronously. However, your tests don't take this into account which can lead to race conditions and ultimately assertion/test failures. I see a couple of options to resolve this. You could send messages as persistent which will be done synchronously. You could set blockOnNonDurableSend=true on your embedded client's URL (discussed in the documentation) which will also make sending the messages synchronous. Or you could add some other kind of Wait.waitFor() to ensure a meaningful condition is met before proceeding with the test (although you won't be able to inspect the message-count in most tests since you're using asynchronous message listeners).
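As a small illustration of the second option, the helper below appends `blockOnNonDurableSend=true` to a client URL. The helper name and the `vm://0` URL are assumptions for the example; how Async actually builds its connection URL is not shown in the question.

```java
class BrokerUrls {
    /** Appends blockOnNonDurableSend=true, respecting an existing query string. */
    static String withBlockingNonDurableSend(String url) {
        String separator = url.contains("?") ? "&" : "?";
        return url + separator + "blockOnNonDurableSend=true";
    }

    public static void main(String[] args) {
        // "vm://0" is used here as an example in-VM URL (assumption).
        System.out.println(withBlockingNonDurableSend("vm://0"));
        // prints "vm://0?blockOnNonDurableSend=true"
    }
}
```

The resulting string would then be passed to whatever creates the client connection factory, so that a send of a non-persistent message does not return until the broker has received it.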

I also think your tests would be more robust if you used Wait.waitFor() when inspecting HelloCommand.counter() and if you did this before you shut down the broker. Given the asynchronous nature of the message listeners there may be a discrepancy between the counter's value and the queue's message count for a short time. Furthermore, leaving the broker up during the Wait.waitFor() will mean that the message listeners don't get prematurely stopped.
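The idea of waiting on the counter before shutting anything down can be sketched with plain JDK classes. The `waitFor` helper below is a hypothetical stand-in written for this example, not the Artemis `Wait` utility, and the single-thread executor plays the role of the asynchronous message listener.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

class WaitDemo {
    /** Polls the condition until it holds or the timeout elapses; returns its final value. */
    static boolean waitFor(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (System.nanoTime() - deadline < 0) {
            if (condition.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return condition.getAsBoolean();
    }

    public static void main(String[] args) {
        AtomicInteger counter = new AtomicInteger();
        ExecutorService listener = Executors.newSingleThreadExecutor();
        for (int i = 0; i < 10; i++) {
            listener.execute(counter::incrementAndGet); // simulated async message handling
        }
        // Wait while the "listener" is still alive, and only then shut it down.
        boolean reached = waitFor(() -> counter.get() == 10, 5000, 10);
        listener.shutdown();
        System.out.println("reached: " + reached + ", counter: " + counter.get());
    }
}
```

The ordering is the point: the assertion waits for the expected state while the consumer is still running, and teardown happens only after the wait returns.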

When I first started running the tests I could consistently reproduce a failure in less than 25 runs or so. With the changes I listed above I was able to run over 12,000 times without a failure.

Ultimately I don't see any issues with the broker at this point, just issues with the tests.

Justin Bertram
  • That directory is cleaned up. I shared the link to this code by accident. The code in question is on the java8 branch. Nonetheless, you pointed me in the right direction, and I eliminated all potential race conditions, which did fix the issue. – ipolevoy Dec 03 '20 at 21:55