0

I have singleton client with the below contract

public interface MQPublisher {
    void publish(String message) throws ClientConnectionException, ClientErrorException;

    void start() throws ClientException;

    void stop();
}

The class which is using this publisher is as below :

public class MessagePublisher {
    @Autowired
    private MQPublisher publisher;
    private AtomicBoolean isPublisherRunning;

    public void startPublisher() {
        if (!isPublisherRunning.get()) {
            publisher.start();
            isPublisherRunning.compareAndSet(false, true);
        }
    }

    @Retry(RETRY_MSG_UPLOAD)
    public void sendMessage(String msg) {
        try {
            startPublisher();
            publisher.publish(msg); // when multiple requests fail with the same exception, what will happen??
        } catch (Exception e) {
            log.error("Exception while publishing message : {}", msg, e);
            publisher.stop();
            isPublisherRunning.compareAndSet(true, false);
            throw e;
        }
    }

We are using resilience4j retry functionality to retry the sendMessage method. This works fine in case of a single request. Consider a case when multiple requests are processed parallely and all of them fails with an exception. In this case, these requests will be retried and there is a chance that one thread will start the publisher while the other will stop it and it will throw exceptions again. How to handle this scenario in a cleaner way?

Kaushik Das
  • 99
  • 1
  • 1
  • 8
  • Please explain why this is a problem: `one thread will start the publisher while the other will stop it and it will throw exceptions again`. To me, it sounds like it works as the design. – yoni May 30 '22 at 09:46
  • Consider a case when 10 threads are processing requests parallely, and all of them are at this line -> publisher.publish(msg). Now say because of network or other issue, they fail and all of them throws exceptions. And say on first retry, the operation should succeed as we are stopping the publisher and starting again. But this work of stopping and starting will be done by all the 10 individual threads and there is a chance that a thread t1 stopped the publisher and started it, but before the message is published, thread t2 stops the publisher, then t1 will throw an exception again. – Kaushik Das May 30 '22 at 10:38
  • That shouldn't happen as the publisher is stopped and restarted and any transient errors should be resolved by then. I know this design is wrong, my question is how can we handle it in a cleaner way so we don't run in a situation where the publisher is stopped and started again and again and it keeps on throwing exceptions – Kaushik Das May 30 '22 at 10:48

1 Answers1

0

It isn't clear why the whole publisher should be stopped in case of failure. Nevertheless, if there are real reasons for that, I would change the stop method to use an atomic timer that will restart on each message sending and stop the publisher only after at least 5 seconds (or the time needed for a message to be successfully sent) have passed from the message sending. Something like that:

@Slf4j
public class MessagePublisher {
    private static final int RETRY_MSG_UPLOAD = 10;
    @Autowired
    private MQPublisher publisher;
    private AtomicBoolean isPublisherRunning;
    private AtomicLong publishStart;


    public void startPublisher() {
       if (!isPublisherRunning.get()) {
           publisher.start();
           isPublisherRunning.compareAndSet(false, true);
       }
    }

    @Retryable(maxAttempts = RETRY_MSG_UPLOAD)
    public void sendMessage(String msg) throws InterruptedException {
        try {
            startPublisher();
            publishStart.set(System.nanoTime());
            publisher.publish(msg); // when multiple requests fail with the same exception, what will happen??
        } catch (Exception e) {
            log.error("Exception while publishing message : {}", msg, e);
            while (System.nanoTime() < publishStart.get() + 5000000000L) {
                Thread.sleep(1000);
            }
            publisher.stop();
            isPublisherRunning.compareAndSet(true, false);
            throw e;
        }
    }
}

I think it is important to mention (as you just did) that this is a terrible design, and that such calculations should be done by the publisher implementer and not by the caller.

yoni
  • 1,164
  • 2
  • 13
  • 29