In my Processor API implementation I store the messages in a key-value store and make a POST request every 100 messages. If something fails while sending the messages (the API is not responding, etc.), I want to stop processing messages until there is evidence that the API calls work again. Here is my code:

public class BulkProcessor implements Processor<byte[], UserEvent> {

    private KeyValueStore<Integer, ArrayList<UserEvent>> keyValueStore;

    private BulkAPIClient bulkClient;

    private String storeName;

    private ProcessorContext context;

    private int count;

    @Autowired
    public BulkProcessor(String storeName, BulkAPIClient bulkClient) {
        this.storeName = storeName;
        this.bulkClient = bulkClient;
    }

    @Override
    public void init(ProcessorContext context) {
        this.context = context;
        keyValueStore = (KeyValueStore<Integer, ArrayList<UserEvent>>) context.getStateStore(storeName);
        count = 0;
        // to check every 15 minutes if there are any remainders in the store that are not sent yet
        this.context.schedule(Duration.ofMinutes(15), PunctuationType.WALL_CLOCK_TIME, (timestamp) -> {
            if (count > 0) {
                sendEntriesFromStore();
            }
        });
    }

    @Override
    public void process(byte[] key, UserEvent value) {
        int userGroupId = Integer.valueOf(value.getUserGroupId());
        ArrayList<UserEvent> userEventArrayList = keyValueStore.get(userGroupId);
        if (userEventArrayList == null) {
            userEventArrayList = new ArrayList<>();
        }
        userEventArrayList.add(value);
        keyValueStore.put(userGroupId, userEventArrayList);
        count++; // count buffered events; without this the threshold below is never reached
        if (count >= 100) {
            sendEntriesFromStore();
        }
    }

    private void sendEntriesFromStore() {
        KeyValueIterator<Integer, ArrayList<UserEvent>> iterator = keyValueStore.all();
        while (iterator.hasNext()) {
            KeyValue<Integer, ArrayList<UserEvent>> entry = iterator.next();
            BulkRequest bulkRequest = new BulkRequest(entry.key, entry.value);
            if (bulkRequest.getLocation() != null) {
                URI url = bulkClient.buildURIPath(bulkRequest);
                try {
                    bulkClient.postRequestBulkApi(url, bulkRequest);
                    keyValueStore.delete(entry.key);
                } catch (BulkApiException e) {
                    logger.warn(e.getMessage(), e.fillInStackTrace());
                }
            }
        }
        iterator.close();
        count = 0;
    }

    @Override
    public void close() {
    }
}

Currently, if a call to the API fails, the processor keeps consuming the next 100 messages and adds them to the keyValueStore, and this repeats for as long as the API keeps failing. I don't want this to happen. Instead, I would prefer to stop the stream and continue once the keyValueStore has been emptied. Is that possible?
Could I throw a StreamsException?

try {
    bulkClient.postRequestBulkApi(url, bulkRequest);
    keyValueStore.delete(entry.key);
} catch (BulkApiException e) {
    throw new StreamsException(e);
}

Would that kill my Streams app, so that the process dies?

Alex P.
  • Does your count increase after receiving a new message? – Tuyen Luong Apr 08 '20 at 12:34
  • @TuyenLuong well, the `count` doesn't increase because I'm resetting it to `0` in the `sendEntriesFromStore` method. I just don't want another 100 events to be added to the `keyValueStore` if something in my `POST` request doesn't work – Alex P. Apr 08 '20 at 12:38

2 Answers

0
  1. Only delete the record from the state store after you have made sure it was successfully processed by the API, so remove the first keyValueStore.delete(entry.key); and keep the second one. Otherwise you can lose messages: the keyValueStore.delete may be committed to the underlying changelog topic before your messages have been successfully processed, which gives you only an at-most-once guarantee.
  2. Just wrap the API call in an infinite loop and keep retrying until the record has been processed successfully. Your processor will not consume new messages from the upstream processor node in the meantime, because it runs in the same StreamThread:
    private void sendEntriesFromStore() {
        KeyValueIterator<Integer, ArrayList<UserEvent>> iterator = keyValueStore.all();
        while (iterator.hasNext()) {
            KeyValue<Integer, ArrayList<UserEvent>> entry = iterator.next();
            // the premature state store delete was removed from here: keyValueStore.delete(entry.key);
            BulkRequest bulkRequest = new BulkRequest(entry.key, entry.value);
            if (bulkRequest.getLocation() != null) {
                URI url = bulkClient.buildURIPath(bulkRequest);
                while (true) {
                    try {
                        bulkClient.postRequestBulkApi(url, bulkRequest);
                        keyValueStore.delete(entry.key); // delete only after the message was processed successfully, for an at-least-once guarantee
                        break;
                    } catch (BulkApiException e) {
                        logger.warn(e.getMessage(), e.fillInStackTrace());
                    }
                }
            }
        }
        iterator.close();
        count = 0;
    }
  3. Yes, you could throw a StreamsException. The StreamTask will then be migrated to another StreamThread during the rebalance, possibly on the same application instance. If the API keeps causing exceptions until all StreamThreads have died, your application will not exit automatically; it will only log the message below. To exit your app once all stream threads have died, add a custom uncaught exception handler with KafkaStreams#setUncaughtExceptionHandler, or listen for the Streams state change to ERROR:
All stream threads have died. The instance will be in error state and should be closed.
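
A middle ground between retrying forever (point 2) and failing on the first error (point 3) is to retry within a fixed time budget and only then give up. The sketch below is an illustration under stated assumptions, not code from this answer and not a Kafka Streams API: BoundedRetry, Attempt, and RetryBudgetExceededException are hypothetical names, and the budget is assumed to be chosen well below the consumer's max.poll.interval.ms (5 minutes by default for the plain consumer) so the thread keeps its place in the consumer group while it waits.

```java
import java.time.Duration;

/** Hypothetical helper (not a Kafka class): retries with exponential backoff inside a time budget. */
final class BoundedRetry {

    /** Upper bound for a single backoff pause; an arbitrary illustrative value. */
    private static final long MAX_BACKOFF_MILLIS = 30_000;

    /** The operation to retry, for example one POST request to the bulk API. */
    @FunctionalInterface
    interface Attempt {
        void run() throws Exception;
    }

    /** Unchecked, so it can escape Processor#process once the budget is used up. */
    static final class RetryBudgetExceededException extends RuntimeException {
        RetryBudgetExceededException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    private BoundedRetry() {
    }

    /**
     * Runs the attempt until it succeeds or the budget is exhausted.
     * Returns the number of attempts that were needed.
     */
    static int run(Attempt attempt, Duration budget, Duration initialBackoff) {
        long deadline = System.nanoTime() + budget.toNanos();
        long backoffMillis = Math.max(1, initialBackoff.toMillis());
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                attempt.run();
                return attempts;
            } catch (Exception e) {
                long remainingMillis = (deadline - System.nanoTime()) / 1_000_000;
                if (remainingMillis <= 0) {
                    throw new RetryBudgetExceededException(
                            "giving up after " + attempts + " attempts", e);
                }
                // Never sleep past the deadline; double the pause after each failure.
                pause(Math.min(backoffMillis, remainingMillis));
                backoffMillis = Math.min(backoffMillis * 2, MAX_BACKOFF_MILLIS);
            }
        }
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while backing off", ie);
        }
    }
}
```

Inside sendEntriesFromStore() the call might look like BoundedRetry.run(() -> bulkClient.postRequestBulkApi(url, bulkRequest), Duration.ofMinutes(2), Duration.ofMillis(500)), where both durations are placeholders. When the budget runs out, the unchecked exception escapes process() and the stream thread fails as described in point 3.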
Tuyen Luong
  • Yeah, sorry, the first delete was a copy-paste mistake that mixed an older state of the code with the new one. The first delete is not actually in the code anymore; that's why I added it in the try-catch. As for the while loop, I was rather hoping to switch the app into a dead state and then restart it. – Alex P. Apr 08 '20 at 12:51
  • Just retrying in an infinite loop would "halt" processing; however, your thread would eventually drop out of the consumer group because `poll()` won't be called. You could increase `max.poll.interval.ms`, but setting it larger has other (possibly undesired) side effects. In the end, there is no good support for "halting" processing in Kafka Streams atm. This question comes up on a more or less regular basis, so I think we should add built-in support for it to Kafka Streams. – Matthias J. Sax Apr 09 '20 at 03:37
0

In the end I used a simple KafkaConsumer instead of KafkaStreams, but the bottom line was that I changed the BulkApiException to extend RuntimeException, which I throw again after I log it. So now it looks as follows:

        } catch (BulkApiException bae) {
            // log the original exception with its original stack trace, then rethrow it
            logger.error(bae.getMessage(), bae);
            throw bae;
        } finally {
            consumer.close();
            int exitCode = SpringApplication.exit(ctx, () -> 1);
            System.exit(exitCode);
        }

This way the application exits and k8s restarts the pod. The reasoning is that if the API I'm forwarding the requests to is down, there is no point in continuing to read messages. So until the other API is back up, k8s will keep restarting the pod.
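
The fail-fast idea can be sketched without any Kafka or Spring dependency. Everything in the block below is a hypothetical stand-in rather than the code I actually run: FailFastLoop and BatchSender are made-up names, the Iterator plays the role of the polled records, and the AutoCloseable plays the role of the consumer. It also assumes the exit code is computed first and passed to System.exit by the caller.

```java
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch: forward batches until one fails, then stop with a non-zero exit code. */
final class FailFastLoop {

    /** Stand-in for the client that posts one batch to the downstream API. */
    @FunctionalInterface
    interface BatchSender {
        void send(List<String> batch) throws Exception;
    }

    private FailFastLoop() {
    }

    /**
     * Returns 0 when the source runs dry and 1 as soon as a batch cannot be
     * forwarded. The consumer is closed in both cases.
     */
    static int run(Iterator<List<String>> source, BatchSender sender, AutoCloseable consumer) {
        try {
            while (source.hasNext()) {
                sender.send(source.next());
            }
            return 0;
        } catch (Exception e) {
            // Stop reading: there is no point in consuming more while the API is down.
            System.err.println("Forwarding failed, shutting down: " + e.getMessage());
            return 1;
        } finally {
            closeQuietly(consumer);
        }
    }

    private static void closeQuietly(AutoCloseable resource) {
        try {
            resource.close();
        } catch (Exception e) {
            System.err.println("Could not close consumer cleanly: " + e.getMessage());
        }
    }
}
```

A main method would call System.exit(FailFastLoop.run(...)), so a failed run ends the process with a non-zero status and the orchestrator restarts it.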

Alex P.