I would like to know whether Trident batches are executed in parallel, i.e., can multiple batches run at the same time?

Apart from this, I have a few questions that seem too small to post individually. If you think they are large enough, feel free to comment and I will post them separately.

  1. What if processing fails for only one particular tuple in a batch?

    Will the whole batch then be replayed, so that tuples that were already processed successfully are processed again? Take word count, where every tuple contains a word but only some of the tuples were counted successfully. For example, the batch contains the word "man" three times and the count shows only 2, which means that one tuple failed in processing.

  2. In this tutorial, only the previous txid is stored. What about earlier transaction ids?

    For example, suppose there are four batches: 1, 2, 3, 4. Batches #1 and #2 have been executed, and then batch #1 is replayed. The stored txid will be 2, since the most recently processed batch is batch #2, so there is no way to tell whether batch #1 was processed before. If so, the batches must be executed in order, meaning batch #2 cannot be executed until batch #1 has finished successfully. If that is the case, where is the parallelism in executing the batches?

  3. What if only one particular function fails to execute properly for a batch in a topology?

    For example, I have two functions: one persists the message to a database and the other produces it to a Kafka queue. Suppose persisting to the database succeeds but pushing to the Kafka queue fails, say because of a node failure. In that case I want only the function that pushes to the Kafka queue to be executed again for that particular batch. Is there a way to do this in Trident? I would need to store not only the txid but also the list of functions that still have to be processed for that txid. How could that be done?

Matthias J. Sax
JavaTechnical

1 Answer

As best I understand:

  1. Any failure fails the whole batch, and the spout replays it. The transactional state stores each value together with the transaction id of the last operation that updated it. If counting "man" failed, its stored txid is less than the current txid, so the state should add this batch's data to the stored value. Otherwise, it can ignore the replay because it knows the data from this batch has already been counted for this key.

  2. State updates are applied in strict txid order, but only the stateful components are bound by that order. Functions can already execute on the tuples of upcoming transactions.

  3. It sounds like you want States instead of Functions. A State remembers that it has already completed the batch and ignores that batch when it is replayed.
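
The replay check described in points 1 and 3 can be sketched in plain Java. This is a simplified illustration, not Trident's API: the class `TransactionalCountSketch`, its `Entry` type, and the methods `applyBatch` and `count` are hypothetical names made up for this example. It assumes that state updates arrive in strict txid order and that a replayed batch carries the same txid and the same tuples as the original.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a transactional state (not Trident's API).
class TransactionalCountSketch {

    // Stored per key: the running count and the txid of the batch that last updated it.
    static final class Entry {
        final long count;
        final long txid;

        Entry(long count, long txid) {
            this.count = count;
            this.txid = txid;
        }
    }

    private final Map<String, Entry> store = new HashMap<>();

    // Applies one batch's partial count for a word.
    // Returns true if the update was applied, false if it was skipped as a replay.
    boolean applyBatch(long txid, String word, long delta) {
        Entry current = store.get(word);
        if (current != null && current.txid == txid) {
            // Same txid as the stored one: this batch was already counted for this key.
            return false;
        }
        long base = (current == null) ? 0 : current.count;
        store.put(word, new Entry(base + delta, txid));
        return true;
    }

    long count(String word) {
        Entry e = store.get(word);
        return (e == null) ? 0 : e.count;
    }

    public static void main(String[] args) {
        TransactionalCountSketch state = new TransactionalCountSketch();

        // Batch 1 contains "man" three times and "dog" once.
        // Assume the update for "man" commits, then a failure occurs before "dog".
        state.applyBatch(1, "man", 3);

        // The spout replays batch 1 with the same txid and the same tuples.
        state.applyBatch(1, "man", 3); // skipped, already counted under txid 1
        state.applyBatch(1, "dog", 1); // applied, never counted before

        System.out.println(state.count("man") + " " + state.count("dog")); // prints "3 1"
    }
}
```

Because the txid is stored next to each value, the replay only touches the keys whose update did not go through, which is roughly the behavior asked for in question 3 without tracking a list of pending functions.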

Joshua Martell