I would like to know whether Trident batches are executed in parallel, i.e., can multiple batches run at the same time?
Apart from this, I have a few questions that seem too small to post individually. If they are large enough to deserve separate posts, feel free to say so in a comment.
What if processing fails for only one particular tuple in a batch?
Will the whole batch then be replayed, so that tuples that were already processed successfully get processed again? Take word count, for example, where every tuple contains a word but only some of the tuples were counted successfully. Say the word "man" occurs three times but the count shows only 2, meaning that one tuple failed during processing.
In this tutorial, only the previous txid is stored. What about earlier transaction ids?
For example, there are four batches: #1, #2, #3, and #4. Suppose batches #1 and #2 are executed, and then batch #1 is replayed. The stored txid will be 2, since the most recently processed batch is #2, so there is no way to tell whether batch #1 was processed before. If so, the batches must be executed in order: batch #2 cannot be executed until batch #1 has finished successfully. If that is the case, where is the parallelism in executing the batches?
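To make the concern concrete, here is a minimal sketch of the txid check as I understand it from the tutorial. It is plain Java with an in-memory map, not Trident's actual state API; the class `TxidCheckSketch` and its methods are hypothetical names made up for illustration, and the assumption is that only the most recent txid is kept next to each count.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a word-count store; not Trident's API.
class TxidCheckSketch {
    // Each stored count carries the txid of the batch that last updated it.
    static final class Entry {
        final long count;
        final long txid;
        Entry(long count, long txid) { this.count = count; this.txid = txid; }
    }

    private final Map<String, Entry> store = new HashMap<>();

    // Adds a batch's partial count for one word, unless the stored txid
    // equals this batch's txid (which is treated as a replay and skipped).
    void applyBatch(long txid, String word, long delta) {
        Entry current = store.get(word);
        if (current != null && current.txid == txid) {
            return; // same txid as the stored one: assume already applied
        }
        long base = (current == null) ? 0 : current.count;
        store.put(word, new Entry(base + delta, txid));
    }

    long count(String word) {
        Entry e = store.get(word);
        return e == null ? 0 : e.count;
    }

    public static void main(String[] args) {
        TxidCheckSketch s = new TxidCheckSketch();
        s.applyBatch(1, "man", 2);
        s.applyBatch(1, "man", 2); // immediate replay of batch #1: skipped
        System.out.println(s.count("man")); // prints 2
        s.applyBatch(2, "man", 1);
        s.applyBatch(1, "man", 2); // batch #1 replayed after #2: not detected
        System.out.println(s.count("man")); // prints 5, although 3 is correct
    }
}
```

With only one stored txid, the check in this sketch catches a replay only when no later batch has been applied in between, which is why it seems to me that the updates have to happen in txid order.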
What if only one particular function fails to execute properly for a batch in a topology?
For example, I have two functions: one persists the message to a database and the other produces it to a Kafka queue. Suppose persisting to the database succeeds, but pushing to the Kafka queue fails because of, say, some node failures. In that case, I would want only the function that pushes to the Kafka queue to be re-executed for that particular batch. Is there a way to do this in Trident? For this, I would need to store not only the txid but also a list of the functions that still have to be processed for that txid. How could that be done?
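A rough sketch of the bookkeeping I have in mind is below. It is plain Java and does not use any Trident or Kafka API; `StepLedgerSketch`, `runOnce`, and the step names are hypothetical and exist only to illustrate the idea of remembering, per txid, which functions have already completed.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical ledger of completed steps per txid; not part of Trident.
class StepLedgerSketch {
    private final Map<Long, Set<String>> completed = new HashMap<>();

    // Runs the step only if it has not already completed for this txid.
    // Returns true if the step ran now, false if it was skipped.
    boolean runOnce(long txid, String stepName, Runnable step) {
        Set<String> done = completed.computeIfAbsent(txid, k -> new HashSet<>());
        if (done.contains(stepName)) {
            return false; // finished in an earlier attempt of this batch
        }
        step.run();         // if this throws, the step is not recorded
        done.add(stepName); // recorded only after the step succeeded
        return true;
    }

    public static void main(String[] args) {
        StepLedgerSketch ledger = new StepLedgerSketch();
        long txid = 7;

        // First attempt: the database step succeeds, the Kafka step fails.
        ledger.runOnce(txid, "persist-db", () -> System.out.println("db write"));
        try {
            ledger.runOnce(txid, "push-kafka", () -> {
                throw new RuntimeException("simulated node failure");
            });
        } catch (RuntimeException e) {
            System.out.println("kafka push failed, batch will be replayed");
        }

        // Replay: the database step is skipped, the Kafka step runs again.
        ledger.runOnce(txid, "persist-db", () -> System.out.println("db write"));
        ledger.runOnce(txid, "push-kafka", () -> System.out.println("kafka push"));
    }
}
```

In a real topology the ledger would have to live in durable storage rather than in memory, which this sketch does not attempt.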