I would like to process tuples in batches, and I am thinking of using the Trident API for this. However, I do not perform any batch operations here: every tuple is processed individually. All I need is exactly-once semantics, so that every tuple is processed only once, and this is the only reason to use Trident.

I want to record which tuples have been processed so that, when a batch is replayed, a tuple that was already processed is not executed again.

Trident provides a persistentAggregate() method, but it requires an aggregation operation, and I have no aggregation to perform on a set of tuples because every tuple is processed individually.

The functions applied to each tuple are too small to be worth executing one tuple at a time, so I am looking to process tuples in batches in order to save computing resources and time.


So, how do I write a topology that consumes tuples in batches but does not perform any batch operations (like word count)?

Matthias J. Sax
JavaTechnical

1 Answer

It looks like what you need is partitionPersist. It takes a state (or a state factory), the fields to persist, and an updater. For development purposes, check MemoryMapState, which is basically an in-memory hash map. For production you can use, say, Cassandra; see the examples at https://github.com/hmsonline/storm-cassandra-cql
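To make the idea concrete, here is a minimal sketch in plain Java, with no Storm dependency, of the kind of logic an updater passed to partitionPersist might apply: it receives a whole batch, processes each tuple individually, and skips any tuple whose ID is already recorded. The class `ReplaySafeBatchProcessor`, the method `processBatch`, and the use of a string ID per tuple are all hypothetical names chosen for illustration, and the in-memory set only stands in for a real state backend.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical helper, not part of Storm: models per-tuple work on a batch
// with a record of processed tuple IDs so that a replayed batch is harmless.
final class ReplaySafeBatchProcessor {
    // Stands in for the persisted state; a real topology needs a durable store.
    private final Set<String> processedIds = new HashSet<>();
    private final Consumer<String> perTupleWork;

    ReplaySafeBatchProcessor(Consumer<String> perTupleWork) {
        this.perTupleWork = perTupleWork;
    }

    /** Processes a batch tuple by tuple and returns the IDs handled in this call. */
    List<String> processBatch(List<String> tupleIds) {
        List<String> processedNow = new ArrayList<>();
        for (String id : tupleIds) {
            if (processedIds.contains(id)) {
                continue; // already handled in an earlier attempt of this batch
            }
            perTupleWork.accept(id);  // the individual, non-aggregating work
            processedIds.add(id);     // record only after the work succeeded
            processedNow.add(id);
        }
        return processedNow;
    }
}
```

Calling `processBatch` twice with the same IDs runs the per-tuple work only on the first call. Because this sketch keeps the IDs in memory, it does not survive a worker restart; that is the gap a persistent state implementation is meant to fill.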

aljipa