How to write a trident topology without aggregations?

Question

I would like to process tuples in batches, so I am considering the Trident API. However, I do not perform any batch-level operations; every tuple is processed individually. All I need is exactly-once semantics, so that every tuple is processed only once, and that is my only reason for using Trident.

I want to record which tuples have been processed so that, when a batch is replayed, tuples that were already processed are not executed again.

The Trident API offers a persistentAggregate() method, but it requires an aggregation operation, and I have no aggregation to perform on a set of tuples because every tuple is processed individually.

The functions applied to each tuple are too small to justify executing them one tuple at a time, so I want to process tuples in batches to save computing resources and time.

So, how do I write a topology that consumes tuples in batches but does not perform any batch operations (such as a word count)?

score 0 · Answer 1 · answered Sep 03 '15 at 16:29

It looks like what you need is partitionPersist. It takes a state factory, the fields to persist, and a state updater. For development purposes, look at MemoryMapState, which is essentially an in-memory hash map. For production you can use, for example, Cassandra; see the examples at https://github.com/hmsonline/storm-cassandra-cql
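To make the replay-skipping idea concrete, here is a minimal, dependency-free sketch in plain Java. It does not use Storm: TxTrackingState, PerTupleUpdater and DedupSketch are hypothetical names that only mirror the shape of a Trident state (beginCommit / commit with a transaction id) and a state updater that receives a whole batch. It assumes a transactional spout, i.e. a replayed batch carries the same txid and the same tuples, and it keeps everything in memory, so it is an illustration of the pattern rather than a production state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for a Trident State (NOT a Storm class).
// It remembers the last committed transaction id next to the stored results.
class TxTrackingState {
    private final List<String> results = new ArrayList<>();
    private long lastCommittedTxid = -1L;
    private long currentTxid = -1L;

    void beginCommit(long txid) { currentTxid = txid; }

    // A batch is a replay if its txid was already committed.
    boolean isReplay() { return currentTxid <= lastCommittedTxid; }

    void store(String result) { results.add(result); }

    void commit(long txid) {
        if (txid > lastCommittedTxid) {
            lastCommittedTxid = txid;
        }
    }

    List<String> results() { return results; }
}

// Hypothetical updater: receives a batch, but handles each tuple on its own.
class PerTupleUpdater {
    // Returns how many tuples were actually processed for this delivery.
    static int apply(TxTrackingState state, long txid, List<String> batch,
                     Function<String, String> perTuple) {
        state.beginCommit(txid);
        int processed = 0;
        if (!state.isReplay()) {
            for (String tuple : batch) {
                state.store(perTuple.apply(tuple)); // no aggregation involved
                processed++;
            }
        }
        state.commit(txid);
        return processed;
    }
}

class DedupSketch {
    public static void main(String[] args) {
        TxTrackingState state = new TxTrackingState();
        Function<String, String> work = String::toUpperCase;

        System.out.println(PerTupleUpdater.apply(state, 1L, Arrays.asList("a", "b"), work)); // 2
        System.out.println(PerTupleUpdater.apply(state, 1L, Arrays.asList("a", "b"), work)); // 0 (replay)
        System.out.println(PerTupleUpdater.apply(state, 2L, Arrays.asList("c"), work));      // 1
        System.out.println(state.results()); // [A, B, C]
    }
}
```

In a real topology the per-tuple work would live in the state updater passed to partitionPersist, and the txid would have to be stored in the same backing store as the results so that both survive a restart together.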
