I am planning to store high-volume order transaction records from a commerce website to a repository (Have to use cassandra here, that is our DB). Let us call this component commerceOrderRecorderService.
Second part of the problem is - I want to process these orders and push to other downstream systems. This component can be called batchCommerceOrderProcessor.
commerceOrderRecorderService & batchCommerceOrderProcessor both will run on a java platform.
I need suggestion on design of these components. Especially the below:
commerceOrderRecorderService
What is he best way to design the columns, considering performance and scalability? Should I store the entire order (complex entity) as a single JSON object. There is no search requirement on the order attributes. We can at least wait until they are processed by the batch processor. Consider - that a single order can contain many sub-items - at the time of processing each of which can be fulfilled differently. Designing columns for such data structure may be an overkill
What should be the key, given that data volumes would be high. 10 transactions per second let's say during peak. Any libraries or best practices for creating such transactional data in cassandra? Can TTL also be used effectively?
batchCommerceOrderProcessor
- How should the rows be retrieved for processing?
- How to ensure that a multi-threded implementation of the batch processor ( and potentially would be running on multiple nodes as well ) will have row level isolation. That is no two instance would read and process the same row at the same time. No duplicate processing.
- How to purge the data after a certain period of time, while being friendly to cassandra processes like compaction.
Appreciate design inputs, code samples and pointers to libraries. Thanks.