
I have a list of Products which have to be added to a Purchase Order. The Purchase Order has a sequence number, and once the Products are added, their status should be changed to indicate that they are out for purchase.

The typical number of Products being processed in 1 Purchase Order would be 500.

On the DB side I have two tables: one for Products and another for Purchase Orders. That means I need 500 updates and 1 insert per Purchase Order. When I try to do all of this in a single BatchStatement, I get the error "Batch too large".

Suggestions from various quarters tell me that I should use multiple async queries instead. My concern, however, is the atomicity of the entire operation. Please suggest the best way forward given my requirement.
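For reference, the suggested async approach would look roughly like the sketch below (DataStax Java driver 3.x; the contact point, keyspace, and table/column names are made up for illustration):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.ResultSetFuture;
    import com.datastax.driver.core.Session;
    import com.google.common.util.concurrent.Futures;

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    public class AsyncStatusUpdate {
        public static void main(String[] args) throws Exception {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect("shop")) {

                PreparedStatement markOut = session.prepare(
                    "UPDATE products SET status = 'OUT_FOR_PURCHASE' WHERE product_id = ?");

                List<String> productIds = Arrays.asList("p1", "p2", "p3"); // ...~500 ids
                List<ResultSetFuture> futures = new ArrayList<>();
                for (String id : productIds) {
                    // Each UPDATE is sent on its own: fast, but not atomic as a group.
                    futures.add(session.executeAsync(markOut.bind(id)));
                }
                // Block until all updates finish; get() throws if any of them failed.
                Futures.allAsList(futures).get();
            }
        }
    }

This is fast, but if the process dies halfway through, some Products are marked and others are not. Hence my question about atomicity.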

Thanks in advance.

Anurag Joshi

2 Answers


This is interesting. Putting a lot of statements (more than about 10) into a batch just to achieve atomicity performs really badly, so raising the batch size limit is not really an option.

Since Cassandra also guarantees atomicity at the single-row level, you could exploit that by changing your model: add a table that "bookmarks" your purchase orders, storing both the purchase order id and the items (as a map) together in a single row, which also makes your queries idempotent. You can then expand or post-process this table to continue your workflow as needed.
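For example, here is a minimal sketch of such a bookmark table, written through the Java driver (all table and column names are invented for illustration):

    import com.datastax.driver.core.Session;
    import java.util.Map;

    public class PurchaseOrderBookmark {
        // Assumed one-off schema, with the items denormalised into a map:
        //
        //   CREATE TABLE purchase_order_bookmarks (
        //       po_id       bigint PRIMARY KEY,
        //       product_ids map<text, text>   -- product id -> status
        //   );

        public static void save(Session session, long poId, Map<String, String> products) {
            // The whole order is one row in one partition, so the write is
            // atomic and isolated at the partition level, and safe to retry.
            session.execute(
                "INSERT INTO purchase_order_bookmarks (po_id, product_ids) VALUES (?, ?)",
                poId, products);
        }
    }

A single INSERT like this replaces the 500-statement batch, and the per-product status updates can be derived from this table afterwards.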

xmas79

My concern, however, is the atomicity of the entire operation. Please suggest the best way forward given my requirement.

Please note that Cassandra batches don't provide isolation (http://www.datastax.com/dev/blog/atomic-batches-in-cassandra-1-2):

Note that we mean “atomic” in the database sense that if any part of the batch succeeds, all of it will. No other guarantees are implied; in particular, there is no isolation; other clients will be able to read the first updated rows from the batch, while others are in progress.

So if you need isolation, as @xmas79 answered, you should store products and purchase orders together in one table.

If isolation and performance are not critical, you could try tuning cassandra.yaml and increasing the value of the batch_size_fail_threshold_in_kb parameter:

Fail any batch exceeding this value. 50kb (10x warn threshold) by default.
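For example, in cassandra.yaml on every node (the value of 100 below is only an illustration, not a recommendation):

    # Log a warning for any batch above this size (default 5 KB).
    batch_size_warn_threshold_in_kb: 5
    # Fail any batch above this size (default 50 KB); raised here so that a
    # 500-statement batch is accepted instead of rejected.
    batch_size_fail_threshold_in_kb: 100

Keep in mind that large batches put significant pressure on the coordinator node, which is why the threshold exists in the first place.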

Mikhail Baksheev