0

I have a problem where it would be very helpful if I could send a ReadModifyWrite request to BigTable that only overwrites the value if the new value is bigger/smaller than the existing value. Is this somehow possible? Note: I thought of a hacky way where I use the timestamp as my actual value and set the max number of versions to 1, so that only the "latest" value, the one with the highest timestamp, is kept. But those timestamps would have values from 1 to 10 instead of around 1.5bn. Would this work?

I looked into the existing APIs but haven't found anything that would help me do this. It seems to be available in DynamoDB, so I guess it's reasonable to ask for BigTable to have it as well: https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_UpdateItem.html#API_UpdateItem_RequestSyntax

Oguz Yildiz

1 Answer

3

Your timestamp approach could probably be made to work, but would interact poorly with stuff like age-based garbage collection.

I also assume you mean CheckAndMutate as opposed to ReadModifyWrite? The former lets you do conditional overwrites, the latter lets you do unconditional increments/appends. If you actually want an increment that only works if the result will be larger, just make sure you only send positive increments ;)

My suggestion, assuming your client language supports it, would be to use a CheckAndMutateRow request with a value_range_filter. This will require you to use a fixed-width encoding for your values, but that's no different than re-using the timestamp.

Example: if you want to set the value to 000768, but only if that would be an increase, use a value_range_filter from 000000 to 000767, inclusive, and do your write in the true_mutation of the CheckAndMutate.
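To make the encoding and the range bounds concrete, here is a rough Python sketch. It does not call the Bigtable client at all: `write_if_larger` is a hypothetical in-memory stand-in for a CheckAndMutateRow whose predicate is a value_range_filter and whose write sits in the true_mutation. `WIDTH`, `encode`, and `increase_only_range` are likewise names made up for illustration, and zero-padded decimal is just one possible fixed-width encoding.

```python
from typing import Dict, Optional, Tuple

WIDTH = 6  # assumed fixed width; pick one wide enough for your largest value


def encode(n: int, width: int = WIDTH) -> bytes:
    """Zero-pad a non-negative integer so byte order matches numeric order."""
    if n < 0 or n >= 10 ** width:
        raise ValueError(f"{n} does not fit in {width} digits")
    return str(n).zfill(width).encode("ascii")


def increase_only_range(
    new_value: int, width: int = WIDTH
) -> Optional[Tuple[bytes, bytes]]:
    """Inclusive bounds covering every stored value strictly below new_value."""
    if new_value == 0:
        return None  # no encoded value is smaller than zero
    return encode(0, width), encode(new_value - 1, width)


def write_if_larger(cells: Dict[str, bytes], row_key: str, new_value: int) -> bool:
    """In-memory stand-in for CheckAndMutateRow with the write in true_mutations."""
    bounds = increase_only_range(new_value)
    existing = cells.get(row_key)
    # The predicate matches only when an existing cell falls inside the range.
    # A missing cell yields no match, so the true mutation is not applied.
    matched = (
        bounds is not None
        and existing is not None
        and bounds[0] <= existing <= bounds[1]
    )
    if matched:
        cells[row_key] = encode(new_value)
    return matched


cells = {"row-1": encode(500)}
print(write_if_larger(cells, "row-1", 768), cells["row-1"])  # 768 > 500
print(write_if_larger(cells, "row-1", 100), cells["row-1"])  # 100 < 768
```

Note that in this sketch a row with no existing cell never matches the predicate, so the very first write for a row would have to be handled separately.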

  • Yeah, I'd be sacrificing the TTL functionality by using the timestamp approach. Luckily in my case that's doable :) but it might not have been. I think "write max/min" should be supported by Dataflow BigTable, don't you think? As far as I know it doesn't go against any Dataflow/BigTable principles. – Oguz Yildiz Jul 22 '19 at 16:31
  • > "just make sure you only send positive increments" => This is not possible in my case :/ – Oguz Yildiz Jul 22 '19 at 16:52
  • About ReadModifyWrite: Unfortunately the protobuf type `Mutation` doesn't have `CheckAndMutateRow`. Is it possible to emit `CheckAndMutateRow` from Dataflow steps into BigTableIO.Write? – Oguz Yildiz Jul 22 '19 at 16:55
  • Ah, in Dataflow this might be trickier. AIUI we don't expose non-idempotent writes like ReadModifyWrite or CheckAndMutateRow because we can't prevent Dataflow from retrying when it receives an error. I'm surprised that atomicity matters to you in a batch-y context though? Could you just do non-atomic reads and writes? – Douglas McErlean Jul 23 '19 at 17:14
  • Atomicity doesn't matter to me. Isn't "keep max" an idempotent operation? The point is to keep the max values in a cheaper way, instead of adding billions of reads just to check if it's greater than x. – Oguz Yildiz Sep 11 '19 at 19:57
  • I'm not suggesting reads to check if something is greater than x. I'm suggesting that you read the value, increment it in memory, and then write it back. In contexts where ReadModifyWriteRow is available you can just send an increment, and only if the increment is positive will the new value be the max. But in Dataflow that's not an option because retrying an increment isn't safe. – Douglas McErlean Sep 20 '19 at 17:21