
The usual story: I have a com.google.appengine.api.search.Index backed by my datastore. For each document, the documentId is the key of a certain entity. (For visualization, imagine I have a “table” of products and each product has a number of reviews, so each document is a product with descriptive data and accompanying reviews.)

So, for those who know: each time you update a Document you are essentially recreating it. I want to update the Document each time a user adds a review, but I want to avoid race conditions. If I use a task queue, how should I design it? Should it be a single queue or multiple queues? And if multiple queues, how do I avoid interleaving of data, i.e. two different queues updating the same document concurrently?
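For illustration, rebuilding the document looks roughly like this (the index name, field names and the helper method are just placeholders for my real code):

```java
import com.google.appengine.api.search.Document;
import com.google.appengine.api.search.Field;
import com.google.appengine.api.search.Index;
import com.google.appengine.api.search.IndexSpec;
import com.google.appengine.api.search.SearchServiceFactory;

public class ProductIndexer {

    private static final Index INDEX = SearchServiceFactory.getSearchService()
            .getIndex(IndexSpec.newBuilder().setName("products").build());

    // Putting a document with an existing ID replaces the old one,
    // i.e. the whole document is recreated on every update.
    public static void reindexProduct(String productKeyString,
                                      String description,
                                      String allReviewsText) {
        Document doc = Document.newBuilder()
                .setId(productKeyString)  // documentId == datastore key of the product
                .addField(Field.newBuilder().setName("description").setText(description))
                .addField(Field.newBuilder().setName("reviews").setText(allReviewsText))
                .build();
        INDEX.put(doc);
    }
}
```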

Katedral Pillon

2 Answers


I assume you update the product entity each time a new review is added (maybe you have a list of review keys in the product entity)?

Not sure I understand the 'multiple queues' approach, but if you update your product entity inside a Datastore transaction, you can enqueue a task that recreates the document. That way you ensure the document is only 'updated' if the product update succeeds, and the updates to the document can be serialised.

This assumes you don't have a lot of contention on your product entities.
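A rough sketch of that pattern with Objectify (the `Product`/`Review` classes, the `reindex-products` queue and the `/tasks/reindexProduct` handler are placeholders for whatever you already have):

```java
import static com.googlecode.objectify.ObjectifyService.ofy;

import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;
import com.googlecode.objectify.Key;
import com.googlecode.objectify.VoidWork;

public class ReviewService {

    public void addReview(final long productId, final Review review) {
        ofy().transact(new VoidWork() {
            @Override
            public void vrun() {
                Product product = ofy().load().type(Product.class).id(productId).safe();

                Key<Review> reviewKey = ofy().save().entity(review).now();
                product.getReviewKeys().add(reviewKey);
                ofy().save().entity(product).now();

                // Enqueued while the transaction is active, so the task is only
                // dispatched if the product update actually commits.
                QueueFactory.getQueue("reindex-products").add(TaskOptions.Builder
                        .withUrl("/tasks/reindexProduct")
                        .param("productId", Long.toString(productId)));
            }
        });
    }
}
```

The task handler then loads the product and its reviews and rebuilds the document (`index.put()`) outside the transaction.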

tx802
  • I have a "table" of products and a "table" of reviews. In your example, what should the `Maximum Concurrent` setting of the task queue be to prevent interleaving? Thanks for the idea of transactional enqueuing. I am using Objectify and so would use that approach. – Katedral Pillon Nov 18 '15 at 16:01

The answer to avoiding race conditions when writing to the Datastore is the same as a traditional database: transactions. You can easily use a transaction to get around the race condition you brought up. There is no need to use a task queue. The transaction will ensure that if your product was updated in the middle of your operation, your operation will fail. Objectify (which is a great choice) will then automatically retry it.
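A minimal sketch, assuming a `Product` entity with a simple review counter (names are illustrative):

```java
import static com.googlecode.objectify.ObjectifyService.ofy;

import com.googlecode.objectify.VoidWork;

public class ProductReviewCounter {

    // Plain read-modify-write inside a transaction: if another request commits
    // a change to the same product first, the commit fails and Objectify
    // retries this unit of work automatically.
    public static void incrementReviewCount(final long productId) {
        ofy().transact(new VoidWork() {
            @Override
            public void vrun() {
                Product product = ofy().load().type(Product.class).id(productId).safe();
                product.setReviewCount(product.getReviewCount() + 1);
                ofy().save().entity(product).now();
            }
        });
    }
}
```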

If you're worried about write contention, though, the first thing to know is this: Datastore cannot handle much write throughput on a single entity (or even entity group). If you need to handle more than one or two writes per second, you'll need to split up your data. In your example I think you should store each comment as a separate entity. That would eliminate all write contention. You'd only need to include the ID of the product on the comment, and make sure that field is indexed. Then when you need a product and its comments, you query for all of them at once.
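Sketched with Objectify, assuming illustrative class and field names:

```java
import static com.googlecode.objectify.ObjectifyService.ofy;

import java.util.List;

import com.googlecode.objectify.annotation.Entity;
import com.googlecode.objectify.annotation.Id;
import com.googlecode.objectify.annotation.Index;

@Entity
class Review {
    @Id Long id;
    @Index long productId;  // indexed so reviews can be queried by product
    String text;
}

class ReviewDao {
    // Each review is its own entity, so concurrent reviewers never write to
    // the same entity and there is no contention on the product itself.
    static List<Review> reviewsForProduct(long productId) {
        return ofy().load().type(Review.class)
                .filter("productId", productId)
                .list();
    }
}
```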

Also, FYI, a single task queue does not mean that only one task from that queue will be executing at a time (unless you specifically set it up that way). There are multiple options for restricting the rate at which a queue lets tasks through; see the queue configuration documentation.
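For example, a queue defined in `queue.xml` can be throttled, or limited to one task at a time with `max-concurrent-requests` (the queue name is illustrative):

```xml
<queue-entries>
  <queue>
    <name>reindex-products</name>
    <rate>5/s</rate>
    <!-- at most one task from this queue runs at any moment -->
    <max-concurrent-requests>1</max-concurrent-requests>
  </queue>
</queue-entries>
```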

Eric Simonton