
I'm working on an AppEngine project that requires flexible searches over a dynamic dataset, and Google's Search API seems to fit the bill, so I'm thinking of using it to persist the dataset. The one big flaw is that it has no ACID properties or concept of transactions, especially in dealing with concurrent changes to the Datastore. I think I can work around it by implementing locks stored in Memcache.

Something like this seems like it would work:

import time

from google.appengine.api import memcache
from google.appengine.ext import ndb

@ndb.transactional
def do_in_transaction(doc_id, index):
  client = memcache.Client()

  # acquire lock: add() is atomic and only succeeds if the key is
  # absent, so a True return means we now hold the lock
  while not client.add(doc_id, 'arbitrary payload'):
    time.sleep(.01)

  try:
    old_doc = index.get(doc_id)
    # ...do stuff in the datastore...
    # ...push new document with changes...
  except Exception:
    index.put(old_doc)  # restore the old document before rolling back
    raise ndb.Rollback
  finally:
    # release lock
    client.delete(doc_id)

I don't expect access to any particular document will be particularly contentious, and I think that Memcache lock keys will be short-lived enough that the risk of them getting booted is relatively low. If it becomes an issue, I could always put the locks in the datastore.
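To bound how long an evicted-or-orphaned lock could block other writers, the lock entry can be given an expiration when it is added. Here's a minimal sketch of that idea; `FakeMemcache`, `acquire_lock`, `release_lock`, and `LOCK_TTL` are all hypothetical names I'm using for illustration (the real client is `google.appengine.api.memcache`, whose `add()` accepts a `time=` expiry the same way):

```python
import time as _time

class FakeMemcache(object):
    """In-memory stand-in for memcache.Client, for illustration only."""
    def __init__(self):
        self._store = {}  # key -> (value, expires_at or None)

    def add(self, key, value, time=0):
        # Real Memcache add() is atomic: it succeeds only if the key
        # is absent, which makes it a natural lock-acquire primitive.
        entry = self._store.get(key)
        if entry is not None:
            _, expires_at = entry
            if expires_at is None or expires_at > _time.time():
                return False  # key still live: add fails
        self._store[key] = (value, _time.time() + time if time else None)
        return True

    def delete(self, key):
        self._store.pop(key, None)

LOCK_TTL = 30  # seconds; a guessed upper bound on transaction duration

def acquire_lock(client, doc_id):
    """Spin until the lock is acquired; the expiry bounds how long a
    crashed request can keep other writers blocked."""
    while not client.add('lock:' + doc_id, 'locked', time=LOCK_TTL):
        _time.sleep(.01)

def release_lock(client, doc_id):
    client.delete('lock:' + doc_id)
```

The trade-off is that a transaction running longer than the TTL would lose its lock, so the TTL has to comfortably exceed the worst-case transaction time.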

The reason I'm asking this question is that I don't have much hands-on experience with web development or concurrency. Are there any race conditions or edge cases I've missed? Aside from the obvious improvements (e.g. backoff while acquiring the lock, wrapping it all in a context manager), is there any reason this wouldn't be a good idea?
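For what it's worth, the context-manager version I mentioned could look something like the sketch below. The names (`document_lock`, the backoff constants) are my own, and `client` is only assumed to support the atomic `add`/`delete` of the Memcache API:

```python
import contextlib
import time

@contextlib.contextmanager
def document_lock(client, doc_id, payload='arbitrary payload'):
    """Acquire a per-document lock with exponential backoff, and
    guarantee release even if the body raises."""
    delay = .01
    # add() succeeds only if the key is absent, so a True return
    # means we now hold the lock.
    while not client.add(doc_id, payload):
        time.sleep(delay)
        delay = min(delay * 2, 1.0)  # cap the backoff at 1 second
    try:
        yield
    finally:
        client.delete(doc_id)
```

Then the transactional function would just do `with document_lock(client, doc_id): ...` around the index and datastore work.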

whereswalden
  • Why is 10,000 results per search a problem? I don't know any user who would look at more than 50 results. Both Google and Bing return 1,000 results maximum. – Andrei Volgin Jun 11 '14 at 15:54
  • If you ever wanted to do something over every item in the dataset, you could only access the most recent 10,000 without searching for something that uniquely matched older items. Attaching a tag to each document that specifies which partition it belongs to and changing that tag every 9,000 documents would accomplish the same thing. – whereswalden Jun 11 '14 at 16:41
  • This is what cursors and paging are for: https://developers.google.com/appengine/docs/python/search/cursorclass You can loop through all of your documents -- no special tags necessary. – Andrei Volgin Jun 11 '14 at 16:47
  • Yeah, it turns out that restriction only applies to search() calls, meaning you can retrieve all documents if you get them by rank. I'll edit. – whereswalden Jun 11 '14 at 17:51

0 Answers