
I'm using the GAE Search API in my server implementation and have been seeing strange, undocumented behavior. Sometimes new documents that were just added to an index (Index.put()) are not returned by searches (Index.search()) performed immediately afterwards. Sometimes it takes a minute or so until they become available.

It looks like an eventual consistency problem, but I couldn't find anything in the documentation that even mentions this issue in relation to the Search API. Stranger still, while these documents are unavailable, I can run the exact same query in the Admin Console and get the expected results.

Does anyone know what's going on here? Is this normal behavior? If so, what is the maximum time before a newly added document becomes searchable? And why isn't this documented? This seriously affects my app's functionality.

Thanks.

AsafK
  • This behavior was acknowledged by Google. See [link](https://code.google.com/p/googleappengine/issues/detail?id=10521&q=Fulltextsearch&colspec=ID%20Type%20Component%20Status%20Stars%20Summary%20Language%20Priority%20Owner%20Log) – AsafK Jan 25 '14 at 15:10

2 Answers

1

Given your additional comments, it is clearly essential that every newly added point of interest appears on the user's map. However, the Search service will probably continue to omit new additions for an unpredictable amount of time. I would consider two strategies, one server-side and one client-side, and perhaps even use both. Neither is simple.

On the server you could augment the Search service, or even replace it entirely, with a custom search that you develop yourself. Store the data you want to search in Google Cloud SQL, which is basically MySQL. It will always immediately return what is written into it, because it is a single instance not subject to eventual consistency.
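
To make the read-your-writes idea concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module purely as a local stand-in for Cloud SQL (an assumption, so that the example runs anywhere); the `poi` table, its columns, and the function names are hypothetical. The only point is that a single SQL instance returns a row as soon as the insert is committed.

```python
import sqlite3
from typing import List, Tuple


def open_store() -> sqlite3.Connection:
    """Create an in-memory database standing in for a Cloud SQL instance."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE poi ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " lat REAL NOT NULL,"
        " lng REAL NOT NULL)"
    )
    return conn


def add_poi(conn: sqlite3.Connection, name: str, lat: float, lng: float) -> int:
    """Insert a point of interest and commit before returning its id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO poi (name, lat, lng) VALUES (?, ?, ?)",
            (name, lat, lng),
        )
    return cur.lastrowid


def search_box(
    conn: sqlite3.Connection,
    south: float, west: float, north: float, east: float,
) -> List[Tuple[int, str, float, float]]:
    """Return all points inside a simple latitude/longitude bounding box."""
    rows = conn.execute(
        "SELECT id, name, lat, lng FROM poi"
        " WHERE lat BETWEEN ? AND ? AND lng BETWEEN ? AND ?"
        " ORDER BY id",
        (south, north, west, east),
    )
    return rows.fetchall()
```

Calling `add_poi(...)` and then `search_box(...)` on the same store returns the new row right away. A plain bounding box like this ignores map views that cross the 180° meridian, and a real MySQL table would probably want an index on the coordinate columns.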

In the client you could cache all recently added points of interest. Then, when requesting data from the server, also query the local cache, and drop any local entries that duplicate what the server returns. Other users will eventually see what this user sees immediately.
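
The client-side merge could look roughly like this. It is a sketch in Python for brevity, not tied to any particular client platform; the `id` field, the dict-based cache, the optional `matches` filter, and the `merge_results` name are all assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, List, Optional

Poi = Dict[str, object]


def merge_results(
    server_results: Iterable[Poi],
    local_cache: Dict[str, Poi],
    matches: Optional[Callable[[Poi], bool]] = None,
) -> List[Poi]:
    """Combine server search results with locally cached recent additions.

    Cached entries that the server already returns are evicted, because the
    index has caught up for them. Remaining cached entries are appended if
    they satisfy the current query (``matches``), or unconditionally when no
    filter is given.
    """
    merged = list(server_results)
    server_ids = {poi["id"] for poi in merged}

    # Iterate over a copy of the keys so entries can be deleted safely.
    for poi_id in list(local_cache):
        if poi_id in server_ids:
            del local_cache[poi_id]
        elif matches is None or matches(local_cache[poi_id]):
            merged.append(local_cache[poi_id])
    return merged
```

The client would add each new point to `local_cache` at the moment the user creates it, then call `merge_results` on every search response. Passing a `matches` function (for example, "is inside the current map view") keeps cached points from showing up in searches they do not belong to.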

Martin Berends
  • It's a shame I have to settle for these kinds of solutions. Actually, I had already thought of your second suggestion (client) as a workaround for this problem, and I think I prefer it over your first one, where I lose the scalability and availability benefits of a NoSQL DB. I won't accept your answer as it doesn't really give an official benchmark answer to my question, but I will upvote it for the effort you put into it. Thanks a lot, man. – AsafK Jan 23 '14 at 11:15
  • Thanks for the feedback. I agree that adding a workaround in the client is better. – Martin Berends Jan 23 '14 at 12:13
0

Yes, it makes sense that eventual consistency (Brewer's keynote PDF) also applies to Search. The absence of a documented maximum convergence time probably means that guaranteeing one would, on balance, be counterproductive. Without a timing guarantee, GAE might even evolve and behave differently in the future. I have heard that some GAE users migrate to Riak in order to tune their CAP parameters to suit specific application timing requirements.

The symptom of your newest writes appearing in different subsystems at different times suggests distributed caching. Your best strategy is to redesign your functionality to rely less on timing. Most scalable applications have done that.

Martin Berends
  • First of all, thanks for your answer. Secondly, you're saying "GAE might even evolve and behave differently in future" as the reason for this undocumented behavior, but this is true of every service any product offers. I think a PaaS provider like Google is obliged to publish any limitation of its services and republish once these limitations change (as they do with their other services). OK, so much for (justified) complaining. Continuing in next comment... – AsafK Jan 22 '14 at 11:28
  • I think it's a bit too late for me to migrate to another platform. Regarding a redesign, let me describe my requirement very briefly: my app allows users to dynamically add points-of-interest to a map. Once a user has added one, he's supposed to see it right away when doing a search. As you probably understand, this search is done using the Search API. The only solution I can think of right now is showing a disclaimer message saying that it might take a while for his POI to be available for searches... What do you say? – AsafK Jan 22 '14 at 11:36
  • Your additional information about the requirement explains more clearly what problem you have. It makes perfect sense that your user should be able to see all her additions immediately. It also gives me ideas for another answer, which I shall now start writing... – Martin Berends Jan 22 '14 at 17:56