
I have the following simple query:

 client = datastore.Client('fmy_project')
 query = client.query(kind='kind1')
 query.add_filter('x', '=', 'y')
 for row in query.fetch():
     # process the row and save to file

Fetching 100 rows takes 5 seconds when working from a local machine. This is awfully slow.

When I run strace on the python process, I get many rows of:

recvmsg(9, 0x7ffffc9ee9f0, 0) = -1 EAGAIN (Resource temporarily unavailable)

poll([{fd=8, events=POLLIN}, {fd=9, events=POLLIN}], 2, 200) = 0 (Timeout)

Is there a way to tell Datastore to fetch everything in one go, or to make some other optimization?

I googled and did not find any related option.

David Michael Gang

1 Answer


Are you saying it takes 5 seconds just to fetch, without processing the data in your for loop?

Generally it's better if you:

  1. Fetch the data (fetch returns a list of the data)
  2. Process the data on the returned list itself (do not save to datastore within your loop!)
  3. Save multiple rows at once using "put_multi"

ndb.put_multi(dataList)

See docs here: https://cloud.google.com/datastore/docs/concepts/entities#batch_operations
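Since the question uses the generic `google-cloud-datastore` client rather than `ndb` (see the comment below), a minimal sketch of the fetch-then-process pattern with that client might look like this. The project and kind names are taken from the question; `count_rows` is a hypothetical helper standing in for the per-row processing, and the Datastore calls are shown commented since they require credentials:

```python
from collections import defaultdict

def count_rows(rows):
    # Mirrors the processing described in the comments:
    # a simple defaultdict(int) tally over one field.
    stats = defaultdict(int)
    for row in rows:
        stats[row['x']] += 1
    return dict(stats)

# With the generic client (names taken from the question):
#
#   from google.cloud import datastore
#   client = datastore.Client('fmy_project')
#   query = client.query(kind='kind1')
#   query.add_filter('x', '=', 'y')
#   rows = list(query.fetch())   # materialize the iterator up front
#   stats = count_rows(rows)     # process in memory, no RPCs in the loop
#   client.put_multi(entities)   # batch any writes (up to 500 per call)
```

The point of `list(query.fetch())` is that `fetch()` returns a lazy iterator that pages results as you consume it; materializing it once separates the network round trips from the processing loop, and `put_multi` batches writes the same way on the output side.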

Khaled
    Note that David is using the generic datastore client library, not the GAE-optimized `ndb` one (his code might not even be a GAE app or running inside the Google cloud). – Dan Cornilescu Nov 13 '17 at 19:18
  • That's correct. The processing is very minimal. I am just computing statistics with a defaultdict(int), so really no overhead: stats[row[0]] += 1 – David Michael Gang Nov 14 '17 at 07:22