
I'm trying to fetch a large number of entities with a runQuery request.

I have an entity kind `Task`, which contains:

  • an integer `id`
  • an integer `TaskGroupId`
  • a non-indexed string field (~200 bytes)
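
For reference, a single entity of this kind might look like the following in the Datastore v1 JSON representation (the property name `payload` and the values are placeholders; note `excludeFromIndexes` on the non-indexed string):

```json
{
  "key": { "path": [{ "kind": "Task", "id": "1001" }] },
  "properties": {
    "id": { "integerValue": "1001" },
    "TaskGroupId": { "integerValue": "501" },
    "payload": { "stringValue": "…", "excludeFromIndexes": true }
  }
}
```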

The request body:

```json
{
  "query": {
    "kind": [{ "name": "Task" }],
    "filter": {
      "propertyFilter": {
        "property": { "name": "TaskGroupId" },
        "op": "EQUAL",
        "value": { "integerValue": "501" }
      }
    }
  },
  "partitionId": { "namespaceId": "local" }
}
```
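
For context, runQuery returns results in batches, so retrieving ~100,000 entities takes many round trips against the `endCursor`. A minimal sketch of how I page through the batches from Node.js (18+, global `fetch`); `PROJECT_ID` and the `ACCESS_TOKEN` handling are placeholders, not from my actual code:

```js
const PROJECT_ID = 'my-project';        // placeholder
const TOKEN = process.env.ACCESS_TOKEN; // placeholder OAuth2 bearer token

async function fetchTasks(taskGroupId) {
  const url = `https://datastore.googleapis.com/v1/projects/${PROJECT_ID}:runQuery`;
  const results = [];
  let startCursor;

  while (true) {
    const query = {
      kind: [{ name: 'Task' }],
      filter: {
        propertyFilter: {
          property: { name: 'TaskGroupId' },
          op: 'EQUAL',
          value: { integerValue: String(taskGroupId) },
        },
      },
    };
    // Resume from the previous batch's cursor on every iteration after the first.
    if (startCursor) query.startCursor = startCursor;

    const res = await fetch(url, {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${TOKEN}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ partitionId: { namespaceId: 'local' }, query }),
    });
    const { batch } = await res.json();

    results.push(...(batch.entityResults || []));
    if (batch.moreResults !== 'NOT_FINISHED') break; // no more batches
    startCursor = batch.endCursor;
  }
  return results;
}
```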

I have ~2,000,000 entities of this kind. A request that should return about 100,000 entities takes ~4 minutes to execute.

Is this appropriate performance, or am I doing something wrong?

Is there any way to speed up this request?

mmurygin
  • How are you determining the amount of time taken? What does the timing code look like? At ~208 bytes per entity, for 100,000 entities, it might be reasonable that it takes this long. You might want to do whatever you're doing via MapReduce or [Cloud Dataflow](https://cloud.google.com/dataflow/) instead of loading them all into one place. – Nick Aug 12 '16 at 19:29
  • I determine the amount of time taken by `console.time()` before the request and `console.timeEnd()` after. – mmurygin Aug 29 '16 at 06:26
  • I found a temporary solution for my problem: 1. Before the request I already have the ids of all tasks. 2. Split the id list into chunks. 3. Run get-by-key requests via map/reduce inside my Node.js application. 4. I receive the result in ~40 seconds. Given this workaround, it doesn't look like an entity-size issue; it seems that non-key queries in Datastore are really slow. A sketch of this approach follows these comments. – mmurygin Aug 29 '16 at 07:22
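
What that comment describes might look roughly like this, assuming the `@google-cloud/datastore` Node.js client; `getTasksByIds` and the chunk size are illustrative, not from the original post (Datastore lookups accept at most 1000 keys per call):

```js
const { Datastore } = require('@google-cloud/datastore');
const datastore = new Datastore({ namespace: 'local' });

const CHUNK_SIZE = 1000; // stay under the per-lookup key limit

async function getTasksByIds(ids) {
  // Split the known ids into fixed-size chunks.
  const chunks = [];
  for (let i = 0; i < ids.length; i += CHUNK_SIZE) {
    chunks.push(ids.slice(i, i + CHUNK_SIZE));
  }

  // Fan out one get-by-key request per chunk, in parallel.
  const batches = await Promise.all(
    chunks.map((chunk) =>
      datastore.get(chunk.map((id) => datastore.key(['Task', id])))
    )
  );

  // datastore.get() resolves to [entities]; merge all batches into one list.
  return batches.flatMap(([entities]) => entities);
}
```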

0 Answers