I'm trying to iterate over a huge number of datastore records, currently about 330,000. Conceptually, each record has a row, a column, and a value, and I'm iterating over the records and constructing a matrix, which I'll then use for calculations.
The error I get is: Timeout: The datastore operation timed out, or the data was temporarily unavailable.
[ADDED: Note that my issue is not an App Engine timeout. Running as a cron job, I have plenty of time, and the datastore error happens well before the App Engine timeout would. Also, I have tried the answers given in other questions, as I mention below.]
The error happens after the iteration has run over fewer than 100,000 of the records.
My current code, which I wrote after consulting past related threads, is:
prodcauses_query = ProdCause.query(projection=['prod_id', 'value', 'cause']).filter(ProdCause.seller_id == seller_id)
for pc in prodcauses_query.iter(read_policy=ndb.EVENTUAL_CONSISTENCY, deadline=600):
    # COPY DATA IN RECORD PC INTO A MATRIX
    # row is prod_id, col is cause, value is value
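To make the copy step concrete, here is a minimal sketch of that kind of loop body, using plain Python stand-ins instead of datastore entities. The names `Record` and `build_matrix`, the dense list-of-lists layout, and the 0 default for missing cells are all assumptions for illustration, not the real code.

```python
from collections import namedtuple

# Hypothetical stand-in for a projected ProdCause entity.
Record = namedtuple("Record", ["prod_id", "cause", "value"])


def build_matrix(records):
    """Build a dense matrix from (prod_id, cause, value) records.

    Returns (matrix, row_index, col_index), where the two index dicts map
    each prod_id / cause to its row / column number.
    """
    row_index = {}  # prod_id -> row number, assigned in order of first appearance
    col_index = {}  # cause -> column number, assigned in order of first appearance
    cells = {}      # (row, col) -> value

    for rec in records:
        row = row_index.setdefault(rec.prod_id, len(row_index))
        col = col_index.setdefault(rec.cause, len(col_index))
        cells[(row, col)] = rec.value

    # Assumption: cells with no record default to 0.
    matrix = [[0] * len(col_index) for _ in range(len(row_index))]
    for (row, col), value in cells.items():
        matrix[row][col] = value
    return matrix, row_index, col_index
```

The records are only read once and in any order, so the same function works whether they arrive from a single iterator or from several smaller batches chained together.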
Is there a better way to do this than iter()? Are there better settings for batch_size, deadline, or read_policy?
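One alternative that comes up for large result sets is paging through the query with cursors, so that each datastore call covers only one page. Below is a minimal sketch of that pattern. It assumes a callable that behaves like ndb's `Query.fetch_page`, that is, called as `fetch_page(page_size, start_cursor=cursor)` and returning `(results, next_cursor, more)`. The helper name `iter_pages` and the page size of 1000 are hypothetical, and the sketch does not establish whether this avoids the timeout.

```python
def iter_pages(fetch_page, page_size=1000):
    """Yield entities one page at a time, resuming from the previous cursor.

    fetch_page must accept (page_size, start_cursor=...) and return a
    (results, next_cursor, more) tuple, as ndb's Query.fetch_page does.
    """
    cursor = None
    more = True
    while more:
        results, cursor, more = fetch_page(page_size, start_cursor=cursor)
        for entity in results:
            yield entity
        # Stop if no cursor came back, so the loop never restarts from the beginning.
        more = more and cursor is not None


# Intended use inside App Engine (not runnable outside it):
#   for pc in iter_pages(prodcauses_query.fetch_page, page_size=1000):
#       ...copy pc into the matrix...
```

Because the helper only depends on the shape of the `fetch_page` call, it can be exercised locally with a fake function before trying it against the datastore.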
Note that this process runs in a cron job, so it doesn't bother me if it takes a long time. The rest of the process takes a few seconds; the hard part has been reading in the data.
Thanks for any thoughts!