
Has anyone been successful in backing up large datastore kinds to Cloud Storage? This is an experimental feature, so support is pretty sketchy on the Google end.

The kind we want to back up to Cloud Storage (ultimately with the goal of ingesting it from Cloud Storage into BigQuery) is currently sitting at 1.2 TB. Our scheduled backup is configured in cron.yaml as follows:

- description: BackUp
  url: /_ah/datastore_admin/backup.create?name=OurApp&filesystem=gs&gs_bucket_name=OurBucket&queue=backup&kind=LargeKind
  schedule: every day 00:00
  timezone: America/Regina
  target: ah-builtin-python-bundle
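
For reference, the queue=backup parameter in that URL assumes a queue named backup is defined in queue.yaml. A minimal sketch of such a definition (the rate, bucket size, and retry values here are illustrative only, not taken from our app):

queue:
- name: backup
  rate: 10/s
  bucket_size: 40
  retry_parameters:
    task_retry_limit: 7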

We keep running into the following error message:

Traceback (most recent call last):
  File "/base/data/home/apps/s~steprep-prod-hrd/prod-339.366560204640641232/lib/mapreduce/handlers.py", line 182, in handle
    input_reader, shard_state, tstate, quota_consumer, ctx)
  File "/base/data/home/apps/s~steprep-prod-hrd/prod-339.366560204640641232/lib/mapreduce/handlers.py", line 263, in process_inputs
    entity, input_reader, ctx, transient_shard_state):
  File "/base/data/home/apps/s~steprep-prod-hrd/prod-339.366560204640641232/lib/mapreduce/handlers.py", line 318, in process_data
    output_writer.write(output, ctx)
  File "/base/data/home/apps/s~steprep-prod-hrd/prod-339.366560204640641232/lib/mapreduce/output_writers.py", line 711, in write
    ctx.get_pool("file_pool").append(self._filename, str(data))
  File "/base/data/home/apps/s~steprep-prod-hrd/prod-339.366560204640641232/lib/mapreduce/output_writers.py", line 266, in append
    self.flush()
  File "/base/data/home/apps/s~steprep-prod-hrd/prod-339.366560204640641232/lib/mapreduce/output_writers.py", line 288, in flush
    f.write(data)
  File "/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 297, in __exit__
    self.close()
  File "/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 291, in close
    self._make_rpc_call_with_retry('Close', request, response)
  File "/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 427, in _make_rpc_call_with_retry
    _make_call(method, request, response)
  File "/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 250, in _make_call
    rpc.check_success()
  File "/python27_runtime/python27_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 570, in check_success
    self.__rpc.CheckSuccess()
  File "/python27_runtime/python27_lib/versions/1/google/appengine/api/apiproxy_rpc.py", line 133, in CheckSuccess
    raise self.exception
DeadlineExceededError: The API call file.Close() took too long to respond and was cancelled.

1 Answer


There seems to be an undocumented time limit of roughly 30 seconds for write operations from GAE to Cloud Storage. This also applies to writes made from a backend, so the maximum file size you can create in Cloud Storage from GAE depends on your throughput. Our solution is to split the output: each time the writer task approaches 20 seconds, it closes the current file and opens a new one, and we then join these files locally. For us this results in files of about 500 KB (compressed), so this might not be an acceptable solution for you... A rough sketch of the rotation logic is below.
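
To illustrate, here is a minimal sketch of that rotation approach using the (since-deprecated) Files API that appears in the traceback above. The write_in_chunks helper, the bucket/object naming, and the 20-second threshold are illustrative assumptions, not code from our app:

import time

from google.appengine.api import files

# Stay well under the ~30 second per-write deadline observed above.
ROTATE_AFTER_SECONDS = 20


def write_in_chunks(bucket, base_name, records):
    """Writes records to a series of GCS objects, rotating before the deadline."""
    part = 0
    current = None
    started = None

    for record in records:
        if current is None:
            # files.gs.create returns a writable file name that must be
            # finalized before the object becomes visible in the bucket.
            current = files.gs.create(
                '/gs/%s/%s-part-%05d' % (bucket, base_name, part),
                mime_type='application/octet-stream')
            started = time.time()

        with files.open(current, 'a') as f:
            f.write(record)

        if time.time() - started > ROTATE_AFTER_SECONDS:
            files.finalize(current)  # close this part before the deadline hits
            part += 1
            current = None

    if current is not None:
        files.finalize(current)

Joining is then just a matter of downloading the parts and concatenating them locally in order.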
