
I have a Python generator that will yield a large and unknown amount of byte data. I'd like to stream the output to GCS, without buffering to a file on disk first.

While I'm sure this is possible (e.g., I could spawn a gsutil cp - <...> subprocess and write my bytes to its stdin), I'm not sure what the recommended/supported way is, and the documentation only gives an example of uploading a local file.
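For reference, the subprocess workaround I have in mind looks roughly like this (just a sketch; the bucket/object path and my_generator are placeholders, and it assumes gsutil is installed and authenticated):

import subprocess

# Stream the generator's output into `gsutil cp -`, which reads the
# object's contents from stdin.
proc = subprocess.Popen(
    ['gsutil', 'cp', '-', 'gs://my_bucket/my_object'],
    stdin=subprocess.PIPE,
)
for chunk in my_generator():
    proc.stdin.write(chunk)
proc.stdin.close()
proc.wait()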

What's the right way to do this?

Yaniv Aknin
    The magic is to convert your generator into a stream that yields each time a read is performed. The Python example in your reference link demonstrates how to read the stream. This article will help you create a stream backed by a generator: https://coderscat.com/python-generator-and-yield/ – John Hanley Sep 06 '22 at 00:18
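A minimal sketch of the approach that comment describes: a read-only, file-like stream backed by a generator, which could then be passed to blob.upload_from_file. The class name and your_generator are placeholders:

import io

class GeneratorStream(io.RawIOBase):
    """Read-only stream that pulls bytes from a generator on demand."""

    def __init__(self, generator):
        self._generator = generator
        self._buffer = b''

    def readable(self):
        return True

    def readinto(self, b):
        # Refill the internal buffer from the generator until the read
        # can be satisfied (or the generator is exhausted).
        while len(self._buffer) < len(b):
            try:
                self._buffer += next(self._generator)
            except StopIteration:
                break
        n = min(len(b), len(self._buffer))
        b[:n] = self._buffer[:n]
        self._buffer = self._buffer[n:]
        return n  # 0 signals EOF

# e.g. blob.upload_from_file(GeneratorStream(your_generator))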

1 Answer


The BlobWriter class (from google.cloud.storage.fileio) makes this a bit easier:

from google.cloud import storage
from google.cloud.storage.fileio import BlobWriter

storage_client = storage.Client()
bucket = storage_client.bucket('my_bucket')
blob = bucket.blob('my_object')
writer = BlobWriter(blob)

# Each write is buffered in memory and uploaded to GCS in chunks via a
# resumable upload, so nothing is written to disk.
for d in your_generator:
    writer.write(d)

writer.close()
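On recent versions of google-cloud-storage, blob.open('wb') also returns a BlobWriter, so the same loop can be written with a context manager that closes the writer for you (same placeholder names as above):

with blob.open('wb') as writer:
    for d in your_generator:
        writer.write(d)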
David