I'm trying to upload large files using the GCS writer:

    bucketHandle := m.Client.Bucket(bucket)
    objectHandle := bucketHandle.Object(path)
    writer := objectHandle.NewWriter(context.Background())

Then, for each chunk of size N, I call writer.Write(myBuffer). I'm seeing out-of-memory errors on my cluster and wondering whether the writer is buffering the entire file in memory. What are the semantics of this operation? Am I misunderstanding something?

user38643
  • The Go SDK's Write method on a GCS writer buffers data in memory before uploading it to GCS. To avoid running out of memory with very large files, you may need to feed the data to the writer in smaller portions rather than loading it all at once. – Chanpols Feb 09 '23 at 22:12
  • @ChristianPaulAndaya is the data flushed upon calling write? I'm chunking the input in 5MB chunks, calling write, (repeat in loop) – user38643 Feb 10 '23 at 00:38
  • Not on every call. Write hands the data to the Writer, which buffers it and uploads it in chunks of Writer.ChunkSize (16 MiB by default). Client-side memory should therefore stay at roughly ChunkSize per open Writer regardless of file size, so calling Write with 5 MB chunks in a loop should not accumulate the whole file. The object is finalized when Close returns. – Chanpols Feb 10 '23 at 19:44
  • Does this answer your question? If yes, I will post it as an answer. Thanks – Chanpols Feb 13 '23 at 17:16
  • @ChristianPaulAndaya I had a bug in my code, but that's a great response, feel free to post. – user38643 Feb 13 '23 at 17:31

1 Answer

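The writer does not hold the entire file in memory, but it also does not flush to GCS on every Write call. In the Go client (cloud.google.com/go/storage), the Writer buffers data internally and sends it in chunks of Writer.ChunkSize, which defaults to 16 MiB. Each Writer allocates a buffer of that size so that a chunk can be resent if a request has to be retried. Client-side memory is therefore bounded by roughly ChunkSize per open Writer, independent of both the file size and the 5 MB slices you pass to Write in a loop.

Write returns the number of bytes accepted and any error encountered. A nil error does not mean the bytes have reached GCS: the object is only finalized once Close returns successfully, so always check the error from Close.

If memory is tight, set ChunkSize to a smaller value before the first Write (it is rounded up to a multiple of 256 KiB), or set it to 0 to upload in a single request without the buffer, at the cost of losing retries on transient errors. If you still run out of memory, check how many Writers are open at the same time and whether the code that reads the input is holding the whole file.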

Chanpols