
I am creating a CSV file on Google Cloud Storage using a Google Cloud Function. Now I want to edit that file — is it possible to append data to it? If yes, then how?

Newton8989
  • See if either of these assist ... https://stackoverflow.com/questions/53487432/how-to-append-files-in-gcs-with-the-same-schema and https://stackoverflow.com/questions/52715217/how-to-open-a-file-in-a-gcs-bucket-in-append-mode-using-python – Kolban Nov 20 '19 at 06:14
  • Thanks for your reply, but it will not work for large files – Newton8989 Nov 20 '19 at 07:00
  • Which proposed workaround does not work for large files? – Thierry Falvo Nov 20 '19 at 07:21
  • Does this answer your question? [How to append write to google cloud storage file from app engine?](https://stackoverflow.com/questions/20876780/how-to-append-write-to-google-cloud-storage-file-from-app-engine) – Franklin Yu May 14 '21 at 05:19

3 Answers


Google Cloud Storage is the object storage managed service for Google Cloud Platform. Unlike block storage or file system storage, stored objects are immutable.

As mentioned in the official documentation:

Objects are immutable, which means that an uploaded object cannot change throughout its storage lifetime. An object's storage lifetime is the time between successful object creation (upload) and successful object deletion. In practice, this means that you cannot make incremental changes to objects, such as append operations or truncate operations. However, it is possible to overwrite objects that are stored in Cloud Storage, and doing so happens atomically — until the new upload completes the old version of the object will be served to readers, and after the upload completes the new version of the object will be served to readers. So a single overwrite operation simply marks the end of one immutable object's lifetime and the beginning of a new immutable object's lifetime.

As a workaround, you can upload multiple files to a bucket and then create a new object by composing the previous ones:

gsutil compose gs://bucket/obj1 [gs://bucket/obj2 ...] gs://bucket/composite

Note that this compose command is also available via the JSON API:

POST https://storage.googleapis.com/storage/v1/b/bucket/o/destinationObject/compose
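As a rough sketch of that JSON API call (hypothetical bucket and object names, and you need a valid OAuth 2.0 access token):

```shell
# Compose obj1 and obj2 into a new object named "composite".
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -d '{"sourceObjects": [{"name": "obj1"}, {"name": "obj2"}],
       "destination": {"contentType": "text/csv"}}' \
  "https://storage.googleapis.com/storage/v1/b/bucket/o/composite/compose"
```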

And via the Cloud Storage client libraries.
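For example, a minimal sketch with the Python client library (bucket and object names are hypothetical; `Blob.compose` takes the list of source blobs to concatenate):

```python
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("bucket")  # hypothetical bucket name

# Compose the previously uploaded parts into a single new object.
destination = bucket.blob("composite")
destination.compose([bucket.blob("obj1"), bucket.blob("obj2")])
```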

So this call can easily be integrated into your code. Be sure to grant the required role to access the bucket beforehand.

Check the official documentation.

Thierry Falvo
  • Worth mentioning that there is a limit (as of 25 Apr 2022) on the number of objects that can be merged together: "There is a limit (currently 32) to the number of components that can be composed in a single operation." – Cililing Apr 25 '22 at 07:49

I'm using this Python script to append data to a CSV file. The script downloads the file, appends the data, and uploads it again to the same object in your bucket. You can implement this easily in your Cloud Function.

import csv
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket('thehotbucket')
blob = bucket.blob('data1.csv')

# Inside a Cloud Function only /tmp is writable, so download there.
blob.download_to_filename('/tmp/data1.csv')

# Append a new row locally.
fields = ['first', 'second', 'third']
with open('/tmp/data1.csv', 'a', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(fields)

# Upload the modified file back, overwriting the original object.
blob.upload_from_filename('/tmp/data1.csv')

If you only want to merge files, you can use the gsutil compose command:

gsutil compose gs://bucket/obj1 [gs://bucket/obj2 ...] gs://bucket/obj1
Chris32
  • The first solution is going to take too much time if the file size is large. Can you please tell more about the gsutil command? – Newton8989 Nov 20 '19 at 10:02
  • As stated in the [documentation](https://cloud.google.com/storage/docs/gsutil/commands/compose): _"The compose command creates a new object whose content is the concatenation of a given sequence of source objects under the same bucket"_. You can run this command in your Cloud Shell, specifying the desired documents to append, and it will append these documents into a new one – Chris32 Nov 20 '19 at 10:06
  • Note that this compose command is also available via the JSON API or via the Cloud client libraries, see my answer – Thierry Falvo Nov 20 '19 at 10:47
  • @Chris32 blob.download_to_filename('data1.csv') download and open file in 'wb' mode so this is causing error inside my GCP function. – Newton8989 Nov 27 '19 at 05:36
  • why are you using the 'wb' mode instead of 'a'? – Chris32 Nov 27 '19 at 07:07
  • And please share the details with us (the error, what you are trying to do, and your code if possible). Also note that I said "I'm using this Python script"; if you want to download a blob in your Cloud Function, you need to download it to a temporary file, so use the path "/tmp/data1.csv" instead – Chris32 Nov 27 '19 at 08:15

GCS is an object store and does not allow updating or editing a file once it has been pushed to a bucket.

The only way to update a file that lives in a GCS bucket is to download it, make the required changes, and push it back to the bucket. This overwrites the object with the new content.
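A minimal sketch of that download-modify-overwrite cycle (hypothetical bucket and object names; the `if_generation_match` precondition is an extra safeguard so the upload fails with 412 Precondition Failed if another writer changed the object in the meantime):

```python
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-bucket")  # hypothetical bucket name
blob = bucket.blob("data.csv")       # hypothetical object name

# Read the current contents and remember the object's generation.
blob.reload()
generation = blob.generation
contents = blob.download_as_bytes().decode("utf-8")

# Make the required changes locally.
contents += "new,row,here\n"

# Overwrite, but only if nobody else modified the object in the meantime.
blob.upload_from_string(contents, if_generation_match=generation)
```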

Pradeep Bhadani