2

I am trying to create a new bucket with two empty folders in it on Google Cloud Storage using the Python client library.

I looked through the Python client library API for GCS (https://google-cloud-python.readthedocs.io/en/latest/storage/client.html) and found a create_bucket() method. I would also like to create two folders, 'processed' and 'unprocessed', within the bucket, but I could not find a method for creating folders. Any help would be appreciated.

Thanks

M Reza
Parth Desai

2 Answers

4

GCS has a flat namespace, i.e., the concept of a 'folder' is not built into the service but is rather an abstraction implemented by various clients. For example, both the Cloud Storage web UI (console.cloud.google.com/storage/browser) and gsutil implement the folder abstraction using an object name that ends with "/". Thus, you could create folders by creating objects like your-bucket/abc/def/, but that would only be a folder to clients that know about and support that naming convention.
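Applied to the question, a minimal sketch of that naming convention with the Python client could look like the following. The helper names (folder_placeholder_name, create_bucket_with_folders) and the bucket name are made up for illustration, and the zero-byte placeholder objects will only appear as folders in clients that follow the trailing "/" convention.

```python
def folder_placeholder_name(folder):
    """Return the object name that convention-aware clients display as a folder."""
    return folder.strip("/") + "/"


def create_bucket_with_folders(client, bucket_name, folders):
    """Create a bucket, then upload one zero-byte placeholder object per folder.

    `client` is expected to be a google.cloud.storage.Client.
    """
    bucket = client.create_bucket(bucket_name)
    for folder in folders:
        blob = bucket.blob(folder_placeholder_name(folder))
        blob.upload_from_string("")  # empty object whose name ends with "/"
    return bucket


def main():
    # Imported here so the helpers above can be exercised without the library.
    from google.cloud import storage  # pip install google-cloud-storage

    client = storage.Client()  # assumes default credentials and project are set up
    # "your-bucket-name" is a placeholder; bucket names must be globally unique.
    create_bucket_with_folders(
        client, "your-bucket-name", ["processed", "unprocessed"]
    )
```

Calling main() would create the bucket and the two placeholder objects 'processed/' and 'unprocessed/'. The placeholders are optional: uploading an object named processed/some-file has the same visible effect without them.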

Mike Schwartz
  • Thanks Mike! I get that although I would like to have a folder structure beforehand and create new folders every day. I can use your suggestion by naming the files in a convention while uploading them. However, how can I move files from one folder to another within the same bucket using python client API? – Parth Desai May 21 '18 at 20:20
  • The idea is that I want to have 2 folders (unprocessed and processed) so that I can first save input files in 'unprocessed' folder, process them using dataflow and store the data to bigquery and then finally move the files to 'processed' folder after the dataflow job is complete so that the next time I run the same dataflow ETL job, it only picks up files from the 'unprocessed' folder – Parth Desai May 21 '18 at 20:47
  • +Parth-desai It looks like Mike has already answered your initial question. I would mark his answer as the solution and create a new post for the other question. You can mention me in a comment below the question and I will answer it as soon as I see it. – A.Queue May 22 '18 at 09:11
-1
def copyFilesInFolder(self, file_name, src_blob_name, destination_blob_name):
    """Copies a file from one folder (name prefix) to another within self.bucket."""
    # Assumes self.bucket is a google.cloud.storage.Bucket, e.g.
    #   from google.cloud import storage
    #   self.bucket = storage.Client().bucket("your-bucket-name")
    # src_blob_name = "source-folder-name"
    # destination_blob_name = "destination-folder-name"

    # A "folder" is only a prefix of the object name, so build the full names.
    srcBlob = src_blob_name + '/' + file_name
    destBlob = destination_blob_name + '/' + file_name
    source_blob = self.bucket.blob(srcBlob)

    # Source and destination bucket are the same; only the object name changes.
    blob_copy = self.bucket.copy_blob(
        source_blob, self.bucket, destBlob
    )
    print(blob_copy)
    print(
        "File {} in folder {} copied to folder {} in bucket {}.".format(
            file_name,
            src_blob_name,
            destination_blob_name,
            self.bucket.name,
        )
    )

    return True

In GCS there is no concept of creating a folder directly. Instead, we can save a new file under the new folder's name; even if the destination folder doesn't exist yet, it will appear once the file is written.
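For the follow-up in the comments (moving files from 'unprocessed' to 'processed' rather than copying them), one option is to copy the object and then delete the source. This is only a sketch: move_file is a hypothetical helper, `bucket` is assumed to be a google.cloud.storage.Bucket, and the copy and the delete are two separate requests, so the move is not atomic.

```python
def move_file(bucket, file_name, src_folder, dest_folder):
    """Move an object between two folder prefixes in the same bucket.

    Returns the new object name.
    """
    src_name = src_folder.strip("/") + "/" + file_name
    dest_name = dest_folder.strip("/") + "/" + file_name

    source_blob = bucket.blob(src_name)
    # Copy first, so the source is only removed once the copy request succeeded.
    bucket.copy_blob(source_blob, bucket, dest_name)
    source_blob.delete()
    return dest_name
```

With a real bucket this would be called as, for example, move_file(storage.Client().bucket("your-bucket-name"), "input.csv", "unprocessed", "processed"), where the bucket and file names are placeholders.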