
From Google Colaboratory, if I want to read/write to a folder in a given bucket created in Google Cloud Storage, how do I achieve this?

I have created a bucket, a folder within the bucket, and uploaded a bunch of images into it. Now from Colaboratory, using a Jupyter notebook, I want to create multiple sub-directories to organise these images into train, validation and test folders.

Subsequently, I want to access the respective folders for training, validating and testing the model.

With Google Drive, after authentication, we just update the path to point to a specific directory with the following commands:

import sys
# Add the mounted Drive folder to the module search path
sys.path.append('drive/xyz')

We do something similar in the desktop version as well:

import os
# Change the working directory so relative paths resolve against it
os.chdir(local_path)

Does something similar exist for Google Cloud Storage?

In the Colaboratory FAQs, there is a procedure for reading and writing a single file, where we need to set the entire path. That would be tedious for re-organising a main directory into sub-directories and accessing them separately.


1 Answer


In general it's not a good idea to try to mount a GCS bucket on the local machine (which would allow you to use it as you mentioned). From Connecting to Cloud Storage buckets:

Note: Cloud Storage is an object storage system that does not have the same write constraints as a POSIX file system. If you write data to a file in Cloud Storage simultaneously from multiple sources, you might unintentionally overwrite critical data.

Assuming you'd like to continue regardless of the warning, if you use a Linux OS you may be able to mount it using the Cloud Storage FUSE adapter. See the related How to mount Google Bucket as local disk on Linux instance with full access rights. A sketch of what that could look like follows.
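As a rough illustration only, here is a minimal sketch of mounting a bucket from a notebook cell. It assumes the gcsfuse binary is already installed and that credentials are available; the bucket name and mount point are placeholders:

import os
import subprocess

# Placeholders: replace with your bucket name and desired mount point.
BUCKET = 'my-bucket'
MOUNT_POINT = '/content/gcs'

# Create the mount point if it doesn't exist yet.
os.makedirs(MOUNT_POINT, exist_ok=True)

# Mount the bucket with the Cloud Storage FUSE adapter (assumes gcsfuse
# is installed and authentication has already been done).
subprocess.run(['gcsfuse', BUCKET, MOUNT_POINT], check=True)

# After mounting, the bucket behaves (mostly) like a local directory:
os.chdir(MOUNT_POINT)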

The recommended way to access GCS from Python apps is using the Cloud Storage Client Libraries, but accessing files will be different from your snippets. You can find some examples at Python Client for Google Cloud Storage:

from google.cloud import storage
client = storage.Client()
# https://console.cloud.google.com/storage/browser/[bucket-id]/
bucket = client.get_bucket('bucket-id-here')
# Then do other things...
blob = bucket.get_blob('remote/path/to/file.txt')
print(blob.download_as_string())
blob.upload_from_string('New contents!')
blob2 = bucket.blob('remote/path/storage.txt')
blob2.upload_from_filename(filename='/local/path.txt')
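Since GCS has no real directories, re-organising images into train/validation/test "folders" just means rewriting object names under new prefixes. A minimal sketch with the same client library, where the bucket name, the images/ prefix and the split rule are all placeholders:

from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket('bucket-id-here')  # placeholder bucket name

# List every object under the original images/ prefix.
for blob in bucket.list_blobs(prefix='images/'):
    filename = blob.name.split('/')[-1]
    # Placeholder rule: decide the split however you like.
    split = 'train'  # or 'validation' / 'test'
    # rename_blob rewrites the object under the new prefix, which is
    # how "moving into a sub-directory" works in GCS.
    bucket.rename_blob(blob, split + '/' + filename)

You can then feed each split to your training code by listing blobs with the matching prefix, e.g. bucket.list_blobs(prefix='train/').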

Update:

The Colaboratory doc recommends another method that I forgot about, based on the Google API Client Library for Python, but note that it also doesn't operate like a regular filesystem: it goes through an intermediate file on the local filesystem:
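A sketch along the lines of that doc, downloading a single object into a local temporary file (bucket and object names are placeholders):

from google.colab import auth
auth.authenticate_user()

from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload

gcs_service = build('storage', 'v1')

# Download one object into an intermediate file on the local filesystem.
with open('/tmp/downloaded.txt', 'wb') as f:
    request = gcs_service.objects().get_media(
        bucket='bucket-id-here', object='remote/path/to/file.txt')
    media = MediaIoBaseDownload(f, request)
    done = False
    while not done:
        _, done = media.next_chunk()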

  • Thanks for the information. A few clarifications: I am not trying to access Google Cloud from my local desktop environment; I am trying to access it from Google Colaboratory https://colab.research.google.com/notebooks. Do I still need to create a client library, service account, etc.? Also, I am trying to read images from storage in batches using the Keras data generator, so it needs an easy reference to the directory. Appreciate any further inputs. Thanks a lot – Srinivasa Rao Feb 28 '18 at 06:19