
I’m working through the Google quick start examples for Cloud Learning / Tensorflow as shown here: https://cloud.google.com/ml/docs/quickstarts/training

I want my python program to access data that I have stored in a Google Cloud bucket such as gs://mybucket. How do I do this inside of my python program instead of calling it from the command line?

Specifically, the quickstart example for cloud learning utilizes data they provided but what if I want to provide my own data that I have stored in a bucket such as gs://mybucket?

I noticed a similar post here: How can I get the Cloud ML service account programmatically in Python? ... but I can’t seem to install the googleapiclient module.

Some posts mention Apache Beam, but I can’t tell whether it’s relevant to my case, and I can’t figure out how to download or install it either.

ct_sphon
  • I am facing the same issue. I am working with an image dataset that I have uploaded to a bucket, but I am not able to use those folders inside a Jupyter notebook on a VM instance. If you found a solution, please share it. – KMittal Jul 15 '18 at 17:42

2 Answers


If I understand your question correctly, you want to programmatically talk to GCS in Python.

The official docs are a good place to start.

First, grab the module using pip:

pip install --upgrade google-cloud-storage

Then:

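from google.cloud import storage

# Uses the default credentials configured in your environment.
client = storage.Client()
bucket = client.get_bucket('bucket-id-here')
# Then do other things...
# get_blob() returns None if the object does not exist.
blob = bucket.get_blob('remote/path/to/file.txt')
print(blob.download_as_string())  # contents are returned as bytes
# Overwrite the object with new contents.
blob.upload_from_string('New contents!')
# Upload a local file to a new object.
blob2 = bucket.blob('remote/path/storage.txt')
blob2.upload_from_filename(filename='/local/path.txt')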
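
Since the question refers to data by paths such as gs://mybucket, here is a minimal sketch of how such a path could be turned into the bucket and object names that the client expects, and then downloaded to a local file. The helper names split_gcs_uri and download_gcs_file are hypothetical (they are not part of the library), and the download step assumes credentials are already configured and that the object exists.

```python
from urllib.parse import urlparse


def split_gcs_uri(uri):
    """Split 'gs://bucket/path/to/object' into (bucket_name, object_name)."""
    parsed = urlparse(uri)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"not a gs:// URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def download_gcs_file(uri, local_path):
    """Download one object to local_path (needs credentials and network access)."""
    # Imported here so that split_gcs_uri works even without the package installed.
    from google.cloud import storage

    bucket_name, object_name = split_gcs_uri(uri)
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(object_name)
    blob.download_to_filename(local_path)
    return local_path
```

A call such as download_gcs_file('gs://mybucket/remote/path/to/file.txt', '/local/path.txt') would then leave a local copy that the rest of the program can read like any other file.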
Graham Polley

Assuming you are running Ubuntu/Linux and already have data in a GCS bucket, execute the following commands from a terminal, or from a Jupyter Notebook (just put ! before each command):

--------------------- Installation -----------------

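First, install the storage module. In a terminal, type:

pip install google-cloud-storage

Second, check that gsutil is available. Note that gsutil is not installed by the pip command above; it ships with the Google Cloud SDK. Type:

gsutil 

(the output lists the available options)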

---------------------- Copy data from GCS bucket --------

Type this command to check whether you can get information about the bucket (if it fails for lack of permissions, gsutil ls gs://BucketName is a simpler check):

gsutil acl get gs://BucketName

Now copy the file from GCS Bucket to your machine:

gsutil cp gs://BucketName/FileName /PathToDestinationDir/

This copies the data from the bucket to your machine for further processing.

NOTE: all of the above commands can be run from a Jupyter Notebook; just put ! before the command, e.g.

!gsutil cp gs://BucketName/FileName /PathToDestinationDir/
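
If you would rather trigger the same copy from inside a Python program than from a shell or a notebook cell, one option is to call gsutil through the standard subprocess module. This is only a sketch: the function names build_gsutil_cp_command and copy_from_gcs are hypothetical, and it assumes gsutil is installed and on your PATH.

```python
import subprocess


def build_gsutil_cp_command(source_uri, destination_dir):
    """Return the argument list for 'gsutil cp <source> <destination>'."""
    return ["gsutil", "cp", source_uri, destination_dir]


def copy_from_gcs(source_uri, destination_dir):
    """Run gsutil cp; raises CalledProcessError if gsutil exits with an error."""
    command = build_gsutil_cp_command(source_uri, destination_dir)
    subprocess.run(command, check=True)
```

For example, copy_from_gcs('gs://BucketName/FileName', '/PathToDestinationDir/') runs the same command as the notebook line above.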
Yogesh Awdhut Gadade