I'm trying to upload files from within a notebook on my Datalab instance to my Google Cloud Storage bucket using the Python API, but I can't figure out how. The code example in Google's documentation doesn't seem to work in Datalab. I'm currently using the gsutil command, but I would like to understand how to do this with the Python API.

Directory listing (I want to upload the Python files located in the checkpoints folder):

!ls -R

.:
checkpoints  README.md  tpot_model.ipynb

./checkpoints:
pipeline_2020.02.29_00-22-17.py  pipeline_2020.02.29_06-33-25.py
pipeline_2020.02.29_00-58-04.py  pipeline_2020.02.29_07-13-35.py
pipeline_2020.02.29_02-00-52.py  pipeline_2020.02.29_08-45-23.py
pipeline_2020.02.29_02-31-57.py  pipeline_2020.02.29_09-16-41.py
pipeline_2020.02.29_03-02-51.py  pipeline_2020.02.29_11-13-00.py
pipeline_2020.02.29_05-01-17.py

Current Code:

import google.datalab.storage as storage
from pathlib import Path

bucket = storage.Bucket('machine_learning_data_bucket')


for file in Path('').rglob('*.py'):
    # API CODE GOES HERE

Current Working Solution:

!gsutil cp checkpoints/*.py gs://machine_learning_data_bucket
Michael Gardner

1 Answer

This is the code that worked for me:

from google.cloud import storage
from pathlib import Path

storage_client = storage.Client()
bucket = storage_client.bucket('bucket')

for file in Path('/home/jupyter/folder').rglob('*.py'):
    blob = bucket.blob(file.name)
    blob.upload_from_filename(str(file))
    print("File {} uploaded to {}.".format(file.name,bucket.name))

Output:

File file1.py uploaded to bucket.
File file2.py uploaded to bucket.
File file3.py uploaded to bucket.
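
One thing to watch in the loop above: `file.name` drops the directory part, so two files with the same name in different subfolders would end up as the same object in the bucket. Below is a minimal sketch of one way to keep the relative path instead. The helper names `object_name` and `upload_tree` and the `prefix` parameter are made up for illustration; the only storage calls used are `bucket.blob(...)` and `blob.upload_from_filename(...)` from the code above.

```python
from pathlib import Path


def object_name(file, root, prefix=""):
    """Build an object name that keeps the path of `file` relative to `root`.

    Hypothetical helper, not part of google-cloud-storage.
    """
    relative = Path(file).relative_to(root).as_posix()
    if prefix:
        return "{}/{}".format(prefix.rstrip("/"), relative)
    return relative


def upload_tree(bucket, root, pattern="*.py", prefix=""):
    """Upload every file under `root` matching `pattern`.

    `bucket` is assumed to behave like a google.cloud.storage Bucket,
    i.e. it offers bucket.blob(name).upload_from_filename(path).
    Returns the list of object names that were uploaded.
    """
    root = Path(root)
    uploaded = []
    for file in sorted(root.rglob(pattern)):
        if not file.is_file():
            continue  # skip directories that happen to match the pattern
        name = object_name(file, root, prefix)
        bucket.blob(name).upload_from_filename(str(file))
        uploaded.append(name)
    return uploaded
```

With the client from the answer, this could be called as `upload_tree(storage_client.bucket('bucket'), '/home/jupyter/folder')`.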

EDIT

Alternatively, you can use the Datalab storage module from your original code:

import google.datalab.storage as storage
from pathlib import Path

bucket = storage.Bucket('bucket')

for file in Path('/home/jupyter/folder').rglob('*.py'):
    blob = bucket.object(file.name)
    blob.write_stream(file.read_text(), 'text/plain')
    print("File {} uploaded to {}.".format(file.name,bucket.name))

Output:

File file1.py uploaded to bucket.
File file2.py uploaded to bucket.
File file3.py uploaded to bucket.
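
The loop above hard-codes `'text/plain'` as the content type, which is fine for `.py` files read with `read_text()`. If you upload other kinds of files, a small sketch like the following can pick the content type from the file extension using the standard library. The helper name `guess_content_type` and the fallback value are my own choices, not part of the Datalab API.

```python
import mimetypes
from pathlib import Path


def guess_content_type(file, default="application/octet-stream"):
    """Guess a MIME type from the file extension, falling back to `default`.

    Hypothetical helper; it only wraps mimetypes.guess_type from the stdlib.
    """
    content_type, _encoding = mimetypes.guess_type(str(file))
    return content_type or default
```

The return value can be passed as the second argument to `write_stream` in place of the literal `'text/plain'`.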
marian.vladoi
  • Forgot to mention I'm trying to run this within the notebook. When I try to import `from google.cloud import storage` I get `ImportError: cannot import name 'storage'` – Michael Gardner Mar 01 '20 at 00:55
  • If you check the path in my code, you can see I ran it from my notebook. I suspect you have to run `pip install --upgrade google-cloud-storage`; see this [link](https://stackoverflow.com/questions/50840511/google-cloud-import-storage-cannot-import-storage) – marian.vladoi Mar 01 '20 at 09:55
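
To check from inside the notebook whether the import discussed in these comments would succeed, something like the sketch below can be run first. The function name `storage_available` is made up for illustration; it only attempts the import and reports the result.

```python
import importlib


def storage_available():
    """Return True if google.cloud.storage can be imported in this kernel."""
    try:
        importlib.import_module("google.cloud.storage")
    except ImportError:
        return False
    return True
```

If it returns `False`, running `!pip install --upgrade google-cloud-storage` in a cell, as suggested above, is the next thing to try; restarting the kernel afterwards is often needed before the import is picked up.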