4

I just want to grab some output data from a Google Cloud Datalab notebook quickly, preferably as a throwaway CSV file.

I've done this:

import csv

writer = csv.writer(open('output.csv', 'wb'))
for row in rows:
    writer.writerow(row)

This writes a local file, but I can't open it in the browser, and I can't see how to download it from Cloud Datalab.

How can I quickly grab my data as a CSV file? I guess I have to use the Storage APIs to write it out, but I'm finding the docs a bit hard to follow. I've got something like this:

import gcp
import gcp.storage as storage

# create CSV file? construct filepath? how?

mybucket = storage.Bucket(myfile)
mybucket.create()
Richard

5 Answers

10

There are at least 2 options:

Download files locally from Datalab

This option does not appear to be available in the current Datalab code. I have submitted a pull request for Datalab which may resolve your issue. The fix allows users to edit/download files that are not notebooks (*.ipynb) through the Datalab interface. I was able to download and edit a text file from Datalab using the modification in the pull request.

Send files to a Storage Bucket in Google Cloud

The Storage API documentation may be helpful for writing code to transfer files to a storage bucket in Google Cloud.

Here is a working example:

from datalab.context import Context
import datalab.storage as storage

sample_bucket_name = Context.default().project_id + '-datalab-example'
sample_bucket_path = 'gs://' + sample_bucket_name

sample_bucket = storage.Bucket(sample_bucket_name)

# Create storage bucket if it does not exist
if not sample_bucket.exists():
    sample_bucket.create()

# Write an item to the storage bucket
sample_item = sample_bucket.item('stringtofile.txt')
sample_item.write_to('This is a string', 'text/plain')

# Another way to copy an item from Datalab to Storage Bucket
!gsutil cp 'someotherfile.txt' $sample_bucket_path

Once you've copied an item, you can view it in the Storage browser in the Google Cloud Console.
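
For the CSV use case in the question, a minimal sketch along the same lines (assuming rows is the list of rows from the question, and reusing sample_bucket from the example above) might be:

import csv
from StringIO import StringIO  # on Python 3, use: from io import StringIO

# Build the CSV in memory instead of writing it to local disk
csv_buffer = StringIO()
writer = csv.writer(csv_buffer)
for row in rows:
    writer.writerow(row)

# Upload the CSV text as an object in the bucket
sample_bucket.item('output.csv').write_to(csv_buffer.getvalue(), 'text/csv')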

Anthonios Partheniou
0

How much data are you talking about? I'm assuming this is not a BigQuery Table, as we have APIs for that.

For the storage APIs, think of a bucket as being like a folder. You need to create an Item in the Bucket. If you assign the data to a Python variable as a string, there is an API on Item (write_to) that you can use.

If you write to a file like you did with output.csv, that file lives in the Docker container that Datalab is running in. That means it is transient and will disappear when the container is shut down. However, it is accessible in the meantime and you can use a %%bash cell magic to send it to some other destination using, for example, curl.
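
For example, a minimal sketch of that %%bash approach (the URL below is only a placeholder for an endpoint you control, not a real service) could look like:

%%bash
# Upload the transient local file somewhere external before the container goes away
curl -T output.csv https://example.com/uploads/output.csv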

Graham Wheeler
  • Thanks. Only about 1000 rows, so no need for BigQuery. Can I create the CSV file locally, then push it into a bucket? That might be the most straightforward way of doing things. – Richard Mar 02 '16 at 23:31
  • Sure. Use StringIO to write it to a string instead of a file, then use the GCS APIs or magics to push to GCS. – Graham Wheeler Mar 03 '16 at 18:04
0

I found an easier way to write CSV files from a Datalab notebook to a bucket:

    %storage write --object "gs://pathtodata/data.csv" --variable data

Here 'data' is a DataFrame in your notebook.
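
If your Datalab version expects the variable to hold plain text rather than a DataFrame (an assumption worth checking), you can serialize it to CSV first:

    csv_text = data.to_csv(index=False)  # turn the DataFrame into CSV text

    %storage write --object "gs://pathtodata/data.csv" --variable csv_text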

Ramkumar Hariharan
0

Use the ungit tool available in Datalab to commit your files to your Google Cloud Source Repository, then clone that repository onto your local machine using the gcloud command:

gcloud source repos clone datalab-notebooks --project=your-project-id
0

As someone posted above:

!gsutil cp 'someotherfile.txt' $sample_bucket_path

did the job for me. It got the file from Datalab into Google Cloud Storage.

Adil B