
I'm trying to export a pandas dataframe to a CSV file in a bucket on Google Cloud Storage, but the following code obviously isn't working for me:

my_df.to_csv(StringIO(file_io.FileIO('gs://mybucket/data/file.csv', mode='w+')))

How should this be rewritten? I'm getting the following error:

unbound method write() must be called within FileIO instance as first argument (got nothing instead)

Apologies if the answer is obvious, but I'm just starting to learn python.

– quantllama
    Possible duplicate of [Save pandas data frame as csv on to gcloud storage bucket](https://stackoverflow.com/questions/45495108/save-pandas-data-frame-as-csv-on-to-gcloud-storage-bucket) – philshem Mar 26 '19 at 21:27
  • Is your CSV small enough to hold in memory? If so, you can apparently write a new object to GCS directly from a Python string (see the sketch after these comments). If your data is too large, you can write it to a local file and then upload that file through the API. Don't confuse GCS with a file system. – Kolban Mar 26 '19 at 22:32
  • I was specifically trying to work out how to use StringIO and FileIO to export a file to a gcloud storage bucket, and none of the other answers I looked through covered that. I had already used the two to import a CSV on gcloud into a dataframe, so I assumed it wouldn't be too complicated to do the same in the other direction. I did manage to get GCS to work, so I'll post how I did so below for anyone else who might be wondering. – quantllama Mar 28 '19 at 19:40
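
As an illustration of the in-memory approach Kolban describes, here is a minimal sketch using the google-cloud-storage client's upload_from_string; the bucket and object names are placeholders:

import pandas as pd
import google.cloud.storage as gcs

my_df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})  # stand-in for the real dataframe

# Serialize the dataframe to a CSV string in memory and upload it directly;
# no local temp file is needed when the data fits in memory.
client = gcs.Client()
bucket = client.bucket('my-bucket')  # placeholder bucket name
blob = bucket.blob('data/file.csv')
blob.upload_from_string(my_df.to_csv(), content_type='text/csv')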

2 Answers


Importing a file from gcloud into a dataframe works when I code it like this:

from tensorflow.python.lib.io import file_io
from io import StringIO  # pandas.compat.StringIO was removed in pandas 1.0
import pandas as pd

def read_data(gcs_path):
    # Read the whole object from the bucket, then let pandas parse it.
    file_stream = file_io.FileIO(gcs_path, mode='r')
    data = pd.read_csv(StringIO(file_stream.read()), names=['various', 'column', 'names'])
    return data

my_df = read_data('gs://mybucket/data/file.csv')

But I haven't been able to reverse the process.
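
For what it's worth, the reverse direction can be expressed with the same file_io API: serialize the dataframe to a string first, then write that string through a FileIO handle. An untested sketch, assuming the same gs:// path:

from tensorflow.python.lib.io import file_io

def write_data(my_df, gcs_path):
    # to_csv() with no path argument returns the CSV as a string,
    # which FileIO can write straight to the bucket.
    with file_io.FileIO(gcs_path, mode='w') as f:
        f.write(my_df.to_csv())

write_data(my_df, 'gs://mybucket/data/file.csv')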

The google-cloud-storage client has worked for me, however:

import google.cloud.storage as gcs

client = gcs.Client()
bucket = client.bucket('my-bucket')
blobs = list(bucket.list_blobs(prefix='data/'))  # optional: see what's already under the prefix

# Write the dataframe to a local temp file, then upload that file.
my_df.to_csv('tmp.csv')
local_tmp_path = 'tmp.csv'
target_blob = bucket.blob('data/file.csv')
with open(local_tmp_path, 'r') as f:
    target_blob.upload_from_file(f)
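
If you would rather not manage the file handle yourself, the client also exposes upload_from_filename, which takes the path directly (same names as above):

target_blob.upload_from_filename(local_tmp_path)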
– quantllama

You can save your CSV file on your VM and then use gsutil to copy it to your bucket.

Python:

my_df.to_csv("data.csv")

Shell:

gsutil cp data.csv gs://my_bucket/
– Lucas