
I am trying to upload a pandas DataFrame as a CSV file to Google Cloud Storage. The code has to upload the content of the DataFrame regardless of the characters it contains.

The problem I have is that when the frame contains special characters, such as Japanese symbols, Google returns an error that is difficult to interpret:

Exception: Failed to process HTTP response.

The code itself is the following:

import datalab.storage as gcs
import pandas as pd
from StringIO import StringIO  # on Python 3: from io import StringIO

# file_prefix is defined elsewhere in the original code
items = gcs.Bucket('astrologer-2').items(prefix=file_prefix)
for item in items:
    if not item.key.endswith('/') and item.exists():
        # read the CSV object, parse it and serialise it back to a string
        data = StringIO(item.read_from())
        dataFrame = pd.read_csv(data, low_memory=False, sep=',', encoding='utf-8')
        df_string = dataFrame.to_csv(index=False, encoding='utf-8')
        print df_string
        response = item.write_to(df_string, 'text/csv')

The error fires on the line `item.write_to(df_string, 'text/csv')`.

All the code does is read a CSV and write its content back to Google Cloud Storage (in the future, changes will be made to the content in between).

The content of the file is:

Nombre,Apellido
Lluís,Gassó
Test,Testson
最高,サートした

I tried passing 'text/plain', 'text/plain;encoding=UTF-8', 'text/csv;encoding=UTF-8' and 'application/octet-stream' as the content type, and none of them worked.
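
For reference, each of those attempts looks roughly like this; every variant raises the same "Failed to process HTTP response" exception as soon as the frame contains non-ASCII characters:

# every variant below fails in the same way once the frame
# contains non-ASCII characters
response = item.write_to(df_string, 'text/plain')
response = item.write_to(df_string, 'text/plain;encoding=UTF-8')
response = item.write_to(df_string, 'text/csv;encoding=UTF-8')
response = item.write_to(df_string, 'application/octet-stream')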

Does anyone know why this error is happening and how it can be fixed? Thanks in advance.

  • The `datalab.*` namespace is deprecated; you should now switch to `google.datalab.*`. I believe there were some UTF-8 fixes there; can you give that a try? – yelsayed Mar 21 '18 at 22:07
  • As per the [cloud documentation](https://cloud.google.com/storage/docs/gsutil/addlhelp/Filenameencodingandinteroperabilityproblems) users with files stored in other encodings (such as Latin 1) must convert those filenames to UTF-8 before attempting to upload the files. Have you checked the encoding of your content file? – D Saini Mar 21 '18 at 23:35
  • Thanks @yelsayed, I tried using the `google.datalab.*` namespace and I am still getting the same error (roughly as in the sketch after these comments). If I pass a string without special characters, everything works just fine. When passing any special character, Google's library raises the exception, no matter whether I pass a UTF-8 encoded str or a unicode variable, and regardless of the content_type specified. – Lluis Villarejo Mar 22 '18 at 11:29
  • @DSaini, the file I read comes from Cloud Storage as well, so it is UTF-8 encoded. I am also specifying the UTF-8 encoding when reading it, and printing the result looks fine, so this doesn't seem to be the issue. – Lluis Villarejo Mar 22 '18 at 11:32
  • Seems like a bug with the stream writer then. I'd open a Github issue in `pydatalab`. – yelsayed Mar 22 '18 at 21:09
  • You may want to report this issue to [Google public issue tracker](https://issuetracker.google.com/) platform which is meant for issue and feature request tracking. – D Saini Mar 24 '18 at 16:33
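
For completeness, here is a minimal sketch of the `google.datalab.*` attempt described in the comments above. It assumes (not confirmed by the question) that `Bucket.objects()` together with `Object.read_stream()`/`Object.write_stream()` are the `google.datalab.storage` counterparts of `items()`/`read_from()`/`write_to()`; the call fails the same way whether the CSV string is passed as unicode or as an explicitly UTF-8 encoded byte string:

import google.datalab.storage as gcs
import pandas as pd
from StringIO import StringIO  # on Python 3: from io import StringIO

# Assumption: Bucket.objects() and Object.read_stream()/write_stream() mirror
# the old datalab.storage items()/read_from()/write_to() API.
for obj in gcs.Bucket('astrologer-2').objects(prefix=file_prefix):
    if not obj.key.endswith('/') and obj.exists():
        dataFrame = pd.read_csv(StringIO(obj.read_stream()), encoding='utf-8')
        df_string = dataFrame.to_csv(index=False, encoding='utf-8')
        # Raises the same 'Failed to process HTTP response' exception whether
        # df_string is a unicode value or a UTF-8 encoded byte string.
        obj.write_stream(df_string, 'text/csv')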

0 Answers