I am trying to use the ML Workbench module in Datalab.

When running

%%ml analyze --cloud
output: gs://bucket/pathcontinued
data: model_3pcnt
features:

I get an error like this:

File "pandas/_libs/lib.pyx", line 1052, in pandas._libs.lib.write_csv_rows
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 4: ordinal not in range(128)

I'm not sure how there can possibly be an encoding error when reading directly from a BigQuery table and writing it out to a CSV. Is there a workaround, or a reason why writing to a CSV isn't working?
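
For context, here is a minimal sketch that reproduces the same failure mode under Python 2.7 with plain pandas (hypothetical data and file name, not my actual BigQuery table):

import pandas as pd

# A non-ASCII character such as u'\xe7' at position 4, as in the error above.
df = pd.DataFrame({'city': [u'Cura\xe7ao']})

# Under Python 2, to_csv() falls back to the default 'ascii' codec when no
# encoding is given, so this raises the same UnicodeEncodeError.
df.to_csv('out.csv')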

  • Which version of Python? This looks like a Python 2.x Windows code page thing. – tdelaney Apr 09 '18 at 23:56
  • Hey tdelaney, you're right, I'm running Python 2.7.14. – bbodek Apr 10 '18 at 02:04
  • Somewhere you loaded an encoded file (maybe UTF-8, UTF-16, or a Windows code page) into pandas, and this happens when you try to write it out again. I don't know ML Workbench. Can you specify the input or output encodings? Better yet, does this environment support Python 3? We are approaching Python 3's 10th anniversary, and it should be preferred when possible. – tdelaney Apr 10 '18 at 05:15
  • Can you provide more details about your use case? What type of data are you reading from BigQuery? Does the data contained in BigQuery include any non-ASCII character? Also you mention that you are writing to a CSV file. Is that file local or is it writing to Cloud Storage, for example? If it is writing to GCS, there was a similar issue with [Datalab writing to BQ UTF-8 chars](https://stackoverflow.com/q/49122740/4482491), and I created an issue in PIT that you may consult too, attached in a comment in that StackOverflow question. – dsesto Apr 27 '18 at 14:06
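
For reference, the output-encoding suggestion from the comments looks like this in plain pandas under Python 2.7 (hypothetical data; whether the %%ml analyze magic exposes a way to set the output encoding is not confirmed):

import pandas as pd

df = pd.DataFrame({'city': [u'Cura\xe7ao']})

# Passing an explicit encoding avoids the default 'ascii' codec, so the
# non-ASCII value is written out as UTF-8 without raising UnicodeEncodeError.
df.to_csv('out.csv', encoding='utf-8', index=False)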

0 Answers