I am trying to use the ML Workbench module in Datalab.

When running

%%ml analyze --cloud
output: gs://bucket/pathcontinued
data: model_3pcnt
features:

I get an error like this:

File "pandas/_libs/lib.pyx", line 1052, in pandas._libs.lib.write_csv_rows
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 4: ordinal not in range(128)

I'm not sure how there can possibly be an encoding error when reading directly from a BigQuery table and writing it out to a CSV. Is there a workaround, or a reason why writing to a CSV isn't working?
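
For context, here is a minimal sketch that reproduces the same failure mode under Python 2.7 with plain pandas (hypothetical data and file name, not my actual BigQuery table):

import pandas as pd

# A non-ASCII character such as u'\xe7' at position 4, as in the error above.
df = pd.DataFrame({'city': [u'Cura\xe7ao']})

# Under Python 2, to_csv() falls back to the default 'ascii' codec when no
# encoding is given, so this raises the same UnicodeEncodeError.
df.to_csv('out.csv')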

  • Which version of Python? This looks like a Python 2.x Windows code page thing. – tdelaney Apr 09 '18 at 23:56
  • Hey tdelaney, you're right, I'm running Python 2.7.14. – bbodek Apr 10 '18 at 02:04
  • Somewhere you loaded an encoded file (maybe UTF-8, UTF-16, or a Windows code page) into pandas, and this happens when you try to write it out again. I don't know ML Workbench. Can you specify the input or output encodings? Better yet, does this environment support Python 3? We are approaching Python 3's 10th anniversary, and it should be preferred when possible. – tdelaney Apr 10 '18 at 05:15
  • Can you provide more details about your use case? What type of data are you reading from BigQuery? Does the data contained in BigQuery include any non-ASCII character? Also you mention that you are writing to a CSV file. Is that file local or is it writing to Cloud Storage, for example? If it is writing to GCS, there was a similar issue with [Datalab writing to BQ UTF-8 chars](https://stackoverflow.com/q/49122740/4482491), and I created an issue in PIT that you may consult too, attached in a comment in that StackOverflow question. – dsesto Apr 27 '18 at 14:06
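
For reference, the output-encoding suggestion from the comments looks like this in plain pandas under Python 2.7 (hypothetical data; whether the %%ml analyze magic exposes a way to set the output encoding is not confirmed):

import pandas as pd

df = pd.DataFrame({'city': [u'Cura\xe7ao']})

# Passing an explicit encoding avoids the default 'ascii' codec, so the
# non-ASCII value is written out as UTF-8 without raising UnicodeEncodeError.
df.to_csv('out.csv', encoding='utf-8', index=False)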

0 Answers