I'm trying to convert a Glue DynamicFrame into a Spark DataFrame using DynamicFrame.toDF(), but I'm getting this exception:

Traceback (most recent call last):
  File "/tmp/ManualJOB", line 62, in <module>
    df1 = datasource0.toDF()
  File "/opt/amazon/lib/python3.6/site-packages/awsglue/dynamicframe.py", line 147, in toDF
    return DataFrame(self._jdf.toDF(self.glue_ctx._jvm.PythonUtils.toSeq(scala_options)), self.glue_ctx)
  File "/opt/amazon/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/opt/amazon/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o176.toDF.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 15.0 failed 4 times, most recent failure: Lost task 1.3 in stage 15.0 (TID 198, 172.31.0.175, executor 6): com.amazonaws.services.glue.util.FatalException: Unable to parse file: Manual Bound.csv

Can anyone help me with what I am missing?

Thanks in advance!

Akhil
  • Can you confirm that your file Manual Bound.csv doesn't contain any non-UTF-8 characters? Glue only supports UTF-8 encoding. You can check the file with: iconv -f UTF-8 your_file -o /dev/null; echo $? – Prabhakar Reddy Sep 15 '20 at 08:10
  • Yes. There were some characters other than utf-8. So that was the problem. Thanks @PrabhakarReddy – Akhil Sep 15 '20 at 10:54
  • I have posted the answer. Please mark it as answered if it helped. – Prabhakar Reddy Sep 15 '20 at 11:10

1 Answer

This issue occurs when the file contains characters that are not UTF-8 encoded. Glue only supports UTF-8 encoding for text-based data, as the AWS Glue documentation states:

Text-based data, such as CSVs, must be encoded in UTF-8 for AWS Glue to process it successfully. For more information, see UTF-8 in Wikipedia.

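You can check whether your file contains invalid byte sequences by running the command below. If the file is not valid UTF-8, iconv reports an error at the first invalid sequence and echo $? prints a non-zero exit status; an exit status of 0 means the file is valid UTF-8. This is for Linux; use an equivalent tool if you are on another operating system.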

iconv -f UTF-8 your_file -o /dev/null; echo $?
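
If iconv is not available, the same check can be sketched in Python. This is a minimal sketch, not part of Glue: the function name find_invalid_utf8 is hypothetical, and it assumes the file is small enough to read into memory. It returns the byte offset of the first invalid sequence, or None if the whole file decodes as UTF-8.

```python
from pathlib import Path


def find_invalid_utf8(path):
    """Return the byte offset of the first invalid UTF-8 sequence, or None.

    Hypothetical helper: reads the whole file into memory, so it is only
    suitable for files that fit comfortably in RAM.
    """
    data = Path(path).read_bytes()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte that could not be decoded.
        return err.start
    return None


if __name__ == "__main__":
    import sys

    # Usage: python check_utf8.py "Manual Bound.csv"
    if len(sys.argv) > 1:
        offset = find_invalid_utf8(sys.argv[1])
        if offset is None:
            print("valid UTF-8")
        else:
            print(f"invalid UTF-8 sequence at byte offset {offset}")
```

Knowing the offset makes it easier to open the file and see which character is causing the parse failure.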

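To convert the CSV to UTF-8, run the command below. It assumes the file is currently encoded as ISO-8859-1; replace that with the file's actual encoding if it differs.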

iconv -f ISO-8859-1 -t UTF-8 file.csv > file-utf8.csv
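
The same conversion can be sketched in Python for environments without iconv. The function name convert_to_utf8 is hypothetical, and the ISO-8859-1 default is only an assumption carried over from the command above; pass the real source encoding if you know it.

```python
def convert_to_utf8(src_path, dst_path, source_encoding="ISO-8859-1"):
    """Re-encode a text file to UTF-8 (hypothetical helper).

    source_encoding is an assumption: ISO-8859-1 never raises a decode
    error, so a wrong guess produces wrong characters rather than a failure.
    """
    # newline="" disables newline translation so line endings are preserved.
    with open(src_path, "r", encoding=source_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        while True:
            chunk = src.read(1024 * 1024)  # read about 1M characters at a time
            if not chunk:
                break
            dst.write(chunk)


if __name__ == "__main__":
    import sys

    # Usage: python to_utf8.py file.csv file-utf8.csv
    if len(sys.argv) == 3:
        convert_to_utf8(sys.argv[1], sys.argv[2])
```

After converting, spot-check a few rows with accented or special characters to confirm the source encoding was guessed correctly.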
Prabhakar Reddy