
We have AWS Glue jobs moving data from one S3 bucket to another, converting CSV to Parquet. During the move, Glue is dropping records whose columns are mostly null. Say we have a 50-column table: if 20 consecutive columns in a record are empty or null, that record is dropped. I am not able to understand this behavior. Please let me know if anyone has seen this behavior and has fixed it. The order of the code is below.

    datasource0 = ...
    applymapping1 = ...
    resolvechoice2 = ...
    datasink = ...
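For reference, a typical Glue-generated script with those step names looks roughly like the sketch below; the catalog database, table, column names, and target bucket here are placeholders, not our actual job:

    import sys
    from awsglue.transforms import ApplyMapping, ResolveChoice
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ['JOB_NAME'])
    glueContext = GlueContext(SparkContext())
    job = Job(glueContext)
    job.init(args['JOB_NAME'], args)

    # Read the CSV source from the Data Catalog (placeholder names).
    datasource0 = glueContext.create_dynamic_frame.from_catalog(
        database="my_db", table_name="my_csv_table",
        transformation_ctx="datasource0")

    # Map/cast source columns to target columns (only two of the 50 shown).
    applymapping1 = ApplyMapping.apply(
        frame=datasource0,
        mappings=[("col1", "string", "col1", "string"),
                  ("col2", "string", "col2", "int")],
        transformation_ctx="applymapping1")

    # Resolve any ambiguous column types.
    resolvechoice2 = ResolveChoice.apply(
        frame=applymapping1, choice="make_struct",
        transformation_ctx="resolvechoice2")

    # Write the result out as Parquet to the target bucket.
    datasink = glueContext.write_dynamic_frame.from_options(
        frame=resolvechoice2, connection_type="s3",
        connection_options={"path": "s3://my-target-bucket/output/"},
        format="parquet", transformation_ctx="datasink")

    job.commit()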

I don't have DropNullFields in the job; I took it out because I suspected it was the cause. Please shed some light. Regards, Prakash


1 Answer


Type-cast the null fields to int. Parquet cannot store a column whose type is still null/unknown, so the cast gives it a concrete type to write.

More Details:

How to handle null values when writing to parquet from Spark

https://issues.apache.org/jira/browse/SPARK-10943
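To illustrate the idea (a rough sketch, not taken from the links above): in plain Spark you can cast any column that was inferred as NullType, i.e. every value null, to a concrete type such as int before the Parquet write. The DataFrame df and the output path below are placeholders.

    from pyspark.sql.functions import col
    from pyspark.sql.types import IntegerType, NullType

    # df is a placeholder for the DataFrame read from the CSV source.
    # Columns whose every value is null are inferred as NullType, which
    # Parquet cannot store (see SPARK-10943), so cast them to int first.
    for field in df.schema.fields:
        if isinstance(field.dataType, NullType):
            df = df.withColumn(field.name, col(field.name).cast(IntegerType()))

    df.write.parquet("s3://my-target-bucket/output/")  # placeholder path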

Kishore Bharathy