
I created an AWS Glue job that loads data from a CSV file into a MySQL RDS database. The data loads successfully, but all NULL values were inserted into the MySQL table as the string 'NULL', not as an actual NULL.

So if I query my table with select * from myTable where myCol is null, I get 0 results,

but when I run select * from myTable where myCol = 'NULL', I do get results.

The data type of the fields in question is string.

Any idea how to resolve this, please?

adaso

1 Answer


For anyone interested, I ended up modifying my PySpark script.

In every column, I converted the values that AWS Glue had loaded as the literal string 'NULL' (or 'null') to a real null (None, in Python).

import pyspark.sql.functions as f
## ...
def convertToNull(dfa):
    # Replace the literal strings 'NULL' and 'null' with a real null in every column.
    for i in dfa.columns:
        dfa = dfa.withColumn(i, f.when((f.col(i) == 'NULL') | (f.col(i) == 'null'), None).otherwise(f.col(i)))
    return dfa
## .........
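
A possible variation, not part of my original script: if the table has many columns, the same replacement can be done in a single select and limited to string columns, since only those can hold the text 'NULL'. The names string_columns and convert_to_null_single_pass are made up for this sketch, and it assumes column names without dots or other characters that would need backtick quoting.

```python
def string_columns(dtypes):
    """Return the column names whose Spark type is 'string'.

    `dtypes` is a list of (name, type) pairs, the shape of DataFrame.dtypes.
    """
    return [name for name, dtype in dtypes if dtype == "string"]


def convert_to_null_single_pass(df, tokens=("NULL", "null")):
    """Replace placeholder strings with real nulls using one select.

    Hypothetical helper: `tokens` lists the literal strings to treat as null.
    """
    # Imported here so string_columns stays usable without a Spark install.
    import pyspark.sql.functions as F

    targets = set(string_columns(df.dtypes))
    return df.select([
        # String columns: null out the placeholder values, keep everything else.
        F.when(F.col(c).isin(*tokens), None).otherwise(F.col(c)).alias(c)
        if c in targets
        # Non-string columns pass through untouched.
        else F.col(c)
        for c in df.columns
    ])
```

It would be called the same way as convertToNull, for example df = convert_to_null_single_pass(df), before the DataFrame is written to MySQL.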

Jeremy Caney
adaso