I am running an AWS Glue job to load a pipe delimited file on S3 into an RDS Postgres instance, using the auto-generated PySpark script from Glue.
Initially, it complained about NULL values in some columns:
pyspark.sql.utils.IllegalArgumentException: u"Can't get JDBC type for null"
After some googling and reading on SO, I tried to replace the NULLs in my file by converting my AWS Glue Dynamic Dataframe to a Spark Dataframe, executing the function fillna() and reconverting back to a Dynamic Dataframe.
datasource0 = glueContext.create_dynamic_frame.from_catalog(database =
"xyz_catalog", table_name = "xyz_staging_files", transformation_ctx =
"datasource0")
custom_df = datasource0.toDF()
custom_df2 = custom_df.fillna(-1)
custom_df3 = custom_df2.fromDF()
applymapping1 = ApplyMapping.apply(frame = custom_df3, mappings = [("id",
"string", "id", "int"),........more code
References:
How to replace all Null values of a dataframe in Pyspark
http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame.fillna
Now, when I run my job, it throws the following error:
Log Contents:
Traceback (most recent call last):
File "script_2017-12-20-22-02-13.py", line 23, in <module>
custom_df3 = custom_df2.fromDF()
AttributeError: 'DataFrame' object has no attribute 'fromDF'
End of LogType:stdout
I am new to Python and Spark and have tried a lot, but can't make sense of this. Appreciate some expert help on this.
I tried changing my reconvert command to this:
custom_df3 = glueContext.create_dynamic_frame.fromDF(frame = custom_df2)
But still got the error:
AttributeError: 'DynamicFrameReader' object has no attribute 'fromDF'
UPDATE: I suspect this is not about NULL values. The message "Can't get JDBC type for null" seems not to refer to a NULL value, but some data/type that JDBC is unable to decipher.
I created a file with only 1 record, no NULL values, changed all Boolean types to INT (and replaced values with 0 and 1), but still get the same error:
pyspark.sql.utils.IllegalArgumentException: u"Can't get JDBC type for null"
UPDATE: Make sure DynamicFrame is imported (from awsglue.context import DynamicFrame), since fromDF / toDF are part of DynamicFrame.