Moving data from Kinesis -> RDS using Spark with AWS Glue implementation locally

Question

I have a Spark project with AWS Glue implementation running locally.

I listen to a Kinesis stream so when Data is arrived in JSON format, I can storage to S3 correctly. I want to store in AWS RDS instead of storing in S3.

I have tried to use:

dataFrame.write
          .format("jdbc")
          .option("url","jdbc:mysql://aurora.cluster.region.rds.amazonaws.com:3306/database")
          .option("user","user")
          .option("password","password")
          .option("dbtable","test-table")
          .option("driver","com.mysql.jdbc.Driver")
          .save()

Spark project get data from a Kinesis stream using AWS glue job.

I want to add the data to Aurora database.

It fails with error

Caused by: java.sql.SQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL
 server version for the right syntax to use near '-glue-table (`label2` TEXT , `customerid` TEXT , `sales` TEXT , `name` TEXT )' a
t line 1

This is the test dataFrame Im using, dataFrame.show():

+------+----------+-----+--------------------+
|label2|customerid|sales|                name|
+------+----------+-----+--------------------+
| test6|      test| test|streamingtesttest...|
+------+----------+-----+--------------------+

I think you are missing the MODE parameter in the above command. mode can be append, overwrite, error, ignore. Please refer this for a sample: https://stackoverflow.com/questions/46552161/write-dataframe-to-mysql-table-using-pyspark — Yuva, Jun 03 '20 at 03:12
Also i hope you have converted the JSON to dataframe with proper column headers, etc. — Yuva, Jun 03 '20 at 03:14
I add MODE parameter and still failing. I think the dataframe is OK. When I print it (dataFrame.show), it prints fine (added in the question). @Yuva — Asier Gomez, Jun 03 '20 at 06:29

score 0 · Accepted Answer · answered Jun 04 '20 at 06:35

Using Spark DynamicFrame instead of DataFrame and using the glueContext sink to publish to Aurora:

So the final code could be:

lazy val mysqlJsonOption = jsonOptions(MYSQL_AURORA_URI)

//Write to Aurora
val dynamicFrame = DynamicFrame(joined, glueContext)
glueContext.getSink("mysql", mysqlJsonOption).writeDynamicFrame(dynamicFrame)

Moving data from Kinesis -> RDS using Spark with AWS Glue implementation locally

1 Answers1