I have a table in Redshift with a column named "agent's_next_of_kin"; note the apostrophe-s in the name. When I read the table into a DynamicFrame with Glue, it gives me the above error about a syntax issue. How can I make this work? Do I need to change the column name, or is there a workaround? I also tried dropping the column, but it never gets that far: the error is thrown while reading the table into datasource0. Please help me fix this issue.
1 Answer
Try reading the data into a native Spark DataFrame instead of a Glue DynamicFrame. I faced this issue when I had a space in a column name; I put the column name inside back ticks `` when using the selectExpr function to resolve it (a rough PySpark sketch of that step follows the code below).
Reading from Redshift using Spark:
val jdbcURL = "jdbc:redshift://test-redshift.czac2vcs84ci.us-east-.redshift.amazonaws.com:5439/testredshift?user=redshift&password=W9P3GC42GJYFpGxBitxPszAc8iZFW"
val tempS3Dir = "s3n://spark-redshift-testing/temp/"
val salesDF = sqlContext.read
  .format("com.databricks.spark.redshift")
  .option("url", jdbcURL)        // Provide the JDBC URL
  .option("tempdir", tempS3Dir)  // Temporary S3 folder used by the connector
  .option("dbtable", "sales")    // or use .option("query", "select * from sales")
  .load()
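Since the question is about a PySpark Glue job, here is a rough PySpark sketch of the same read plus the back-tick trick. It assumes a SparkSession named spark is already available (GlueContext provides one); jdbc_url, temp_s3_dir, the sales table, and the renamed column are placeholders, and only agent's_next_of_kin comes from the question.

# Read the table with the native Spark DataFrame reader instead of a DynamicFrame.
sales_df = (
    spark.read
    .format("com.databricks.spark.redshift")   # same connector as the Scala example
    .option("url", jdbc_url)                   # placeholder JDBC URL
    .option("tempdir", temp_s3_dir)            # placeholder temporary S3 folder
    .option("dbtable", "sales")                # placeholder table name
    .load()
)

# Back ticks let Spark SQL accept the apostrophe in the column name;
# here the awkward column is renamed to something plain.
clean_df = sales_df.selectExpr("`agent's_next_of_kin` as agent_next_of_kin")

# Or, if the column is not needed at all, drop() takes the raw name directly:
# clean_df = sales_df.drop("agent's_next_of_kin")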

Prashant Singh
- How can I import the jar? From the external libraries path? And if the jar is in my local directory, how do I reference it in the Glue job? – bigDataArtist Jul 30 '21 at 16:00
- I am guessing it should be something like an "import ... from ..." statement; can you please provide the correct way to write the import in my Glue job? – bigDataArtist Jul 30 '21 at 16:01
- Upload the jar to an S3 bucket; then you can pass that S3 path in the "Dependent jars path" option under the security configuration (see the second sketch after these comments). – Prashant Singh Aug 02 '21 at 09:59
- Okay, let me try, thanks. Also, do you know how I can use a UDF that I want to import into my Glue job? I have a UDF that adds two numbers, something like a Python function def addNum(a, b): print(a + b). I want to use it in my PySpark Glue job, so I will upload it to S3 and reference it under the security configuration, but I am confused about the next step. Do I write "from addNum import *" or something like that? Please help – bigDataArtist Aug 03 '21 at 21:45
- Create a .py file that contains your UDF, for example myUdf.py. Upload the file to S3 and pass its S3 location to the "Python library path" option under the security configuration. Then you can simply add "import myUdf" and call the function in your Glue job, e.g. import myUdf and, on the next line, df = myUdf.myFunction(param1, param2) (see the first sketch below). – Prashant Singh Aug 04 '21 at 11:26
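A minimal sketch of that UDF setup, assuming the helper file is named myUdf.py and is listed under "Python library path" so Glue puts it on the job's Python path; the function name and arguments are taken from the comments above and are only illustrative.

# ---- myUdf.py, uploaded to S3 and referenced via "Python library path" ----
def addNum(a, b):
    """Return the sum of two numbers."""
    return a + b

# ---- inside the Glue job script ----
import myUdf

total = myUdf.addNum(2, 3)   # -> 5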
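For the earlier comment about the jar, here is a hypothetical boto3 sketch of attaching a dependent jar when creating the job. It assumes the console's "Dependent jars path" field corresponds to the --extra-jars default argument; the bucket, job name, role, and jar path are placeholders.

import boto3

glue = boto3.client("glue")
glue.create_job(
    Name="redshift-read-job",                          # placeholder job name
    Role="arn:aws:iam::123456789012:role/MyGlueRole",  # placeholder IAM role
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://my-bucket/scripts/job.py",  # placeholder script path
        "PythonVersion": "3",
    },
    DefaultArguments={
        # Comma-separated S3 paths to jars that Glue adds to the classpath.
        "--extra-jars": "s3://my-bucket/jars/spark-redshift.jar",
    },
)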