
I have source data in S3 in the format below.

WM_ID,SOURCE_SYSTEM,DB_ID,JOB_NUM,NOTE_TYPE,NOTE_TEXT,NOTE_DATE_TIME
WOR25,CORE,NI,NI1LBE14,GEN,"",2020-02-01 17:23:32
WOR25,FSI,NI,NI1LBR39,CPN,"",2020-02-04 13:47:35
WOR25,FSI,NI,NI1LBE14,ACC,"",2020-02-03 13:22:56
WOR25,CORE,NI,NI1LBR39,FIT,NA,2020-02-05 13:13:08

Here, NOTE_TEXT is NULL for some rows. When I try to insert into a Redshift table using the JDBC loader in StreamSets Transformer (spark-submit), the load fails.

RUN_ERROR: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4.0 failed 1 times, most recent failure: Lost task 0.0 in stage 4.0 (TID 4, localhost, executor driver): java.sql.SQLFeatureNotSupportedException: [Amazon][JDBC](10220) Driver does not support this optional feature.
    at com.amazon.exceptions.ExceptionConverter.toSQLException(Unknown Source)
    at com.amazon.jdbc.common.SPreparedStatement.checkTypeSupported(Unknown Source)
    at com.amazon.jdbc.common.SPreparedStatement.setNull(Unknown Source)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:658)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:834)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:834)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
    at

If I convert all the NULL values to strings, it works as expected. Can anyone guide me to the correct approach?
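
For context, here is a minimal standalone sketch of the failing write; Transformer's JDBC destination does roughly this under the hood. The S3 path, Redshift endpoint, table name, and credentials are all hypothetical placeholders.

    import org.apache.spark.sql.SparkSession

    object RedshiftNullRepro {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("redshift-null-repro").getOrCreate()

        // Spark's CSV reader turns the empty quoted NOTE_TEXT fields into nulls
        val df = spark.read
          .option("header", "true")
          .csv("s3://my-bucket/notes/") // hypothetical S3 path

        // Fails: for each null NOTE_TEXT value, Spark's JDBC writer calls
        // PreparedStatement.setNull(), which the Amazon Redshift JDBC driver
        // rejects with SQLFeatureNotSupportedException (error 10220)
        df.write
          .format("jdbc")
          .option("url", "jdbc:redshift://example-cluster:5439/dev") // hypothetical endpoint
          .option("dbtable", "work_notes")                           // hypothetical table
          .option("user", "admin")
          .option("password", "secret")
          .mode("append")
          .save()

        spark.stop()
      }
    }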

anidev711

1 Answer


The problem is that the Redshift JDBC driver itself doesn’t support writing null values: as the stack trace shows, Spark’s JDBC writer calls PreparedStatement.setNull() for each null, and the driver rejects that call. The workaround is to convert the nulls to strings.

You can use Field Replacer to replace NULLs with a placeholder.
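
If you are reproducing this outside Transformer, here is a minimal sketch of what that Field Replacer step amounts to in plain Spark, reusing the hypothetical connection options from the question: fill the nulls before the JDBC write. Note that na.fill("") matches by type and covers every string-typed column in one call, so a wide table doesn’t need per-column handling.

    // Given the same read as in the question's sketch (hypothetical path):
    val df = spark.read.option("header", "true").csv("s3://my-bucket/notes/")

    // Replace nulls with a placeholder so Spark never calls the driver's
    // unsupported setNull(). na.fill("") applies to all string-typed columns;
    // numeric columns could be covered with a second fill, e.g. .na.fill(0).
    val cleaned = df.na.fill("")

    cleaned.write
      .format("jdbc")
      .option("url", "jdbc:redshift://example-cluster:5439/dev") // hypothetical endpoint
      .option("dbtable", "work_notes")                           // hypothetical table
      .option("user", "admin")
      .option("password", "secret")
      .mode("append")
      .save()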

We at StreamSets are looking at resolving this in a future release.

metadaddy
  • Converting null values to strings isn’t practical when the table has a lot of columns, and the issue occurs if any single value is null, so we would need to apply NVL to every column while loading data to Redshift. – anidev711 Feb 13 '20 at 03:22
  • Another blocker: Transformer doesn’t support the COPY operation from S3, unlike Data Collector. – anidev711 Feb 13 '20 at 03:23
  • 1
    We are adding support for Redshift in an upcoming release of Transformer. – Hari Shreedharan Feb 13 '20 at 06:04