NullPointerException after extracting a Teradata table with Scala/Spark

Question

I need to extract a table from Teradata (read-only access) to Parquet with Scala (2.11) / Spark (2.1.0). I'm building a dataframe, which loads successfully:

val df = spark.read.format("jdbc").options(options).load()

But df.show gives me a NullPointerException:

java.lang.NullPointerException
at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter.write(UnsafeRowWriter.java:210)

I ran df.printSchema and found that the NPE occurs because the dataset contains null values in columns reported as (nullable = false); it looks like Teradata is giving me wrong information. Indeed, df.show works if I drop the problematic columns.

So, I tried specifying a new schema with all columns set to (nullable = true):

val new_schema = StructType(df.schema.map {
  case StructField(n,d,nu,m) => StructField(n,d,true,m)
})

val new_df = spark.read.format("jdbc").schema(new_schema).options(options).load()

But then I got:

org.apache.spark.sql.AnalysisException: JDBC does not allow user-specified schemas.;

I also tried to create a new Dataframe from the previous one, specifying the wanted schema:

val new_df = df.sqlContext.createDataFrame(df.rdd, new_schema)

But I still got an NPE when taking action on the dataframe.

Any idea on how I could fix this?

This issue seems to be related, but still, no solution is provided: https://community.teradata.com/t5/Connectivity/Teradata-JDBC-Driver-returns-the-wrong-schema-column-nullability/td-p/40628 — RaphDG, Aug 30 '17 at 07:45
Hey @RaphDG, did you find a solution for this? I'm now running into the same problem. — User4567, Feb 22 '18 at 12:57
@stefanobaghino you can check my question here https://stackoverflow.com/questions/48889855/handling-null-values-from-teradata-with-spark-and-java — User4567, Feb 23 '18 at 07:38

User4567 · Answer 1 · 2018-02-27T07:22:30.683

I think this is resolved in the latest Teradata driver jars. After some research, I updated my Teradata jars (terajdbc4.jar and tdgssconfig.jar) to version 16.20.00.04 and changed the Teradata URL to:

teradata.connection.url=jdbc:teradata://hostname.some.com/TMODE=ANSI,CHARSET=UTF8,TYPE=FASTEXPORT,COLUMN_NAME=ON,MAYBENULL=ON

It worked after I added the Teradata URL properties COLUMN_NAME=ON,MAYBENULL=ON.

Now everything is working fine.

You can check the reference documentation here:

https://developer.teradata.com/doc/connectivity/jdbc/reference/current/jdbcug_chapter_2.html#2403_2403ch022113
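
For anyone wiring this into the Spark code from the question, here is a rough sketch of how the URL and the options map could be put together in Scala. The TeradataUrl object and its build helper are hypothetical names of mine, the table name is a placeholder, and the host is the one from the URL above. I have only assumed the usual Spark JDBC option keys (url, driver, dbtable) and the standard Teradata driver class, so adapt it to your setup.

```scala
// Hypothetical helper (not part of Spark or the Teradata driver): builds a
// Teradata JDBC URL in the jdbc:teradata://host/KEY=VALUE,KEY=VALUE form.
object TeradataUrl {
  def build(host: String, params: Seq[(String, String)]): String = {
    val paramString = params.map { case (k, v) => s"$k=$v" }.mkString(",")
    s"jdbc:teradata://$host/$paramString"
  }

  // Connection parameters taken from the answer above, in the same order.
  val params: Seq[(String, String)] = Seq(
    "TMODE" -> "ANSI",
    "CHARSET" -> "UTF8",
    "TYPE" -> "FASTEXPORT",
    "COLUMN_NAME" -> "ON",
    "MAYBENULL" -> "ON"
  )

  val url: String = build("hostname.some.com", params)

  // Options map intended for spark.read.format("jdbc").options(options).load()
  val options: Map[String, String] = Map(
    "url" -> url,
    "driver" -> "com.teradata.jdbc.TeraDriver",
    "dbtable" -> "my_db.my_table" // placeholder, replace with your table
  )
}
```

With this in place, the load from the question stays the same: val df = spark.read.format("jdbc").options(TeradataUrl.options).load(). Keeping the parameters in a list makes it easy to switch one of them off when checking which one actually changes the reported nullability.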
