I have the following code to store JSON data in Cassandra using Spark:
ss.read().json("test_data.json").write()
    .format("org.apache.spark.sql.cassandra")
    .mode(SaveMode.Append)
    .option("table", table)
    .option("keyspace", KEY_SPACE)
    .option("confirm.truncate", true)
    .save();
The table has a primary key, and when a record has a null value for the primary key, save() throws:
TypeConversionException Cannot convert object [null,null,null,null,null,n..., "test text test text"
type class org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema to List[AnyRef]
To me it's clear that such a record should either be filtered out, or logged when the exception happens. The issue is that I can't find a way to catch this exception so that I can log the dirty record.
ss.read().json("test_data.json").na().drop()
doesn't help, because the record still has some data in its other columns.
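One workaround might be to restrict the null check to the primary-key columns instead of dropping every row that contains any null. A sketch, assuming the primary key is a single column named `id` (a hypothetical name, replace with the actual key columns):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import static org.apache.spark.sql.functions.col;

Dataset<Row> df = ss.read().json("test_data.json");

// Rows whose primary-key column is null -- log these as dirty records.
Dataset<Row> dirty = df.filter(col("id").isNull());
dirty.collectAsList().forEach(row -> System.err.println("Dirty record: " + row));

// na().drop() limited to the key columns keeps rows that merely have
// nulls in non-key columns.
Dataset<Row> clean = df.na().drop(new String[]{"id"});
clean.write()
     .format("org.apache.spark.sql.cassandra")
     .mode(SaveMode.Append)
     .option("table", table)
     .option("keyspace", KEY_SPACE)
     .save();
```

This splits the data once into dirty and clean subsets, so the write itself never sees a null key.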
I found there is a saveToCassandra()
method in the Cassandra connector which might offer a way to plug in an exception handler, but I couldn't find it on my SparkSession.
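For reference, saveToCassandra() lives on the connector's RDD API, not on SparkSession; in Java it is reached through CassandraJavaUtil. A rough sketch, where the `MyRecord` bean class, its `getId()` accessor, and the column mapping are assumptions, not from the original:

```java
import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapToRow;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Encoders;

// Map the Dataset to a POJO RDD and drop records with a null key before writing.
JavaRDD<MyRecord> rdd = ss.read().json("test_data.json")
        .as(Encoders.bean(MyRecord.class)) // MyRecord is a hypothetical bean class
        .javaRDD()
        .filter(r -> r.getId() != null);   // keep only rows with a primary key

javaFunctions(rdd)
        .writerBuilder(KEY_SPACE, table, mapToRow(MyRecord.class))
        .saveToCassandra();
```

Note this path still performs the filtering up front; as far as I can tell the connector's write path does not expose a per-record exception callback, so pre-filtering (and logging the filtered rows) seems to be the practical route.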
SparkSession ss = SparkSession
.builder()
.config("spark.cassandra.connection.host", cassandraHost)
.config("spark.master", "local")
.getOrCreate();
I use the latest Spark version, 2.3.2.