I have the following code to store JSON in Cassandra using Spark:

ss.read().json("test_data.json").write()
        .format("org.apache.spark.sql.cassandra")
        .mode(SaveMode.Append)
        .option("table", table)
        .option("keyspace", KEY_SPACE)
        .option("confirm.truncate", true)
        .save();

The table has a primary key, and when a record has a null value for the primary key, save() throws a TypeConversionException: Cannot convert object [null,null,null,null,null,n..., "test text test text" type class org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema to List[AnyRef]

It's clear to me that such a record should either be filtered out beforehand or logged when the exception occurs. My problem is that I can't find a way to catch this exception so that I can log the dirty record.

ss.read().json("test_data.json").na().drop() doesn't help, because the record still contains some data.
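One way to restrict the null check to the key columns only, rather than the whole row, is to build a filter expression from the list of key column names. The following is a minimal sketch, not a tested solution for this table: the class name NullKeyFilter and the column names "id" and "created_at" are hypothetical, and the key columns still have to be supplied per table. It uses only the JDK and produces plain Spark SQL condition strings.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: builds SQL filter conditions from primary-key column names. */
final class NullKeyFilter {

    private NullKeyFilter() {}

    /** Wraps a column name in backticks, doubling any backtick inside the name. */
    static String quote(String column) {
        return "`" + column.replace("`", "``") + "`";
    }

    /** Condition that holds only when every key column is non-null (the clean rows). */
    static String allKeysPresent(List<String> keyColumns) {
        requireNonEmpty(keyColumns);
        return keyColumns.stream()
                .map(c -> quote(c) + " IS NOT NULL")
                .collect(Collectors.joining(" AND "));
    }

    /** Condition that holds when at least one key column is null (the dirty rows). */
    static String anyKeyMissing(List<String> keyColumns) {
        requireNonEmpty(keyColumns);
        return keyColumns.stream()
                .map(c -> quote(c) + " IS NULL")
                .collect(Collectors.joining(" OR "));
    }

    private static void requireNonEmpty(List<String> keyColumns) {
        if (keyColumns == null || keyColumns.isEmpty()) {
            throw new IllegalArgumentException("at least one key column is required");
        }
    }

    public static void main(String[] args) {
        // "id" and "created_at" are placeholder key columns, not taken from the real table.
        List<String> keys = Arrays.asList("id", "created_at");
        System.out.println(allKeysPresent(keys)); // `id` IS NOT NULL AND `created_at` IS NOT NULL
        System.out.println(anyKeyMissing(keys));  // `id` IS NULL OR `created_at` IS NULL
    }
}
```

Assuming the JSON has been read into a Dataset<Row> named df, the two strings could be passed to df.filter(String): df.filter(NullKeyFilter.anyKeyMissing(keys)) would select the dirty records for logging, and df.filter(NullKeyFilter.allKeysPresent(keys)) would select the rows to write to Cassandra. For the clean half alone, df.na().drop(new String[]{"id", "created_at"}) should have the same effect. This avoids the exception rather than catching it.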

I found that the Cassandra connector has a saveToCassandra() method, which might offer a way to implement an exception handler, but I couldn't find it on my SparkSession.

SparkSession ss = SparkSession
        .builder()
        .config("spark.cassandra.connection.host", cassandraHost)
        .config("spark.master", "local")
        .getOrCreate();

I use the latest Spark version, 2.3.2.

Holm
  • Couldn't you filter the dataset to eliminate null primary keys before inserting into Cassandra? – Soheil Pourbafrani Oct 31 '18 at 10:20
  • There are several JSON files to insert, so this would require hard-coding the primary key information for each file. I'd love to catch the exception if there is a way to do that. – Holm Oct 31 '18 at 10:25
  • I found this: https://stackoverflow.com/questions/39727742/how-to-filter-out-a-null-value-from-spark-dataframe, and the problem seems to be the other field and not the null. The detailed exception is type class org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema to List[AnyRef] – Holm Oct 31 '18 at 10:38

0 Answers