I am using Kafka + Spark Streaming to stream messages and run analytics, then saving the results to Phoenix. Some Spark jobs fail several times per day with the following error message:

org.apache.phoenix.schema.IllegalDataException: 
java.lang.IllegalArgumentException: Invalid format: ""
    at org.apache.phoenix.util.DateUtil$ISODateFormatParser.parseDateTime(DateUtil.java:297)
    at org.apache.phoenix.util.DateUtil.parseDateTime(DateUtil.java:163)
    at org.apache.phoenix.util.DateUtil.parseTimestamp(DateUtil.java:175)
    at org.apache.phoenix.schema.types.PTimestamp.toObject(PTimestamp.java:95)
    at org.apache.phoenix.expression.LiteralExpression.newConstant(LiteralExpression.java:194)
    at org.apache.phoenix.expression.LiteralExpression.newConstant(LiteralExpression.java:172)
    at org.apache.phoenix.expression.LiteralExpression.newConstant(LiteralExpression.java:159)
    at org.apache.phoenix.compile.UpsertCompiler$UpsertValuesCompiler.visit(UpsertCompiler.java:979)
    at org.apache.phoenix.compile.UpsertCompiler$UpsertValuesCompiler.visit(UpsertCompiler.java:963)
    at org.apache.phoenix.parse.BindParseNode.accept(BindParseNode.java:47)
    at org.apache.phoenix.compile.UpsertCompiler.compile(UpsertCompiler.java:832)
    at org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:578)
    at org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:566)
    at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:331)
    at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:326)
    at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
    at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:324)
    at org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:245)
    at org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:172)
    at org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:177)
    at org.apache.phoenix.mapreduce.PhoenixRecordWriter.write(PhoenixRecordWriter.java:79)
    at org.apache.phoenix.mapreduce.PhoenixRecordWriter.write(PhoenixRecordWriter.java:39)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply$mcV$sp(PairRDDFunctions.scala:1113)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply(PairRDDFunctions.scala:1111)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply(PairRDDFunctions.scala:1111)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1251)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1119)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1091)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: Invalid format: ""
    at org.joda.time.format.DateTimeFormatter.parseDateTime(DateTimeFormatter.java:673)
    at org.apache.phoenix.util.DateUtil$ISODateFormatParser.parseDateTime(DateUtil.java:295)

My code:

val myDF = sqlContext.createDataFrame(myRows, myStruct)
myDF.write
  .format(sourcePhoenixSpark)
  .mode("overwrite")
  .options(Map("table" -> (myPhoenixNamespace + myTable), "zkUrl" -> myPhoenixZKUrl))
  .save()

I am using phoenix-spark version 4.7.0-HBase-1.1. Any suggestion to solve the problem would be appreciated. Thanks!

1 Answer
You are trying to process dirty data.

That error comes from here: https://github.com/apache/phoenix/blob/master/phoenix-core/src/main/java/org/apache/phoenix/util/DateUtil.java#L301

There, Phoenix tries to parse a string that is expected to be a date in ISO format, but the provided string is empty (`""`).

You need to prepare and clean your data before attempting to write it to storage.
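For example, a minimal cleaning step before the Phoenix write could look like the sketch below. This assumes your Spark version matches the 1.x APIs visible in the stack trace, and `EVENT_TIME` is a placeholder name for whichever StringType column maps to a Phoenix TIMESTAMP column:

```scala
import org.apache.spark.sql.functions.col

// Drop rows whose timestamp column is empty or blank, so Phoenix never
// sees "" where it expects an ISO date. "EVENT_TIME" is a hypothetical
// column name -- substitute your own.
val cleanedDF = myDF
  .filter("trim(EVENT_TIME) != ''")
  // Optionally convert the column to a proper timestamp up front;
  // unparsable values become null instead of failing the job.
  .withColumn("EVENT_TIME", col("EVENT_TIME").cast("timestamp"))

cleanedDF.write
  .format(sourcePhoenixSpark)
  .mode("overwrite")
  .options(Map("table" -> (myPhoenixNamespace + myTable), "zkUrl" -> myPhoenixZKUrl))
  .save()
```

Whether to drop such rows or route them to a dead-letter store depends on your pipeline; the point is to handle empty strings before the upsert, not during it.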

– maasg
  • Thanks, @maasg. Just figured it out. This happens when saving a StringType column in a Spark DataFrame to a TIMESTAMP column in Phoenix. – Hoang Son Jun 01 '17 at 20:51