
I am converting a string to a datetime field using the joda.time.DateTime library, but it throws an unsupported operation exception. Here is the main class code:

// create new var with input data without header
var inputDataWithoutHeader: RDD[String] = dropHeader(inputFile)
var inputDF1 = inputDataWithoutHeader.map(_.split(",")).map { p =>
  val dateYMD: DateTime = DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss").parseDateTime(p(8))
  testData(dateYMD)
}.toDF().show()

p(8) is the column with the datetime datatype defined in the testData class, and the CSV data for this column has values like 2013-02-17 00:00:00.

Here is testData Class:

case class testData(StartDate: DateTime) { }

Here is the error I get:

Exception in thread "main"

java.lang.UnsupportedOperationException: Schema for type org.joda.time.DateTime is not supported
    at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:153)
    at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:29)
    at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:128)
    at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:126)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractTraversable.map(Traversable.scala:105)
    at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:126)
    at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:29)
    at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:64)
    at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:29)
    at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:361)
    at org.apache.spark.sql.SQLImplicits.rddToDataFrameHolder(SQLImplicits.scala:47)
    at com.projs.poc.spark.ml.ProcessCSV$delayedInit$body.apply(ProcessCSV.scala:37)
rk1113

3 Answers

  1. As you can read in the official documentation, dates in Spark SQL are represented using java.sql.Timestamp. If you want to use Joda-Time you have to convert the output to the correct type (a minimal sketch of the conversion follows the snippet below).

  2. Spark SQL can easily handle standard date formats using type casting:

    sc.parallelize(Seq(Tuple1("2016-01-11 00:01:02")))
      .toDF("dt")
      .select($"dt".cast("timestamp"))
    
zero323

Thanks zero323 for the solution. I used java.sql.Timestamp, and here is the code I modified:

val dateYMD: java.sql.Timestamp = new java.sql.Timestamp(DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss").parseDateTime(p(8)).getMillis)
testData(dateYMD)
}.toDF().show()

and changed my class to

case class testData(GamingDate: java.sql.Timestamp) { }
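
For completeness, a self-contained sketch of that pipeline (a sketch only; it assumes a Spark 1.x shell where sc and sqlContext.implicits._ are in scope, and the sample row just mirrors the question):

    import java.sql.Timestamp
    import org.joda.time.format.DateTimeFormat

    case class testData(GamingDate: Timestamp)

    val fmt = DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss")

    // stand-in for the parsed CSV rows from the question
    val df = sc.parallelize(Seq("2013-02-17 00:00:00"))
      .map(s => testData(new Timestamp(fmt.parseDateTime(s).getMillis)))
      .toDF()

    df.printSchema()   // GamingDate: timestamp (nullable = true)
    df.show()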
rk1113

The Scala Spark schema does not support Joda DateTime explicitly. You can explore other options (a small sketch of options 1 and 3 follows this list):

  1. Convert the datetime to milliseconds and keep it as a Long.

  2. Convert the datetime to Unix time (Java format): https://stackoverflow.com/a/44957376/9083843

  3. Convert the datetime to a string. You can convert it back to a Joda DateTime at any moment using DateTime.parse("stringdatetime").

  4. If you still want to keep Joda DateTime values alongside your Scala schema, you can convert your DataFrame to a sequence:

    dataframe.rdd.map(r => DateTime.parse(r(0).toString)).collect().toSeq
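
To make options 1 and 3 concrete, a small round-trip sketch (the value is illustrative only):

    import org.joda.time.DateTime

    val dt = DateTime.parse("2013-02-17T00:00:00")

    // Option 1: store epoch milliseconds as a Long
    val millis: Long = dt.getMillis
    val fromMillis: DateTime = new DateTime(millis)

    // Option 3: store an ISO-8601 string and parse it back when needed
    val asString: String = dt.toString
    val fromString: DateTime = DateTime.parse(asString)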

CTiPKA