I'm getting invalid timestamps when reading Elasticsearch records using Spark with the elasticsearch-hadoop library. I'm using the following Spark code to read the records:

val sc = spark.sqlContext

val elasticFields = Seq(
  "start_time",
  "action",
  "category",
  "attack_category"
)

sc.sql(
  "CREATE TEMPORARY TABLE myIndex " +
    "USING org.elasticsearch.spark.sql " +
    "OPTIONS (resource 'aggattack-2021.01')")

val all = sc.sql(
  s"""
     |SELECT ${elasticFields.mkString(",")}
     |FROM myIndex
     |""".stripMargin)

all.show(2)

This leads to the following result:

+-----------------------+------+---------+---------------+
|start_time             |action|category |attack_category|
+-----------------------+------+---------+---------------+
|1970-01-19 16:04:27.228|drop  |udp-flood|DoS            |
|1970-01-19 16:04:24.027|drop  |others   |DoS            |
+-----------------------+------+---------+---------------+

But I'm expecting a timestamp with the current year, e.g. 2021-01-19 16:04:27.228. In Elasticsearch, the start_time field holds Unix time in millis, e.g. "start_time": 1611314773.641.
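
For reference, the 1970 dates look like an epoch-seconds value being interpreted as epoch milliseconds; a quick java.time check on the sample value above (integer part only) shows the mismatch:

import java.time.Instant

// Interpreted as milliseconds, the sample value lands in January 1970...
Instant.ofEpochMilli(1611314773L)  // 1970-01-19T15:35:14.773Z
// ...while interpreted as seconds it gives the expected 2021 date.
Instant.ofEpochSecond(1611314773L) // 2021-01-22T11:26:13Z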

Jacfal
  • Try reading it as a string and casting it to a timestamp in Spark. Add this to the options: `OPTIONS('resource'='aggattack-2021.01', 'es.mapping.date.rich'='false')`, then cast using `from_unixtime(start_time) as start_time` (a rough sketch follows after these comments) – blackbishop Jan 23 '21 at 14:09
  • Hi, I tried it, but it leads to another exception => `The value (1609467228) of the type (java.lang.Long) cannot be converted to the string type` – Jacfal Jan 24 '21 at 08:13
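
For reference, an untested sketch of how the suggestion from the first comment would be wired in (the table name myIndexRaw is just illustrative); as noted above, the asker hit a Long-to-String conversion error with this approach on their index:

sc.sql(
  "CREATE TEMPORARY TABLE myIndexRaw " +
    "USING org.elasticsearch.spark.sql " +
    "OPTIONS ('resource'='aggattack-2021.01', 'es.mapping.date.rich'='false')")

// from_unixtime expects epoch seconds and returns a formatted timestamp string.
val raw = sc.sql(
  """
    |SELECT from_unixtime(start_time) AS start_time, action, category, attack_category
    |FROM myIndexRaw
    |""".stripMargin)

raw.show(2)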

1 Answer


The problem was with the data in Elasticsearch. The start_time field was mapped as epoch_second but contained epoch-second values with three decimal places (e.g. 1611583978.684). Everything works fine after we converted the epoch time to milliseconds without any decimal places.
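
A rough sketch of that numeric conversion (the exact reindexing step isn't shown here):

import java.time.Instant

// Example value quoted above: epoch seconds with three decimal places.
val epochSecondsWithDecimals = 1611583978.684
// Convert to whole epoch milliseconds before (re)indexing into Elasticsearch.
val epochMillis = math.round(epochSecondsWithDecimals * 1000) // 1611583978684L
// Interpreted as epoch_millis, the value now maps to the expected 2021 date.
Instant.ofEpochMilli(epochMillis)                             // 2021-01-25T14:12:58.684Z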

Jacfal