I'm getting invalid timestamps when reading Elasticsearch records using Spark with the elasticsearch-hadoop library. I'm using the following Spark code to read the records:

val sc = spark.sqlContext

val elasticFields = Seq(
  "start_time",
  "action",
  "category",
  "attack_category"
)

sc.sql(
  "CREATE TEMPORARY TABLE myIndex " +
    "USING org.elasticsearch.spark.sql " +
    "OPTIONS (resource 'aggattack-2021.01')")

val all = sc.sql(
  s"""
     |SELECT ${elasticFields.mkString(",")}
     |FROM myIndex
     |""".stripMargin)

all.show(2)

This leads to the following result:

+-----------------------+------+---------+---------------+
|start_time             |action|category |attack_category|
+-----------------------+------+---------+---------------+
|1970-01-19 16:04:27.228|drop  |udp-flood|DoS            |
|1970-01-19 16:04:24.027|drop  |others   |DoS            |
+-----------------------+------+---------+---------------+

But I'm expecting a timestamp with the current year, e.g. 2021-01-19 16:04:27.228. In Elasticsearch, the start_time field holds Unix time in millis, e.g. "start_time": 1611314773.641.
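
For reference, the 1970 dates look like an epoch-seconds value being interpreted as epoch milliseconds; a quick java.time check on the sample value above (integer part only) shows the mismatch:

import java.time.Instant

// Interpreted as milliseconds, the sample value lands in January 1970...
Instant.ofEpochMilli(1611314773L)  // 1970-01-19T15:35:14.773Z
// ...while interpreted as seconds it gives the expected 2021 date.
Instant.ofEpochSecond(1611314773L) // 2021-01-22T11:26:13Z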

Jacfal
  • Try reading it as a string and casting it to a timestamp in Spark. Add this to the options: `OPTIONS('resource'='aggattack-2021.01', 'es.mapping.date.rich'='false')`, then cast using `from_unixtime(start_time) as start_time` (a rough sketch follows after these comments) – blackbishop Jan 23 '21 at 14:09
  • Hi, I tried it, but it leads to another exception => `The value (1609467228) of the type (java.lang.Long) cannot be converted to the string type` – Jacfal Jan 24 '21 at 08:13
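
For reference, an untested sketch of how the suggestion from the first comment would be wired in (the table name myIndexRaw is just illustrative); as noted above, the asker hit a Long-to-String conversion error with this approach on their index:

sc.sql(
  "CREATE TEMPORARY TABLE myIndexRaw " +
    "USING org.elasticsearch.spark.sql " +
    "OPTIONS ('resource'='aggattack-2021.01', 'es.mapping.date.rich'='false')")

// from_unixtime expects epoch seconds and returns a formatted timestamp string.
val raw = sc.sql(
  """
    |SELECT from_unixtime(start_time) AS start_time, action, category, attack_category
    |FROM myIndexRaw
    |""".stripMargin)

raw.show(2)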

1 Answer


The problem was with the data in Elasticsearch. The start_time field was mapped as epoch_second but contained epoch-second values with three decimal places (e.g. 1611583978.684). Everything works fine after we converted the epoch time to milliseconds without any decimal places.
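
A rough sketch of that numeric conversion (the exact reindexing step isn't shown here):

import java.time.Instant

// Example value quoted above: epoch seconds with three decimal places.
val epochSecondsWithDecimals = 1611583978.684
// Convert to whole epoch milliseconds before (re)indexing into Elasticsearch.
val epochMillis = math.round(epochSecondsWithDecimals * 1000) // 1611583978684L
// Interpreted as epoch_millis, the value now maps to the expected 2021 date.
Instant.ofEpochMilli(epochMillis)                             // 2021-01-25T14:12:58.684Z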

Jacfal