An alternative way to get the difference between two timestamps in milliseconds:
Load the test data:
import spark.implicits._ // required for toDS() and the $ column syntax

val data =
"""
|writingTime,time
| 2020-06-25T13:29:33.415Z,2020-06-25T13:29:33.190Z
""".stripMargin
val stringDS1 = data.split("\n") // the string literal itself uses \n, so don't rely on the OS line separator
  .map(_.split(",").map(_.trim).mkString(",")) // trim whitespace around each field
  .toSeq.toDS()
val df1 = spark.read
.option("sep", ",")
.option("inferSchema", "true")
.option("header", "true")
.option("nullValue", "null")
.csv(stringDS1)
df1.show(false)
df1.printSchema()
/**
* +-----------------------+----------------------+
* |writingTime |time |
* +-----------------------+----------------------+
* |2020-06-25 18:59:33.415|2020-06-25 18:59:33.19|
* +-----------------------+----------------------+
*
* root
* |-- writingTime: timestamp (nullable = true)
* |-- time: timestamp (nullable = true)
*/
Find the difference in milliseconds using a UDF. Note that the timestamps are displayed in the session time zone (which is why 13:29 UTC shows as 18:59 above), and that the columns are already of type timestamp, so there is no need to cast them.
import org.apache.spark.sql.functions.udf

val millis = udf((start: java.sql.Timestamp, end: java.sql.Timestamp) => end.getTime - start.getTime)
df1.withColumn("date_diff_millis", millis($"time", $"writingTime"))
.show(false)
/**
* +-----------------------+----------------------+----------------+
* |writingTime |time |date_diff_millis|
* +-----------------------+----------------------+----------------+
* |2020-06-25 18:59:33.415|2020-06-25 18:59:33.19|225 |
* +-----------------------+----------------------+----------------+
*/
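If you are on Spark 3.1 or later (an assumption; the function does not exist in earlier releases), the built-in unix_millis avoids the UDF entirely:
import org.apache.spark.sql.functions.unix_millis

// unix_millis converts a timestamp to milliseconds since the Unix epoch,
// so the column difference is the elapsed time in milliseconds (225 here)
df1.withColumn("date_diff_millis", unix_millis($"writingTime") - unix_millis($"time"))
  .show(false)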
When you cast a timestamp to long, Spark truncates it to whole seconds (Unix epoch time), so the sub-second part is lost. Observe that the difference comes out as zero:
df1.withColumn("date_diff_millis", $"time".cast("long") - $"writingTime".cast("long"))
.show(false)
/**
* +-----------------------+----------------------+----------------+
* |writingTime |time |date_diff_millis|
* +-----------------------+----------------------+----------------+
* |2020-06-25 18:59:33.415|2020-06-25 18:59:33.19|0 |
* +-----------------------+----------------------+----------------+
*/
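Another UDF-free option: casting a timestamp to double keeps the fractional seconds, unlike the long cast above. A minimal sketch; the result is a double, so round it if you need an integral value:
// cast("double") yields epoch seconds including the fractional part,
// so multiplying the difference by 1000 recovers milliseconds (225.0 here)
df1.withColumn("date_diff_millis",
    ($"writingTime".cast("double") - $"time".cast("double")) * 1000)
  .show(false)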