I need to write a Timestamp field to Avro and ensure the data is saved in UTC. Currently Avro converts it to a long (timestamp-millis) using the local timezone of the server, which causes issues when the server reading it back is in a different timezone. I looked at DataFrameWriter and it seems to mention an option called timeZone, but it doesn't seem to help. Is there a way to force Avro to treat all incoming timestamp fields as being in a specific timezone?
**CODE SNIPPET**
// write to Avro with spark-avro
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._
import com.databricks.spark.avro._  // provides the .avro() shorthand on read/write

val data = Seq(Row("1", java.sql.Timestamp.valueOf("2020-05-11 15:17:57.188")))
val schemaOrig = List(
  StructField("rowkey", StringType, true),
  StructField("txn_ts", TimestampType, true))
val sourceDf = spark.createDataFrame(spark.sparkContext.parallelize(data), StructType(schemaOrig))
sourceDf.write.option("timeZone", "UTC").avro("/test4")
// now read the data back from Avro
val avroDf = spark.read.avro("/test4")
avroDf.show(false)
Original value in source: 2020-05-11 15:17:57.188
Value stored in Avro: 1589224677188

Read back from Avro without formatting:
+-------------+-------------+
|rowkey |txn_ts |
+-------------+-------------+
|1 |1589224677188|
+-------------+-------------+
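The stored value is just epoch millis, so what it reads back as depends entirely on the timezone applied when rendering it. A quick check with java.time illustrating the same long in two zones (the printed values are my own computation, not Spark output):

```scala
import java.time.{Instant, ZoneId}

val millis = 1589224677188L
// the same instant rendered in two different zones
println(Instant.ofEpochMilli(millis).atZone(ZoneId.of("UTC")))
// 2020-05-11T19:17:57.188Z[UTC]
println(Instant.ofEpochMilli(millis).atZone(ZoneId.of("America/New_York")))
// 2020-05-11T15:17:57.188-04:00[America/New_York]
```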
The mapping itself works, but the issue is that if the local timezone of the server writing is EST and the one reading back is GMT, the timestamps come out shifted.
println(new java.sql.Timestamp(1589224677188L))
// prints 2020-05-11 19:17:57.188 when the JVM default timezone is GMT
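One thing I was wondering about is whether the session-level config `spark.sql.session.timeZone` (available since Spark 2.2) drives this conversion, i.e. whether pinning it to UTC on both the writing and the reading side would make the round trip consistent. A minimal sketch of what I mean, untested on my side:

```scala
// assumption: the session timezone drives the timestamp <-> long conversion
spark.conf.set("spark.sql.session.timeZone", "UTC")

sourceDf.write.avro("/test4")           // writer session pinned to UTC
val roundTrip = spark.read.avro("/test4") // reader must set the same config
roundTrip.show(false)
```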