
I have a dataset like the below:

epoch_seconds   eq_time
1636663343887   2021-11-12 02:12:23

Now I am trying to convert eq_time back to epoch milliseconds so that it matches the value of the first column, but I am unable to do so. Below is my code:

from pyspark.sql.functions import col, from_unixtime, unix_timestamp

df = spark.sql("select '1636663343887' as epoch_seconds")
df1 = df.withColumn("eq_time", from_unixtime(col("epoch_seconds") / 1000))

df2 = df1.withColumn("epoch_sec", unix_timestamp(df1.eq_time))
df2.show(truncate=False)

I am getting output like below:

epoch_seconds   eq_time               epoch_sec
1636663343887   2021-11-12 02:12:23   1636663343

I tried this link as well, but it didn't help. My expected output is that the first and third columns should match each other.

P.S.: I am using Spark 3.1.1 locally, whereas production runs Spark 2.4.3, and my end goal is to run this in production.

  • So, yeah. How do I resolve this? Suppose I have a timestamp like `2021-11-12 00:00:00.000`; what would be its equivalent in epoch milliseconds? – whatsinthename Nov 13 '21 at 19:14
  • `t = datetime.datetime.strptime('2021-11-12 02:12:23.887', '%Y-%m-%d %H:%M:%S.%f')` `print(t.strftime('%s%f')[:-3])` I would do it this way – xBatmanx Nov 13 '21 at 19:22
  • You can't hardcode the milliseconds value in the date; it should be derived from the epoch and then derived back to epoch from the date. – whatsinthename Nov 13 '21 at 20:17
  • I tried converting your epoch_seconds value to date and time using an online tool (https://www.epochconverter.com/) and found that the last 3 digits, i.e. the milliseconds, do not affect the epoch value. Please try the tool out and let me know! – xBatmanx Nov 13 '21 at 20:26

3 Answers


Use to_timestamp instead of from_unixtime to preserve the milliseconds when you convert the epoch value to the Spark timestamp type.

Then, to go back to a timestamp in milliseconds, you can use the unix_timestamp function (or cast to long type) and concatenate the result with the fractional-seconds part of the timestamp, which you get with date_format using the pattern S:

import pyspark.sql.functions as F

df = spark.sql("select '1636663343887' as epoch_ms")

df2 = df.withColumn(
    "eq_time",
    F.to_timestamp(F.col("epoch_ms") / 1000)
).withColumn(
    "epoch_milli",
    F.concat(F.unix_timestamp("eq_time"), F.date_format("eq_time", "S"))
)

df2.show(truncate=False)
#+-------------+-----------------------+-------------+
#|epoch_ms     |eq_time                |epoch_milli  |
#+-------------+-----------------------+-------------+
#|1636663343887|2021-11-11 21:42:23.887|1636663343887|
#+-------------+-----------------------+-------------+ 
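
As a quick sanity check, you can compare the round-tripped value against the original column (a sketch; the matches column name is just illustrative, and both compared columns are strings here):

df2.withColumn("matches", F.col("epoch_ms") == F.col("epoch_milli")) \
   .select("epoch_ms", "epoch_milli", "matches") \
   .show(truncate=False)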
blackbishop

I prefer to do the timestamp conversion using only cast.

from pyspark.sql.functions import col

df = spark.sql("select '1636663343887' as epoch_seconds")

# epoch milliseconds -> timestamp (the cast keeps the fractional seconds)
df = df.withColumn("eq_time", (col("epoch_seconds") / 1000).cast("timestamp"))

# timestamp -> epoch milliseconds (cast to double keeps the fraction)
df = df.withColumn("epoch_sec", (col("eq_time").cast("double") * 1000).cast("long"))

df.show(truncate=False)

If you do it this way, you need to think in seconds; then it will work perfectly.
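
For reference, with the sample value above this shows something like the following (how eq_time renders depends on the Spark session time zone; the value below assumes an Asia/Kolkata session, which matches the timestamps in the question):

#+-------------+-----------------------+-------------+
#|epoch_seconds|eq_time                |epoch_sec    |
#+-------------+-----------------------+-------------+
#|1636663343887|2021-11-12 02:12:23.887|1636663343887|
#+-------------+-----------------------+-------------+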

Bibzon

To convert between time formats in Python, the datetime.datetime.strptime() and .strftime() methods are useful.

To read the string from eq_time and parse it into a Python datetime object:

import datetime
t = datetime.datetime.strptime('2021-11-12 02:12:23', '%Y-%m-%d %H:%M:%S')

To print t in epoch_seconds format:

print(t.strftime('%s'))
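
Note that '%s' is a platform-specific directive (it is passed through to the underlying C library on systems that support it, and is not listed among the documented strftime codes). A more portable sketch, which treats the naive datetime as local time, is:

# datetime.timestamp() interprets a naive datetime as local time
epoch_ms = int(t.timestamp() * 1000)
print(epoch_ms)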

Pandas has date processing functions which work along similar lines: Applying strptime function to pandas series
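
For illustration, a minimal pandas sketch (the Series s and its sample value are hypothetical stand-ins for the eq_time column):

import pandas as pd

s = pd.Series(["2021-11-12 02:12:23.887"])

# Parse to datetime64[ns], then convert to epoch milliseconds
ts = pd.to_datetime(s, format="%Y-%m-%d %H:%M:%S.%f")
epoch_ms = (ts - pd.Timestamp("1970-01-01")) // pd.Timedelta("1ms")
print(epoch_ms)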

You could run this on the eq_time column, immediately after extracting the data, to ensure your DataFrame contains the date in the correct format.

moo