
I have a dataframe with timestamp values, like this one: 2018-02-15T11:39:13.000Z. I want to convert it to UNIX format using PySpark.

I tried something like data = datasample.withColumn('timestamp_cast', datasample['timestamp'].cast('date')), but I lose a lot of information: I only get day/month/year, while my source has millisecond precision.

Result: 2018-02-15

Any idea how to get the unix format and keep the precision? Thank you!

Ticoincoin
    You need to use [`pyspark.sql.functions.unix_timestamp`](http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.functions.unix_timestamp). – pault Nov 13 '18 at 16:22

2 Answers


You can use the built-in `unix_timestamp` in the following ways:

from pyspark.sql.functions import unix_timestamp
df = df.withColumn('unix', unix_timestamp('timestamp'))

Or

df = df.selectExpr('unix_timestamp(timestamp)')
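
Note that `unix_timestamp` returns whole seconds, so the milliseconds are still dropped. Also, if `timestamp` is a string in the ISO-8601 form from the question rather than a real timestamp column, you have to pass a format explicitly or the parse comes back null. A minimal sketch, assuming values like the sample 2018-02-15T11:39:13.000Z:

from pyspark.sql.functions import unix_timestamp
# The default pattern 'yyyy-MM-dd HH:mm:ss' yields null for ISO-8601 strings,
# so spell the format out; XXX matches the trailing 'Z' zone designator
df = df.withColumn('unix', unix_timestamp('timestamp', "yyyy-MM-dd'T'HH:mm:ss.SSSXXX"))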
Tanjin

Another possible method is to cast the column directly to integer:

from pyspark.sql import functions as F
df = df.withColumn('timestamp_unix', F.col('timestamp').cast('int'))
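
Since the goal is to keep sub-second precision, note that casting to int truncates to whole seconds, just like `unix_timestamp`. Casting to double instead keeps the fractional part. A sketch, assuming `timestamp` is a real timestamp column and a UTC session time zone:

from pyspark.sql import functions as F
# Casting a timestamp to double gives epoch seconds with the fractional
# (millisecond) part preserved, e.g. 11:39:13.250 -> ...53.25 instead of ...53
df = df.withColumn('timestamp_unix', F.col('timestamp').cast('double'))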
Ric S