Convert unix_timestamp column to string using pyspark

Question

I have a column which represents unix_timestamp and want to convert it into string with this format, 'yyyy-MM-dd HH:mm:ss.SSS'.

unix_timestamp | time_string

1578569683753  | 2020-01-09 11:34:43.753
1578569581793  | 2020-01-09 11:33:01.793
1578569581993  | 2020-01-09 11:33:01.993

Is there any builtin function or how does it work? Thanks.

score 0 · Answer 1 · answered Apr 04 '20 at 20:19

0

df1 = df1.withColumn('utc_stamp', F.from_unixtime('Timestamp', format="YYYY-MM-dd HH:mm:ss"))

df1.show(truncate=False)

from_unixtime converts only into seconds, for milliseconds I just have to concat them from original column to new column.

answered Apr 04 '20 at 20:19

Saketh Garuda

79
5

score 0 · Answer 2 · answered Jun 07 '20 at 19:40

unixtimestamp only supports second precision. Looking at your values the precision is at milliseconds, the last 3 positions are milliseconds.

from pyspark.sql.functions import substring,unix_timestamp,col,to_timestamp,concat,lit,from_unixtime

df = spark.createDataFrame([('1578569683753',), ('1578569581793',),('1578569581993',)], ['TMS'])
df.show(3,False)
df.printSchema()

Result

   +-------------+
    |TMS          |
    +-------------+
    |1578569683753|
    |1578569581793|
    |1578569581993|
    +-------------+

    root
     |-- TMS: string (nullable = true)

Convert to the human-readable timestamp format

df1 = (df
       .select("TMS"
               ,from_unixtime(substring(col("TMS"),1,10), format="yyyy-MM-dd HH:mm:sss").alias("TMS_WITHOUT_MILLISECONDS")
               ,(substring("TMS",11,3)).alias("MILLISECONDS")
               ,(concat(from_unixtime(substring(col("TMS"),1,10), format="yyyy-MM-dd HH:mm:sss"),lit('.'), substring(df.TMS,11,3))).alias("TMS_StringType")
               ,to_timestamp(concat(from_unixtime(substring(col("TMS"),1,10), format="yyyy-MM-dd HH:mm:sss"),lit('.'), substring(df.TMS,11,3))).alias("TMS_TimestampType")
              )
      )
df1.show(3,False)
df1.printSchema()

Output

+-------------+------------------------+------------+------------------------+-----------------------+
|TMS          |TMS_WITHOUT_MILLISECONDS|MILLISECONDS|TMS_StringType          |TMS_TimestampType      |
+-------------+------------------------+------------+------------------------+-----------------------+
|1578569683753|2020-01-09 11:34:043    |753         |2020-01-09 11:34:043.753|2020-01-09 11:34:43.753|
|1578569581793|2020-01-09 11:33:001    |793         |2020-01-09 11:33:001.793|2020-01-09 11:33:01.793|
|1578569581993|2020-01-09 11:33:001    |993         |2020-01-09 11:33:001.993|2020-01-09 11:33:01.993|
+-------------+------------------------+------------+------------------------+-----------------------+

root
 |-- TMS: string (nullable = true)
 |-- TMS_WITHOUT_MILLISECONDS: string (nullable = true)
 |-- MILLISECONDS: string (nullable = true)
 |-- TMS_StringType: string (nullable = true)
 |-- TMS_TimestampType: timestamp (nullable = true)

Convert unix_timestamp column to string using pyspark

2 Answers2