
In Spark 2.2, extracting the date from unix_timestamp is not working.

Input data:

+-------------------------+
|UPDATE_TS                |
+-------------------------+
|26NOV2009:03:27:01.154410|
|24DEC2012:00:47:46.805710|
|02MAY2013:00:45:33.233844|
|21NOV2014:00:33:39.350140|
|10DEC2013:00:30:30.532446|
+-------------------------+

I tried the following approaches, but the output I'm getting is null.

Queries tried:

Spark SQL:

sqlContext.sql("select from_unixtime(unix_timestamp(UPDATE_TS,'ddMMMyyyy:HH:MM:SS.ssssss'), 'yyyy') as new_date from df_vendor_tab").show()

DSL:

 df_id.withColumn('part_date', from_unixtime(unix_timestamp(df_id.UPDATE_TS, "ddMMMyyyy:HH:MM:SS.sss"), "yyyy"))

Expected output:

2009
2012
2013
2014
2013
marjun
  • You have the incorrect format string: you're using capital M for both month and minute. Month should be `M` while minute is `m`. – pault Jun 29 '18 at 14:23

2 Answers


You're using the incorrect format string. Capital M is for month. Lower case m is for minute.

The following would work:

from pyspark.sql.functions import from_unixtime, unix_timestamp, to_date

df_id.withColumn(
    'part_date',
    # mm (lowercase) is minutes; the question's MM (uppercase) is month
    from_unixtime(unix_timestamp(df_id.UPDATE_TS, "ddMMMyyyy:HH:mm:SS.SSSSSS"), "yyyy")
).show(truncate=False)
#+-------------------------+---------+
#|UPDATE_TS                |part_date|
#+-------------------------+---------+
#|26NOV2009:03:27:01.154410|2009     |
#|24DEC2012:00:47:46.805710|2012     |
#|02MAY2013:00:45:33.233844|2013     |
#|21NOV2014:00:33:39.350140|2014     |
#|10DEC2013:00:30:30.532446|2013     |
#+-------------------------+---------+
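
For completeness, the same fix also applies to the Spark SQL form tried in the question. A minimal sketch (it assumes the DataFrame is still registered as the df_vendor_tab table, as in the question; only the format string changes):

# Sketch: the corrected format string used in the question's SQL query
# (assumes df_vendor_tab is registered as a table, as in the question)
sqlContext.sql(
    "select from_unixtime(unix_timestamp(UPDATE_TS, 'ddMMMyyyy:HH:mm:SS.SSSSSS'), 'yyyy') "
    "as new_date from df_vendor_tab"
).show()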
pault
  • I updated it to df.withColumn( 'part_date', from_unixtime(unix_timestamp(df['Event Read Time'], "yyyy-MM-dd HH:mm:SS.SSSSSS"), "yyyy") ).show(truncate=False), since I have my timestamps in this format: 2017-11-11 10:27:26 AM. My output value is null. It is probably because "yyyy-MM-dd HH:mm:SS.SSSSSS" is incorrect. What is the correct format? I would like to get the date, not just the year. – Sade Oct 05 '18 at 12:31
  • This is my format: 2017-11-11 10:27:26 AM, and my output should be 2017-11-11. – Sade Oct 05 '18 at 12:34
  • Thanks, I fixed it. The correct answer is df.withColumn( 'part_date', from_unixtime(unix_timestamp(df['Event Read Time'], "yyyy-MM-dd HH:mm:SS"), "yyyy-MM-dd") ).show(truncate=False) – Sade Oct 05 '18 at 12:37

Plain Spark SQL works fine with unix_timestamp and from_unixtime:

sqlContext.sql("Select from_unixtime(unix_timestamp('26NOV2009:03:27:01.154410', 'ddMMMyyyy'), 'yyyy')").show

Output:

+----+
| _c0|
+----+
|2009|
+----+

Since you are only looking to extract the year, I have not considered the hours, minutes, and seconds.
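
Applied to the UPDATE_TS column instead of a literal, a sketch along the same lines (assuming the df_vendor_tab table from the question):

# Sketch: same idea applied to the UPDATE_TS column of the question's table
sqlContext.sql("select from_unixtime(unix_timestamp(UPDATE_TS, 'ddMMMyyyy'), 'yyyy') as part_date from df_vendor_tab").show()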

Kannan Kandasamy