8
df1:

Timestamp:

1995-08-01T00:00:01.000+0000

Is there a way to extract the day of the month from the Timestamp column of the data frame using PySpark? I am new to Spark and have no code to show; I do not have a clue how to proceed.

data_person

2 Answers

14

You can parse this timestamp using unix_timestamp:

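from pyspark.sql import functions as F

# Pattern for strings such as 1995-08-01T00:00:01.000+0000
ts_format = "yyyy-MM-dd'T'HH:mm:ss.SSSZ"
# Parse the string to epoch seconds, then cast the result to a timestamp column
df2 = df1.withColumn('Timestamp2', F.unix_timestamp('Timestamp', ts_format).cast('timestamp'))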

Then, you can use dayofmonth in the new Timestamp column:

df2.select(F.dayofmonth('Timestamp2'))
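As a quick cross-check outside Spark, the same parse-then-extract idea can be sketched with the Python standard library. This is only an illustration and not part of the answer's method: the helper name `day_of_month` and its `offset_hours` parameter are hypothetical, and the `strptime` pattern is the Python counterpart of the Spark pattern above, not the same syntax.

```python
from datetime import datetime, timedelta, timezone

def day_of_month(ts: str, offset_hours: int = 0) -> int:
    """Parse a string like 1995-08-01T00:00:01.000+0000 and return the day
    of the month as seen in a fixed UTC offset (hypothetical helper)."""
    # %f accepts the millisecond digits, %z accepts an offset such as +0000
    parsed = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%f%z")
    # Shift to the requested offset before reading the calendar day
    local = parsed.astimezone(timezone(timedelta(hours=offset_hours)))
    return local.day

print(day_of_month("1995-08-01T00:00:01.000+0000"))      # day in UTC
print(day_of_month("1995-08-01T00:00:01.000+0000", -5))  # day at UTC-5
```

The second call is there because the calendar day depends on the time zone in which the instant is read. Spark evaluates dayofmonth on a timestamp column in the session time zone, so a value this close to midnight UTC can land on the previous day if the session is not set to UTC.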

More details about these functions can be found in the pyspark functions documentation.

Daniel de Paula
    `"yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"` will work correctly, because `"yyyy-MM-dd'T'HH:mm:ss.SSSZ"` returns a null value. – Darkhan Feb 18 '21 at 11:58
0

Code:

from pyspark.sql.functions import dayofmonth
df1.select(dayofmonth('Timestamp').alias('day'))
dur