I have two timestamp columns ('tpep_pickup_datetime' and 'tpep_dropoff_datetime'), and when I calculate the difference between them, I get an interval column.
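```python
from pyspark.sql.functions import col, to_timestamp
# Parse both string columns into timestamps
yellowcab = yellowcab \
    .withColumn('tpep_pickup_datetime', to_timestamp('tpep_pickup_datetime', 'yyyy-MM-dd HH:mm:ss')) \
    .withColumn('tpep_dropoff_datetime', to_timestamp('tpep_dropoff_datetime', 'yyyy-MM-dd HH:mm:ss'))
# Timestamp minus timestamp yields an interval column, not a number
yellowcab = yellowcab \
    .withColumn('total_time', col('tpep_dropoff_datetime') - col('tpep_pickup_datetime'))
```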
The result looks like this:
I want to convert the 'total_time' column to an integer column containing the duration in seconds.
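One common way to get an integer directly is to avoid the interval altogether: convert each timestamp to seconds since the Unix epoch and subtract the two integers. In PySpark this would correspond to something like `unix_timestamp('tpep_dropoff_datetime') - unix_timestamp('tpep_pickup_datetime')`, or casting both timestamp columns to `'long'` before subtracting. The sketch below shows only that arithmetic in plain Python (not Spark code); the helper names `epoch_seconds` and `total_time_seconds` and the sample values are hypothetical, and it assumes the naive timestamps can be treated as UTC.

```python
from datetime import datetime, timezone

# strptime equivalent of the Spark pattern 'yyyy-MM-dd HH:mm:ss'
FMT = "%Y-%m-%d %H:%M:%S"

def epoch_seconds(ts: str) -> int:
    """Parse a timestamp string and return whole seconds since the Unix epoch.

    Assumption: the naive timestamp is treated as UTC; any fixed-offset zone
    would give the same difference between two values.
    """
    dt = datetime.strptime(ts, FMT).replace(tzinfo=timezone.utc)
    return int(dt.timestamp())

def total_time_seconds(pickup: str, dropoff: str) -> int:
    # Integer minus integer is an integer, so no interval ever appears
    return epoch_seconds(dropoff) - epoch_seconds(pickup)

# Hypothetical trip: 12 minutes 5 seconds
print(total_time_seconds("2020-01-01 00:28:15", "2020-01-01 00:40:20"))  # 725
```

Because the subtraction happens on plain integers, the result is already in seconds and needs no further unpacking.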
I have tried extracting the hours and minutes from the interval and then multiplying them to convert everything to seconds, but I have not been able to get it to work.
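If you do go through the interval's fields, the conversion is each field multiplied by its length in seconds, then summed; note that the days and seconds fields have to be included as well as hours and minutes. The sketch below shows only that arithmetic in plain Python; how the individual fields are pulled out of a Spark interval is not shown, and the function name `interval_to_seconds` is hypothetical.

```python
def interval_to_seconds(days: int, hours: int, minutes: int, seconds: int) -> int:
    """Combine day-time interval fields into a single count of seconds.

    Using only hours and minutes would drop the days and seconds fields
    and miscount the duration.
    """
    return days * 86400 + hours * 3600 + minutes * 60 + seconds

# Hypothetical interval: 0 days, 0 hours, 12 minutes, 5 seconds
print(interval_to_seconds(0, 0, 12, 5))  # 725
# Hypothetical interval: 1 day, 2 hours, 3 minutes, 4 seconds
print(interval_to_seconds(1, 2, 3, 4))  # 93784
```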