
I have a dataframe that contains a time column in string format.

dataframe=

      time         value
      00:00:00      10
      00:23:00      5
      00:59:00      23 
      01:23:34      34
      01:56:00      34

Every time I try to group by hour on the time column, I get output like this:

 hour    count
  0       38
  1       68

But I want output like this:

 hour    count
  00       38
  01       68

For this I wrote the query below:

    dataframe.groupBy(hour('time')).agg({'value':'count'})
1 Answer


Quoting from "substring multiple characters from the last index of a pyspark string column using negative indexing":

Since your time column is a StringType, we can use substring to get the hour exactly as you want it, and group on it as a string:

    from pyspark.sql.functions import substring, col

    # Spark's substring is 1-based: take the first two characters of the string as the hour
    df = df.withColumn("hour", substring(col("time"), 1, 2))
    group_df = df.groupby("hour").sum("value")  # or whichever aggregation you want
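
As a quick check, here is a minimal self-contained sketch (assuming a local SparkSession and the sample rows from your question) that builds the hour column this way and then applies the same count aggregation you used; because the hour stays a string, the leading zeros are preserved:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import substring, col

    spark = SparkSession.builder.getOrCreate()

    # sample data from the question
    data = [("00:00:00", 10), ("00:23:00", 5), ("00:59:00", 23),
            ("01:23:34", 34), ("01:56:00", 34)]
    df = spark.createDataFrame(data, ["time", "value"])

    # first two characters of the string keep the leading zero ("00", "01", ...)
    df = df.withColumn("hour", substring(col("time"), 1, 2))

    # same aggregation as in the question: count of value per hour
    df.groupBy("hour").agg({"value": "count"}).orderBy("hour").show()
    # +----+------------+
    # |hour|count(value)|
    # +----+------------+
    # |  00|           3|
    # |  01|           2|
    # +----+------------+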