-1

i have a pandas series like this:

                  0         1     2     3
1333270  2020-11-06  00:00:59  23.0  50.0
1333271  2020-11-06  00:01:59  23.0  50.0
1333272  2020-11-06  00:02:59  23.0  50.0
1333273  2020-11-06  00:03:59  23.0  50.0
1333274  2020-11-06  00:04:59  23.0  49.9
            ...       ...   ...   ...
1333951  2020-11-06  11:22:28  22.0  52.1
1333952  2020-11-06  11:23:28  21.9  52.1
1333953  2020-11-06  11:24:28  21.9  52.0
1333954  2020-11-06  11:25:28  21.9  52.1
1333955  2020-11-06  11:26:28  21.9  52.0

where the column with index 1 is the hour of the respective measures; what i would like to do, is for each hour sum the columns with i ndex 2 and 3 and do a mean of the value, in order to obtain a series with 24 rows and mean of the values for each hour.

I tryed to implement this code:

uniqueValues = df[0].unique()
uniqueValues = list(uniqueValues)
hours = ['00','01','02','03','04','05','06','07','08','09','10','11','12','13','14','15','16','17','18','19','20','21','22','23']
simplelist=[]
for z, day in enumerate(uniqueValues):
    df_same_day = df.loc[df[0] == day]
    for hour in hours:
        if df_same_day[1].str[0:2].isin([hour]).any() == True:
            value1 = df_same_day.groupby(df_same_day[1].str[0:2])[2].sum()[hour]/df_same_day[1].str[0:2].value_counts()[hour]
            value2 = df_same_day.groupby(df_same_day[1].str[0:2])[3].sum()[hour]/df_same_day[1].str[0:2].value_counts()[hour]
            values = [df_same_day[0].values[0], hour+':00:00', value1, value2]
            simplelist.append(values)
        else:
            continue
    print('FINITO ORA: '+hour)
dfObj = pd.DataFrame(simplelist,columns=['Day', 'Timetag', 'Temp', 'Hum'])

it works, but it seems to me that is not the best solution to get the expected results. Has someone any ideas for how to improve and speed up the code?

Thank you

Andrea
  • 383
  • 1
  • 5
  • 19
  • Does this answer your question? [Python Pandas: Group datetime column into hour and minute aggregations](https://stackoverflow.com/questions/16266019/python-pandas-group-datetime-column-into-hour-and-minute-aggregations) – Serial Lazer Nov 09 '20 at 10:39
  • Just in part, the answer of @jezrael it's the better solution – Andrea Nov 09 '20 at 11:09

1 Answers1

3

Use DataFrame.resample with convert dates and times to datetimes, aggregate mean, remove only NaNs rows by DataFrame.dropna and rename columns - in output DataFrame is DatetimeIndex:

df['date'] = pd.to_datetime(df[0] + ' ' + df[1])
df = df.resample('h', on='date').mean().dropna(how='all').rename(columns={2:'Temp',3:'Hum'})
print (df)
                      Temp    Hum
date                             
2020-11-06 00:00:00  23.00  49.98
2020-11-06 11:00:00  21.92  52.06

Then if need dates and times to new columns use DatetimeIndex.date and DatetimeIndex.time:

df['Day'] = df.index.date
df['Timetag'] = df.index.time
df = df.reset_index()[['Day', 'Timetag', 'Temp', 'Hum']]
print (df)
          Day   Timetag   Temp    Hum
0  2020-11-06  00:00:00  23.00  49.98
1  2020-11-06  11:00:00  21.92  52.06
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252