i have a pandas series like this:
0 1 2 3
1333270 2020-11-06 00:00:59 23.0 50.0
1333271 2020-11-06 00:01:59 23.0 50.0
1333272 2020-11-06 00:02:59 23.0 50.0
1333273 2020-11-06 00:03:59 23.0 50.0
1333274 2020-11-06 00:04:59 23.0 49.9
... ... ... ...
1333951 2020-11-06 11:22:28 22.0 52.1
1333952 2020-11-06 11:23:28 21.9 52.1
1333953 2020-11-06 11:24:28 21.9 52.0
1333954 2020-11-06 11:25:28 21.9 52.1
1333955 2020-11-06 11:26:28 21.9 52.0
where the column with index 1 is the hour of the respective measures; what i would like to do, is for each hour sum the columns with i ndex 2 and 3 and do a mean of the value, in order to obtain a series with 24 rows and mean of the values for each hour.
I tryed to implement this code:
uniqueValues = df[0].unique()
uniqueValues = list(uniqueValues)
hours = ['00','01','02','03','04','05','06','07','08','09','10','11','12','13','14','15','16','17','18','19','20','21','22','23']
simplelist=[]
for z, day in enumerate(uniqueValues):
df_same_day = df.loc[df[0] == day]
for hour in hours:
if df_same_day[1].str[0:2].isin([hour]).any() == True:
value1 = df_same_day.groupby(df_same_day[1].str[0:2])[2].sum()[hour]/df_same_day[1].str[0:2].value_counts()[hour]
value2 = df_same_day.groupby(df_same_day[1].str[0:2])[3].sum()[hour]/df_same_day[1].str[0:2].value_counts()[hour]
values = [df_same_day[0].values[0], hour+':00:00', value1, value2]
simplelist.append(values)
else:
continue
print('FINITO ORA: '+hour)
dfObj = pd.DataFrame(simplelist,columns=['Day', 'Timetag', 'Temp', 'Hum'])
it works, but it seems to me that is not the best solution to get the expected results. Has someone any ideas for how to improve and speed up the code?
Thank you