I got a dataframe like this:
gpi_data[['sig','hourtime']]
Out[28]:
sig hourtime
datetime_doy
2007-01-02 -8.963545 2007-01-02 09:20:11.249998
2007-01-03 -8.671357 2007-01-03 10:39:31.874991
2007-01-03 -8.996480 2007-01-03 20:22:59.999006
2007-01-04 -8.835958 2007-01-04 10:18:56.249024
2007-01-05 -8.785034 2007-01-05 21:21:39.374002
... ...
2019-12-30 -8.529724 2019-12-30 20:23:01.874996
2019-12-30 -8.563781 2019-12-30 20:48:28.125016
2019-12-30 -8.504211 2019-12-30 21:23:44.999996
2019-12-31 -8.460620 2019-12-31 09:39:31.873999
2019-12-31 -8.230092 2019-12-31 10:18:58.125014
[7983 rows x 2 columns]
and I want to calculate the averaged values of each morning and each afternoon based on hour time
. By morning I mean the data is observed around 10:00:00
, and 22:00:00
for afternoon. If there is no values on the morning/evening on this day, fill it with np.nan
. For example, on 2007-01-01
we don't have any morning or evening values of sig
. Then we fill it with two np.nan
values. Then on 2007-01-02
we only have morning value, so we fill the evening value of 2007-01-02
with np.nan.
SPECIFICALLY, for 2019-12-30
, we have 3 evening values which are 2019-12-30 20:23:01.874996
, 2019-12-30 20:48:28.125016
and 2019-12-30 21:23:44.999996
. So we need to calculate the average value of -8.529724
, -8.563781
and -8.504211
. It's same for the last two datapoints on the morning of 2019-12-31
, we need to average them, and fill the np.nan
to the evening of 2019-12-31
.
So ideally the final result would be:
gpi_data[['sig','hourtime']]
Out[28]:
sig hourtime
datetime_doy
2007-01-01 nan 2007-01-01 10:00:00
2007-01-01 nan 2007-01-01 22:00:00
2007-01-02 -8.963545 2007-01-02 09:20:11.249998
2007-01-02 nan 2007-01-02 22:00:00
2007-01-03 -8.671357 2007-01-03 10:39:31.874991
2007-01-03 -8.996480 2007-01-03 20:22:59.999006
2007-01-04 -8.835958 2007-01-04 10:18:56.249024
2007-01-04 nan 2007-01-04 22:00:00
2007-01-05 nan 2007-01-05 10:00:00
2007-01-05 -8.785034 2007-01-05 21:21:39.374002
... ...
2019-12-30 -8.532572 2019-12-30 22:00:00
2019-12-31 -8.345356 2019-12-31 09:39:31.873999
2019-12-31 nan 2019-12-31 22:00:00
It's fine if we round all hourtime
to 10:00:00 or 22:00:00 like below:
gpi_data[['sig','hourtime']]
Out[28]:
sig hourtime
datetime_doy
2007-01-01 nan 2007-01-01 10:00:00
2007-01-01 nan 2007-01-01 22:00:00
2007-01-02 -8.963545 2007-01-02 10:00:00
2007-01-02 nan 2007-01-02 22:00:00
2007-01-03 -8.671357 2007-01-03 10:00:00
2007-01-03 -8.996480 2007-01-03 22:00:00
2007-01-04 -8.835958 2007-01-04 10:00:00
2007-01-04 nan 2007-01-04 22:00:00
2007-01-05 nan 2007-01-05 10:00:00
2007-01-05 -8.785034 2007-01-05 22:00:00
... ...
2019-12-30 -8.532572 2019-12-30 22:00:00
2019-12-31 -8.460620 2019-12-31 10:00:00
2019-12-31 nan 2019-12-31 22:00:00
How can I do it? is there anybody who can help me? Thanks!