
So, I have a datetime-indexed DataFrame that looks like this:

eventTime   Energy  Power   RunningHours
9/29/2018 0:00  146.985 65  2256.88
9/29/2018 1:00  147.05  64.5    2257.87
9/29/2018 2:00  147.116 65  2258.87
9/29/2018 3:00  147.181 65  2259.87
9/29/2018 4:00  147.246 65  2260.87
9/29/2018 5:00  147.312 65  2261.87
9/29/2018 5:11  76.428      
9/29/2018 5:12      65  
9/29/2018 6:00  147.377 65  2262.87
9/29/2018 7:00  147.443 65  2263.87
9/29/2018 8:00  147.45      2263.98
9/29/2018 9:17  76.558      
9/29/2018 9:17          1174.35
9/29/2018 19:00 147.502 65  2264.75
9/29/2018 20:00 147.567 65  2265.75
9/29/2018 21:00 147.633 65  2266.75
9/29/2018 22:00 147.698 65  2267.75
9/29/2018 23:00 147.764 65  2268.75
9/30/2018 0:00  147.829 65  2269.75
9/30/2018 1:00  147.895 65  2270.75
9/30/2018 2:00  147.961 65  2271.75
9/30/2018 3:00  148.026 65  2272.73
9/30/2018 4:00  148.092 65  2273.73
9/30/2018 5:00  148.157 65  2274.73
9/30/2018 6:00  148.223 65  2275.73
9/30/2018 7:00  148.288 65  2276.73
9/30/2018 8:00  148.297     2276.87
9/30/2018 13:51     64  
9/30/2018 19:00 148.35  65  2277.68
9/30/2018 20:00 148.415 65  2278.67
9/30/2018 21:00 148.481 65  2279.67
9/30/2018 22:00 148.546 65  2280.67
9/30/2018 23:00 148.611 65  2281.67

For each day in the datetime index, I am looking to find the difference between the "RunningHours" value at 23:00 and the value at 00:00.

I am imagining my output to look like this:

9/29/2018   11.87
9/30/2018   11.92

How do I get to this? I am currently disaggregating the datetime index into date and time, then looping down the date and time to find the difference. This seems complicated for something very simple, and I am sure there is an easier way using the datetime index as is. I just don't know how. Help please.

@ansev Your code works very well for data that is continuous and where values exist for the 00:00 and 23:00 timestamps. However, if data is missing for either of these two timestamps, the script picks up the first or last available datapoint for that date.

For example, for the data below:

6/7/2018 0:00   67.728  64  1037.82
6/7/2018 1:00   67.793  64  1038.82
6/7/2018 2:00   67.857  64  1039.82
6/7/2018 3:00   67.922  64  1040.82
6/7/2018 4:00   67.987  64  1041.82
6/7/2018 5:00           64  1042.82
6/7/2018 6:00               1043.43
6/7/2018 23:00  68.288      

The output from the script is

6/7/2018    1037.82 1043.43 5.61

How do I modify it to return NaN if the data is not available? Thanks so much for your help on this.
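
Something along these lines is roughly what I am after. This is only a sketch of my own, assuming eventTime is the DatetimeIndex as described above; it uses at_time to pick out the exact 00:00 and 23:00 stamps, so dates missing either one fall out as NaN:

import pandas as pd

# Sketch only: `df` is the frame above with `eventTime` as its DatetimeIndex.
# at_time() keeps just the rows stamped at exactly that clock time.
start = df.at_time('00:00')['RunningHours']
end = df.at_time('23:00')['RunningHours']

# Re-key both series by calendar date so the subtraction aligns on dates.
start.index = pd.Index(start.index.date, name='date')
end.index = pd.Index(end.index.date, name='date')

# Dates missing either the 00:00 or the 23:00 reading come out as NaN.
daily_diff = (end - start).rename('difference')
print(daily_diff)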

pynoob

2 Answers


Assuming the data is ordered chronologically, we can use groupby.agg to get the first and last RunningHours value for each date, then take the difference:

new_df = (df.groupby(pd.to_datetime(df['eventTime']).dt.date)['RunningHours']
            .agg(['first','last'])
            .assign(difference=lambda x: x['last']-x['first'])
            .reset_index())

print(new_df)
    eventTime    first     last  difference
0  2018-09-29  2256.88  2268.75       11.87
1  2018-09-30  2269.75  2281.67       11.92
ansev
  • @ansev This works perfectly when the data is continuous. Thank you very much!! However, for days when the specific timestamp is missing, it uses whatever the first or last timestamp available is. I want to look specifically at the difference between hour 0 and hour 23 (for creating a daily metric) for each date. – pynoob Mar 13 '20 at 15:51
  • I am trying to post some data here, but it is not showing up in a usable format, so let me edit the question above. I just ran this on a medium-sized dataset. For 2018-10-01, there was only one datapoint in the df, at timestamp 00:00. The code used this value for both 00:00 and 23:00. I want it to say NaN if the specific data at 00:00 and 23:00 are not available. – pynoob Mar 13 '20 at 16:05

Find values from a column in a DF at very specific times for every unique date

I answered my own question there, for those who are looking for something different.
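
The linked post is not reproduced here, but a rough sketch of that kind of lookup could look like the following (assuming eventTime is a regular column, as in ansev's answer; the column names and times mirror the question, and the actual linked answer may differ):

import pandas as pd

# Hypothetical sketch: keep only the rows stamped at the specific clock times,
# then lay them out as one row per date so that missing stamps stay NaN.
df['eventTime'] = pd.to_datetime(df['eventTime'])
times = ['00:00', '23:00']
sub = df[df['eventTime'].dt.strftime('%H:%M').isin(times)].copy()
sub['date'] = sub['eventTime'].dt.date
sub['time'] = sub['eventTime'].dt.strftime('%H:%M')

# One column per target time; reindex guarantees both columns exist even if
# no row matched, and the difference is NaN wherever a stamp is missing.
wide = (sub.pivot_table(index='date', columns='time',
                        values='RunningHours', aggfunc='first')
           .reindex(columns=times))
wide['difference'] = wide['23:00'] - wide['00:00']
print(wide)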

pynoob