How to get that expected output, Need the difference between last and first value for every 30 minute.
I have the current dataset( it has 10 minute window): Parquet file
Sample input for one Item name:
item value type agentstatus originaltimestamp agenttimestamp
0 FCX.FCN2.PM01001_01_SUM.SUM 1793472 Double Good 2023-08-01T03:00:00.9650000Z 2023-08-01T03:00:00.0000000Z
25 FCX.FCN2.PM01001_01_SUM.SUM 1793473 Double Good 2023-08-01T03:10:00.1570000Z 2023-08-01T03:10:00.0000000Z
50 FCX.FCN2.PM01001_01_SUM.SUM 1793474 Double Good 2023-08-01T03:20:00.0740000Z 2023-08-01T03:20:00.0000000Z
75 FCX.FCN2.PM01001_01_SUM.SUM 1793475 Double Good 2023-08-01T03:30:00.5320000Z 2023-08-01T03:30:00.0000000Z
100 FCX.FCN2.PM01001_01_SUM.SUM 1793476 Double Good 2023-08-01T03:40:00.8700000Z 2023-08-01T03:40:00.0000000Z
125 FCX.FCN2.PM01001_01_SUM.SUM 1793477 Double Good 2023-08-01T03:50:00.5770000Z 2023-08-01T03:50:00.0000000Z
150 FCX.FCN2.PM01001_01_SUM.SUM 1793478 Double Good 2023-08-01T04:00:00.8650000Z 2023-08-01T04:00:00.0000000Z
175 FCX.FCN2.PM01001_01_SUM.SUM 1793479 Double Good 2023-08-01T04:10:00.7910000Z 2023-08-01T04:10:00.0000000Z
200 FCX.FCN2.PM01001_01_SUM.SUM 1793480 Double Good 2023-08-01T04:20:00.7640000Z 2023-08-01T04:20:00.0000000Z
225 FCX.FCN2.PM01001_01_SUM.SUM 1793481 Double Good 2023-08-01T04:30:00.5790000Z 2023-08-01T04:30:00.0000000Z
250 FCX.FCN2.PM01001_01_SUM.SUM 1793482 Double Good 2023-08-01T04:40:00.4700000Z 2023-08-01T04:40:00.0000000Z
275 FCX.FCN2.PM01001_01_SUM.SUM 1793483 Double Good 2023-08-01T04:50:00.7540000Z 2023-08-01T04:50:00.0000000Z
300 FCX.FCN2.PM01001_01_SUM.SUM 1793484 Double Good 2023-08-01T05:00:00.7750000Z 2023-08-01T05:00:00.0000000Z
325 FCX.FCN2.PM01001_01_SUM.SUM 1793485 Double Good 2023-08-01T05:10:00.2880000Z 2023-08-01T05:10:00.0000000Z
350 FCX.FCN2.PM01001_01_SUM.SUM 1793486 Double Good 2023-08-01T05:20:00.3390000Z 2023-08-01T05:20:00.0000000Z
375 FCX.FCN2.PM01001_01_SUM.SUM 1793487 Double Good 2023-08-01T05:30:00.4660000Z 2023-08-01T05:30:00.0000000Z
400 FCX.FCN2.PM01001_01_SUM.SUM 1793488 Double Good 2023-08-01T05:40:00.2750000Z 2023-08-01T05:40:00.0000000Z
425 FCX.FCN2.PM01001_01_SUM.SUM 1793489 Double Good 2023-08-01T05:50:00.8160000Z 2023-08-01T05:50:00.0000000Z
450 FCX.FCN2.PM01001_01_SUM.SUM 1793490 Double Good 2023-08-01T06:00:00.4000000Z 2023-08-01T06:00:00.0000000Z
475 FCX.FCN2.PM01001_01_SUM.SUM 1793491 Double Good 2023-08-01T06:10:00.9000000Z 2023-08-01T06:10:00.0000000Z
500 FCX.FCN2.PM01001_01_SUM.SUM 1793492 Double Good 2023-08-01T06:20:00.1900000Z 2023-08-01T06:20:00.0000000Z
Expected Output:
item value type agentstatus agenttimestamp
75 FCX.FCN2.PM01001_01_SUM.SUM 3 Double Good 2023-08-01T03:30:00.0000000Z
150 FCX.FCN2.PM01001_01_SUM.SUM 3 Double Good 2023-08-01T04:00:00.0000000Z
225 FCX.FCN2.PM01001_01_SUM.SUM 3 Double Good 2023-08-01T04:30:00.0000000Z
300 FCX.FCN2.PM01001_01_SUM.SUM 3 Double Good 2023-08-01T05:00:00.0000000Z
As you can see, The difference between last value and first value within 30 min widnow is taken and the result retains the values of the last value reference. Example: diff b/w:
75 FCX.FCN2.PM01001_01_SUM.SUM 1793475 Double Good 2023-08-01T03:30:00.5320000Z 2023-08-01T03:30:00.0000000Z
0 FCX.FCN2.PM01001_01_SUM.SUM 1793472 Double Good 2023-08-01T03:00:00.9650000Z 2023-08-01T03:00:00.0000000Z
Gives first row value (as seen from Expected Output section)
Actual current output:
index Item Value AgentStatus Type AgentTimeStamp Unit Calc1 Calc2
Item
FCX.FCN2.PM01001_01_SUM.SUM 25 FCX.FCN2.PM01001_01_SUM.SUM 0 Good Double 2023-08-01 00:10:00+00:00 kWh Diff 30min
FCX.FCN2.PM01001_01_SUM.SUM 50 FCX.FCN2.PM01001_01_SUM.SUM 0 Good Double 2023-08-01 00:20:00+00:00 kWh Diff 30min
FCX.FCN2.PM01001_01_SUM.SUM 75 FCX.FCN2.PM01001_01_SUM.SUM 0 Good Double 2023-08-01 00:30:00+00:00 kWh Diff 30min
FCX.FCN2.PM01001_01_SUM.SUM 100 FCX.FCN2.PM01001_01_SUM.SUM 0 Good Double 2023-08-01 00:40:00+00:00 kWh Diff 30min
FCX.FCN2.PM01001_01_SUM.SUM 125 FCX.FCN2.PM01001_01_SUM.SUM 0 Good Double 2023-08-01 00:50:00+00:00 kWh Diff 30min
The current output throws '0' value after the lambda function.
Code:
df= pd.read_parquet(stream, engine='pyarrow')
df = df.astype({'value':'int64'})
df.dtypes
df = df[['item', 'value', 'agentstatus', 'type', 'agenttimestamp']].reset_index()
df['agenttimestamp'] = pd.to_datetime(df['agenttimestamp'])
r = df.groupby('item').resample('30T', on='agenttimestamp', origin='start', label='right')
transformed_value = (r.last()
.assign(value=lambda d: d[['value']].sub(r[['value']].first()))
.reset_index('agenttimestamp')
)
#Adding new column values to the existing Dataframe
transformed_value['Unit'] = 'kWh'
transformed_value['Calc1'] = 'Diff'
transformed_value['Calc2'] = '30min'
transformed_value