-1

How to get that expected output, Need the difference between last and first value for every 30 minute.

I have the current dataset( it has 10 minute window): Parquet file

Sample input for one Item name:

                        item    value   type    agentstatus originaltimestamp   agenttimestamp      
0   FCX.FCN2.PM01001_01_SUM.SUM 1793472 Double  Good    2023-08-01T03:00:00.9650000Z    2023-08-01T03:00:00.0000000Z        
25  FCX.FCN2.PM01001_01_SUM.SUM 1793473 Double  Good    2023-08-01T03:10:00.1570000Z    2023-08-01T03:10:00.0000000Z        
50  FCX.FCN2.PM01001_01_SUM.SUM 1793474 Double  Good    2023-08-01T03:20:00.0740000Z    2023-08-01T03:20:00.0000000Z        
75  FCX.FCN2.PM01001_01_SUM.SUM 1793475 Double  Good    2023-08-01T03:30:00.5320000Z    2023-08-01T03:30:00.0000000Z        
100 FCX.FCN2.PM01001_01_SUM.SUM 1793476 Double  Good    2023-08-01T03:40:00.8700000Z    2023-08-01T03:40:00.0000000Z        
125 FCX.FCN2.PM01001_01_SUM.SUM 1793477 Double  Good    2023-08-01T03:50:00.5770000Z    2023-08-01T03:50:00.0000000Z        
150 FCX.FCN2.PM01001_01_SUM.SUM 1793478 Double  Good    2023-08-01T04:00:00.8650000Z    2023-08-01T04:00:00.0000000Z        
175 FCX.FCN2.PM01001_01_SUM.SUM 1793479 Double  Good    2023-08-01T04:10:00.7910000Z    2023-08-01T04:10:00.0000000Z        
200 FCX.FCN2.PM01001_01_SUM.SUM 1793480 Double  Good    2023-08-01T04:20:00.7640000Z    2023-08-01T04:20:00.0000000Z        
225 FCX.FCN2.PM01001_01_SUM.SUM 1793481 Double  Good    2023-08-01T04:30:00.5790000Z    2023-08-01T04:30:00.0000000Z        
250 FCX.FCN2.PM01001_01_SUM.SUM 1793482 Double  Good    2023-08-01T04:40:00.4700000Z    2023-08-01T04:40:00.0000000Z        
275 FCX.FCN2.PM01001_01_SUM.SUM 1793483 Double  Good    2023-08-01T04:50:00.7540000Z    2023-08-01T04:50:00.0000000Z        
300 FCX.FCN2.PM01001_01_SUM.SUM 1793484 Double  Good    2023-08-01T05:00:00.7750000Z    2023-08-01T05:00:00.0000000Z        
325 FCX.FCN2.PM01001_01_SUM.SUM 1793485 Double  Good    2023-08-01T05:10:00.2880000Z    2023-08-01T05:10:00.0000000Z        
350 FCX.FCN2.PM01001_01_SUM.SUM 1793486 Double  Good    2023-08-01T05:20:00.3390000Z    2023-08-01T05:20:00.0000000Z        
375 FCX.FCN2.PM01001_01_SUM.SUM 1793487 Double  Good    2023-08-01T05:30:00.4660000Z    2023-08-01T05:30:00.0000000Z        
400 FCX.FCN2.PM01001_01_SUM.SUM 1793488 Double  Good    2023-08-01T05:40:00.2750000Z    2023-08-01T05:40:00.0000000Z        
425 FCX.FCN2.PM01001_01_SUM.SUM 1793489 Double  Good    2023-08-01T05:50:00.8160000Z    2023-08-01T05:50:00.0000000Z        
450 FCX.FCN2.PM01001_01_SUM.SUM 1793490 Double  Good    2023-08-01T06:00:00.4000000Z    2023-08-01T06:00:00.0000000Z        
475 FCX.FCN2.PM01001_01_SUM.SUM 1793491 Double  Good    2023-08-01T06:10:00.9000000Z    2023-08-01T06:10:00.0000000Z        
500 FCX.FCN2.PM01001_01_SUM.SUM 1793492 Double  Good    2023-08-01T06:20:00.1900000Z    2023-08-01T06:20:00.0000000Z

Expected Output:

                    item    value   type    agentstatus agenttimestamp
75  FCX.FCN2.PM01001_01_SUM.SUM 3   Double  Good    2023-08-01T03:30:00.0000000Z
150 FCX.FCN2.PM01001_01_SUM.SUM 3   Double  Good    2023-08-01T04:00:00.0000000Z
225 FCX.FCN2.PM01001_01_SUM.SUM 3   Double  Good    2023-08-01T04:30:00.0000000Z
300 FCX.FCN2.PM01001_01_SUM.SUM 3   Double  Good    2023-08-01T05:00:00.0000000Z

As you can see, The difference between last value and first value within 30 min widnow is taken and the result retains the values of the last value reference. Example: diff b/w:

75  FCX.FCN2.PM01001_01_SUM.SUM 1793475 Double  Good    2023-08-01T03:30:00.5320000Z    2023-08-01T03:30:00.0000000Z

0   FCX.FCN2.PM01001_01_SUM.SUM 1793472 Double  Good    2023-08-01T03:00:00.9650000Z    2023-08-01T03:00:00.0000000Z

Gives first row value (as seen from Expected Output section)

Actual current output:

                                       index    Item    Value   AgentStatus Type    AgentTimeStamp  Unit    Calc1   Calc2
Item                                    
FCX.FCN2.PM01001_01_SUM.SUM 25  FCX.FCN2.PM01001_01_SUM.SUM 0   Good    Double  2023-08-01 00:10:00+00:00   kWh Diff    30min
FCX.FCN2.PM01001_01_SUM.SUM 50  FCX.FCN2.PM01001_01_SUM.SUM 0   Good    Double  2023-08-01 00:20:00+00:00   kWh Diff    30min
FCX.FCN2.PM01001_01_SUM.SUM 75  FCX.FCN2.PM01001_01_SUM.SUM 0   Good    Double  2023-08-01 00:30:00+00:00   kWh Diff    30min
FCX.FCN2.PM01001_01_SUM.SUM 100 FCX.FCN2.PM01001_01_SUM.SUM 0   Good    Double  2023-08-01 00:40:00+00:00   kWh Diff    30min
FCX.FCN2.PM01001_01_SUM.SUM 125 FCX.FCN2.PM01001_01_SUM.SUM 0   Good    Double  2023-08-01 00:50:00+00:00   kWh Diff    30min

The current output throws '0' value after the lambda function.

Code:

df= pd.read_parquet(stream, engine='pyarrow')
df = df.astype({'value':'int64'})
df.dtypes

df  = df[['item', 'value', 'agentstatus', 'type', 'agenttimestamp']].reset_index()

df['agenttimestamp'] = pd.to_datetime(df['agenttimestamp'])
r = df.groupby('item').resample('30T', on='agenttimestamp', origin='start', label='right')


                                                 
transformed_value = (r.last()
        .assign(value=lambda d: d[['value']].sub(r[['value']].first()))
        .reset_index('agenttimestamp')
     )


#Adding new column values to the existing Dataframe
transformed_value['Unit'] = 'kWh'
transformed_value['Calc1'] = 'Diff'
transformed_value['Calc2'] = '30min'
transformed_value
Nishad Nazar
  • 371
  • 2
  • 3
  • 16
  • Hi Stack Community, this can be a follow-up to this thread https://stackoverflow.com/questions/76803053/how-to-take-difference-between-last-value-and-first-value-of-30-min-window-and?rq=2. But different dataset and different issue. Please help me out – Nishad Nazar Aug 09 '23 at 06:47
  • I don't think rolling function will work here, as I need to retain the values of other column after taking the difference, Please anyone any suggestion ? – Nishad Nazar Aug 09 '23 at 09:27

0 Answers0