I am looking for a pandas equivalent to the SQL Redshift window function LAST_VALUE().
I have a pandas dataframe of serial number reports that append daily.
import pandas as pd
data = {'serial_num': [123456, 678901, 123456, 678901],
'status': ['Good', 'Good', 'BAD', 'BAD'],
'last_check':['2020-03-02','2020-03-02','2020-03-01','2020-03-01']}
new_br = pd.DataFrame.from_dict(data)
new_br
serial_num status last_check
123456 Good 2020-03-02
678901 Good 2020-03-02
123456 BAD 2020-03-01
678901 BAD 2020-03-01
I want the max value of last_check
, grouped by serial_num, and preserve all columns (My actual dataset has more columns).
My code so far is:
new_br.set_index('last_check').groupby('serial_num').max()
serial_num status
123456 Good
678901 Good
However, this removes the last_check
column. How can I preserve the date column, similar to the LAST_VALUE() function in SQL Redshift?
My expected output is:
serial_num status last_check
123456 Good 2020-03-02
678901 Good 2020-03-02