Using groupby()
in this case returns a data frame, not a column to be added to an existing data frame. That's why you were getting missing values in Max Date of Review
.
It also seems you don't really need both Date of Review
and Max Date of Review
in your final output.
Additionally, 'max'
is sufficient as an agg()
function, you don't need np.max
, nor do you need to wrap it in a list.
You basically already solved your own problem, minus a few syntax issues. It's easy enough to set straight.
First, here's some sample data:
dates = pd.date_range('20170101', periods=6, freq='D')
uid = ['FRANK1','JOHN2','FRANK1','JOHN2','FRANK1','FRANK1']
name = ['Frank','John','Frank','JohnABC','Frank','Frank123']
reviewer_map = pd.DataFrame({'UserID_NB':uid,
'Technical Director':name,
'Date of Review':dates})
print(reviewer_map)
Date of Review Technical Director UserID_NB
0 2017-01-01 Frank FRANK1
1 2017-01-02 John JOHN2
2 2017-01-03 Frank FRANK1
3 2017-01-04 JohnABC JOHN2
4 2017-01-05 Frank FRANK1
5 2017-01-06 Frank123 FRANK1
If you want to use groupby()
to accomplish your goal, this will work:
print(reviewer_map.groupby('UserID_NB', as_index=False)
.agg({'Date of Review': 'max'})
.rename(columns={'Date of Review':'Max Date of Review'}))
UserID_NB Max Date of Review
0 FRANK1 2017-01-06
1 JOHN2 2017-01-04
Note that the rename()
at the end isn't strictly necessary, it's only for if you actually want to rename the Date of Review
column as Max Date of Review
.
UPDATE
Per OP comments, here's a version that includes the Technical Director
field that matches the max
Date of Review
. Note that the SQL having
syntax can sometimes be mimicked with filter()
, but it's not always a straightforwardly analogous operation and often requires multiple groupby
operations.
Here I used merge()
instead, which matches the Technical Director
values from the original data frame with the max date rows of the groupby
output. Example data now includes different Technical Director
values per UserID_NB
, in order to illustrate this update.
df = (reviewer_map.groupby('UserID_NB', as_index=False)
.agg({'Date of Review': 'max'}))
df.merge(reviewer_map,
on=['UserID_NB','Date of Review'],
how='left')
Output:
UserID_NB Date of Review Technical Director
0 FRANK1 2017-01-06 Frank123
1 JOHN2 2017-01-04 JohnABC