Good day, I have this data set of customers with their region and Date of Birth (dob). Some customers have more than a region assigned, so I need to filter them out and keep only last region on pandas (using max value of string would be good). I tried to filter using group-by but I only manage to display customer and region, and I need to display the whole row (including dob)
Thanks in advance.
Data:
import numpy as np
import pandas as pd
data = [['A', 'FL', '2000-06-01'], ['B', 'FL', '1999-05-01'], ['C', 'FL', '2000-02-03'], ['C', 'NY', '2000-02-03'], ['D', 'FL', '1990-03-01'], ['E', 'NY', '1999-12-07'], ['E', 'TX', '1999-12-07'], ['F', 'FL', '1995-06-07'], ['F', 'TX', '1995-06-07'],]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['customer', 'region', 'dob'])
customer region dob
0 A FL 2000-06-01
1 B FL 1999-05-01
2 C FL 2000-02-03
3 C NY 2000-02-03
4 D FL 1990-03-01
5 E NY 1999-12-07
6 E TX 1999-12-07
7 F FL 1995-06-07
8 F TX 1995-06-07
My take:
df.groupby(['customer'], sort=False)['region'].max()
Partial Output
customer
A FL
B FL
C NY
D FL
E TX
F TX
Desired output:
customer region dob
0 A FL 2000-06-01
1 B FL 1999-05-01
2 C NY 2000-02-03
3 D FL 1990-03-01
4 E TX 1999-12-07
5 F TX 1995-06-07