I create a dataframe called applesdf to show columns named id, item, price and location. My goal is to find the max price per Item and then indicate where the larger pricer was.
For example:
applesdf = {'ID': [1,1],
'Item': ['Apple','Apple'],
'Price': [2,1],
'Location':[1001,1002]
}
df = pd.DataFrame(applesdf, columns = ['ID','Item','Price','Location'])
df
My dataframe:
ID | Item | Price | Location |
---|---|---|---|
1 | Apple | 2 | 1001 |
1 | Apple | 1 | 1002 |
So I run a sort_values function to group the apples together and find the max price.
My code:
applesmax = df.sort_values('Price').drop_duplicates(['ID', 'Item'], keep='last').sort_index().reset_index(drop=True)
applesmax
Result:
ID | Item | Price | Location |
---|---|---|---|
1 | Apple | 2 | 1001 |
The problem is when we have the same price.
Same price table:
ID | Item | Price | Location |
---|---|---|---|
1 | Apple | 2 | 1001 |
1 | Apple | 2 | 1002 |
The code would return the last record obviously as instructed. I was wondering if anybody had tips/documentation as to how I could instruct the program to return both locations to indicate that neither location had a larger price.
My expected output:
ID | Item | Price | Location |
---|---|---|---|
1 | Apple | 2 | 1001,1002 |