Indicate Both Columns Had Same Value Using SortValues Functions

Question

I create a dataframe called applesdf to show columns named id, item, price and location. My goal is to find the max price per Item and then indicate where the larger pricer was.

For example:

applesdf = {'ID': [1,1],
        'Item': ['Apple','Apple'],
        'Price': [2,1],
        'Location':[1001,1002]
        }

df = pd.DataFrame(applesdf, columns = ['ID','Item','Price','Location'])

df

My dataframe:

ID	Item	Price	Location
1	Apple	2	1001
1	Apple	1	1002

So I run a sort_values function to group the apples together and find the max price.

My code:

applesmax = df.sort_values('Price').drop_duplicates(['ID', 'Item'], keep='last').sort_index().reset_index(drop=True)
applesmax

Result:

ID	Item	Price	Location
1	Apple	2	1001

The problem is when we have the same price.

Same price table:

ID	Item	Price	Location
1	Apple	2	1001
1	Apple	2	1002

The code would return the last record obviously as instructed. I was wondering if anybody had tips/documentation as to how I could instruct the program to return both locations to indicate that neither location had a larger price.

My expected output:

ID	Item	Price	Location
1	Apple	2	1001,1002

DPM · Answer 1 · 2021-08-22T00:41:33.967

1

So I created a small function that I think does what you seek. Bear with as it might not be optimised:

def max_price(dataframe, item):

    df_item = dataframe[dataframe['Item']==item] ## get dataframe that only has the item you want
    max_price = df_item['Price'].max()

    if len(df_item[df_item['Price']==max_price])>1: ## if there are more than one rows with the price equal to the max price
        
        return pd.DataFrame([[item, max_price, 'TIE']],  columns=['Item', 'Price', 'Location']) ## return the tie
    else:
        return df_apple[df_apple['Price']==df_apple['Price'].max()] ## else return where the max price happens

Here is an example:

df = pd.DataFrame(zip(['Apple', 'Apple', 'Banana'], [2,2,4],[1001, 1002, 1003]), 
                  columns = ['Item', 'Price', 'Location'])

max_price(df, 'Apple')

This way you also dont need to sort the original datframe.

edited Aug 22 '21 at 00:41

answered Aug 22 '21 at 00:35

DPM

845
7
33

This is definitely useful as a function but when looking at a large dataset with possibly thousands of items I don't believe it's optimal like you said. For example we may have many items (apples, bananas, oranges etc) in the dataset that have the same value but return the last location instead of a tie. I still appreciate the function and its usefulness however :) – i.d.s.chicago Aug 22 '21 at 02:01
Hello I updated my question to include both locations rather than some string value. – i.d.s.chicago Aug 22 '21 at 15:21

Indicate Both Columns Had Same Value Using SortValues Functions

1 Answers1

Linked