2

I create a dataframe called applesdf to show columns named id, item, price and location. My goal is to find the max price per Item and then indicate where the larger pricer was.

For example:

applesdf = {'ID': [1,1],
        'Item': ['Apple','Apple'],
        'Price': [2,1],
        'Location':[1001,1002]
        }

df = pd.DataFrame(applesdf, columns = ['ID','Item','Price','Location'])

df

My dataframe:

ID Item Price Location
1 Apple 2 1001
1 Apple 1 1002

So I run a sort_values function to group the apples together and find the max price.

My code:

applesmax = df.sort_values('Price').drop_duplicates(['ID', 'Item'], keep='last').sort_index().reset_index(drop=True)
applesmax

Result:

ID Item Price Location
1 Apple 2 1001

The problem is when we have the same price.

Same price table:

ID Item Price Location
1 Apple 2 1001
1 Apple 2 1002

The code would return the last record obviously as instructed. I was wondering if anybody had tips/documentation as to how I could instruct the program to return both locations to indicate that neither location had a larger price.

My expected output:

ID Item Price Location
1 Apple 2 1001,1002

1 Answers1

1

So I created a small function that I think does what you seek. Bear with as it might not be optimised:

def max_price(dataframe, item):

    df_item = dataframe[dataframe['Item']==item] ## get dataframe that only has the item you want
    max_price = df_item['Price'].max()

    if len(df_item[df_item['Price']==max_price])>1: ## if there are more than one rows with the price equal to the max price
        
        return pd.DataFrame([[item, max_price, 'TIE']],  columns=['Item', 'Price', 'Location']) ## return the tie
    else:
        return df_apple[df_apple['Price']==df_apple['Price'].max()] ## else return where the max price happens

Here is an example:

df = pd.DataFrame(zip(['Apple', 'Apple', 'Banana'], [2,2,4],[1001, 1002, 1003]), 
                  columns = ['Item', 'Price', 'Location'])

max_price(df, 'Apple')

enter image description here

This way you also dont need to sort the original datframe.

DPM
  • 845
  • 7
  • 33
  • This is definitely useful as a function but when looking at a large dataset with possibly thousands of items I don't believe it's optimal like you said. For example we may have many items (apples, bananas, oranges etc) in the dataset that have the same value but return the last location instead of a tie. I still appreciate the function and its usefulness however :) – i.d.s.chicago Aug 22 '21 at 02:01
  • Hello I updated my question to include both locations rather than some string value. – i.d.s.chicago Aug 22 '21 at 15:21