
I have a case similar to my other question Pandas: map column using a dictionary on multiple columns, but now I want to use the max() value of the column "category" not directly, but indirectly, to fill the None values in the fourth column "category_name". It is the same case as in Question 1, but with an additional column of strings.

import pandas as pd

f = {'company': ['Company1', 'Company1', 'Company1', 'Company1', 'Company2', 'Company2'],
     'product': ['Product A', 'Product A', 'Product F', 'Product A', 'Product F', 'Product F'],
     'category': ['1', 1, '3', '2', 3, '5'],
     'category_name': ['a', None, 'b', 'c', None, 'd']
     }

df = pd.DataFrame(f)

Here the column "category" is always filled and the column "category_name" has some missing values:

    company    product  category category_name
0  Company1  Product A         1             a
1  Company1  Product A         1          None
2  Company1  Product F         3             b
3  Company1  Product A         2             c
4  Company2  Product F         3          None
5  Company2  Product F         5             d

Again I would like to fill the None/NaN values, and again the logic I would like to use is: take the value of column "category_name" from the row with the max value in column "category", per combination of the first two columns.

The desired result would be:

    company    product  category category_name
0  Company1  Product A         1             a
1  Company1  Product A         1         **c**
2  Company1  Product F         3             b
3  Company1  Product A         2             c
4  Company2  Product F         3         **d**
5  Company2  Product F         5             d

-> For the combination "Company1" + "Product A", max(category) = 2, therefore the "c" of that row is used for the missing value in row 1 of column "category_name".

I would also highly appreciate help on this. Thank you very much

Eric

2 Answers


Use a custom function with Series.idxmax to get the category_name belonging to the maximal category per group:

# 'category' mixes strings and integers, so cast to int for a numeric max
df['category'] = df['category'].astype(int)

def f(x):
    # index by category_name so idxmax returns the name that belongs
    # to the row with the maximal category in this group
    s = x.set_index('category_name')['category'].idxmax()
    x['category_name'] = x['category_name'].fillna(s)
    return x

df = df.groupby(['company','product']).apply(f)
print (df)
    company    product  category category_name
0  Company1  Product A         1             a
1  Company1  Product A         1             c
2  Company1  Product F         3             b
3  Company1  Product A         2             c
4  Company2  Product F         3             d
5  Company2  Product F         5             d
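
For larger frames, the same idea can be sketched without groupby.apply, assuming "category" has already been cast to int as above: sort by "category" so the maximal category comes last per group, then broadcast the last non-null "category_name" back with GroupBy.transform('last'):

# sort so the maximal category is the last row of each group
tmp = df.sort_values('category')
# 'last' returns the last non-null category_name per group, i.e. the name
# of a maximal-category row that actually has a name
filler = tmp.groupby(['company', 'product'])['category_name'].transform('last')
df['category_name'] = df['category_name'].fillna(filler)

One small difference to idxmax: if the maximal-category row itself has a missing name, 'last' falls back to the next non-null one.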
jezrael
def fnx(x):
    # maximal category of this group
    m = x["category"].max()
    # category_name values of the row(s) holding that maximum
    val = x[x["category"] == m]["category_name"].values
    # fill the group's missing names with the first of them
    x["category_name"].fillna(val[0], inplace=True)
    return x

df = df.groupby(['company', 'product']).apply(fnx)
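
Note that this sketch makes the same assumption as the first answer: "category" mixes strings and integers in the sample data, so it presumably needs df['category'] = df['category'].astype(int) first, otherwise max() raises a TypeError on Python 3. It is also meant for groupby.apply, which accepts a returned group, not for transform (the traceback in the comments below points at transform).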
Mohd Kashif
  • Thank you for your suggested solution. Unfortunately it gives me an error: "IndentationError: unexpected indent" – Eric Aug 13 '20 at 06:55
  • I've removed the spaces, but unfortunately it gives me another error: "ValueError: Length of passed values is 1, index implies 3." -> Traceback: File "", line 1, in File "...\venv\lib\site-packages\pandas\core\groupby\generic.py", line 489, in transform return self._transform_general( File "...\venv\lib\site-packages\pandas\core\groupby\generic.py", line 539, in _transform_general ser = klass(res, indexer) File "\venv\lib\site-packages\pandas\core\series.py", line 313, in __init__ raise ValueError( ValueError: Length of passed values is 1, index implies 3. – Eric Aug 13 '20 at 07:07
  • Hello! While this code may solve the question, [including an explanation](https://meta.stackexchange.com/q/114762) of how and why this solves the problem would really help to improve the quality of your post, and probably result in more up-votes. Remember that you are answering the question for readers in the future, not just the person asking now. Please [edit] your answer to add explanations and give an indication of what limitations and assumptions apply. – Brian61354270 Aug 13 '20 at 17:25