I have four columns with values: 'A', 'B', 'C' and 'D' and one column containing either of the four letters.
Letter A B C D max
12063289 D 5 9 0 39 0
12063290 D 3 25 0 79 0
12063291 B 0 232 0 0 0
12063292 A 351 0 0 0 0
12063293 D 0 0 0 394 0
Based on the value of the letter column, I'd like to compute the maximum value of the other three columns.
Letter A B C D max
12063289 D 5 9 0 39 9
12063290 D 3 25 0 79 25
12063291 B 0 232 0 16 16
12063292 A 351 0 200 0 200
12063293 D 0 0 0 394 0
Say the above data is stored in a variable df, I have tried to do the following:
import numpy as np
import pandas as pd
columns = {'A':['B','C','D'],
'B':['A','C','D'],
'C':['A','B','D'],
'D':['A','B','C']}
for letter in ['A', 'B', 'C', 'D']:
mask = df.loc[df['Letter']==letter]
np.max(df[mask][columns[letter]], out=df[mask]['max'])
I basically want the max function to work on only the relevant columns of the masked dataframe and write back to the 'max' column of the original dataframe (df) in exactly the right positions but pandas complains in the last line:
ValueError: Must pass DataFrame with boolean values only
The question is how do I target exactly those cells of those rows to receive the output of the max() function in order to not use unnecessary space (I can do this with the apply function but it takes up enormous amounts of space that I don't have).