Take the maximum in absolute value from different columns and filter out NaN Python

Question

This was my attempt. For example:

df = pd.DataFrame({'a':[5,0,1,np.nan], 'b':[np.nan,1,4,3], 'c':[-3,-2,0,0]})
df.dropna(axis=1).max(axis=1,key=abs)

This filters out the NaN values, but it returns 0 or negative values instead of the value that is highest in absolute value.

The result should be one column with

When you call `dropna(axis=1)` you lose every column that contains a `NaN` value, so only column `c` is left. – Anton Protopopov, Dec 07 '15 at 10:29
OK. In any case, if I use df.max(axis=1,key=abs) it does not take the max in absolute value, just the largest positive value. – gis20, Dec 07 '15 at 10:35
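
To illustrate the two comments above, here is a minimal sketch using the question's `df` (the names `kept`, `row_max_after_dropna` and `row_max_plain` are only illustrative). It assumes the default `skipna=True` of `DataFrame.max`, which already ignores NaN, so `dropna` is not needed for that part; `key` is not a documented parameter of `DataFrame.max`, so it is left out here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [5, 0, 1, np.nan], 'b': [np.nan, 1, 4, 3], 'c': [-3, -2, 0, 0]})

# dropna(axis=1) drops whole columns, so 'a' and 'b' disappear
kept = df.dropna(axis=1)
print(list(kept.columns))

# Row-wise max of what is left is just column 'c'
row_max_after_dropna = kept.max(axis=1)

# Plain max(axis=1) skips NaN by default, but compares signed values
row_max_plain = df.max(axis=1)
print(row_max_plain.tolist())
```

Neither result keeps -2 for the second row, which is why a different approach is needed.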

score 14 · Answer 1 · answered Dec 07 '15 at 11:01

I solved it with:

maxCol=lambda x: max(x.min(), x.max(), key=abs)
df.apply(maxCol,axis=1)
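
For reference, a self-contained version of the same idea run on the question's data. The function name `max_abs_signed` is hypothetical; the logic is the lambda above. It relies on `Series.min()` and `Series.max()` skipping NaN by default.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [5, 0, 1, np.nan], 'b': [np.nan, 1, 4, 3], 'c': [-3, -2, 0, 0]})

def max_abs_signed(row):
    # The largest magnitude in a row is always either its minimum or its maximum,
    # so comparing just those two by absolute value is enough.
    return max(row.min(), row.max(), key=abs)

result = df.apply(max_abs_signed, axis=1)
print(result.tolist())  # [5.0, -2.0, 4.0, 3.0]
```

If the minimum and maximum have the same magnitude (for example -3 and 3), Python's `max` returns the first of the tied arguments, which here is the negative one.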

answered Dec 07 '15 at 11:01

gis20


This solution works, but it is really slow. Is there a faster solution? – Alon Gouldman May 26 '20 at 17:55
How could I alter this so that it takes the absolute minimum value? I tried to replace max with min so min(x.min(),x.max(),key=abs) but that did not work. – Andrew Hamel Aug 05 '20 at 14:24
@AlonGouldman my answer below should be more efficient if you're having performance issues. – Brendan Nov 11 '20 at 20:16
@AndrewHamel replace `max()` with `min()` in my answer below and it should work – Brendan Nov 11 '20 at 20:17
@Brendan I (and also the OP) wanted to keep the negative values. Your way converts them into positive ones. – Alon Gouldman Nov 12 '20 at 12:49
@AlonGouldman Good clarification! In that case I'd recommend something like `df.idxmax()` to get the indices of the maxima, then use those indices to select the original values in the original df. This approach should still outperform any apply operations. – Brendan Nov 12 '20 at 18:27
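
One possible reading of the `idxmax` suggestion in the last comment, as a sketch rather than the commenter's exact code: it assumes `idxmax` is applied to `df.abs()` along `axis=1`, and that no row is entirely NaN (the behaviour of `idxmax` on all-NaN rows differs between pandas versions). The names `cols`, `col_pos` and `result` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [5, 0, 1, np.nan], 'b': [np.nan, 1, 4, 3], 'c': [-3, -2, 0, 0]})

# Column label of the largest magnitude in each row (NaN is skipped)
cols = df.abs().idxmax(axis=1)

# Translate labels to integer positions and pick the original, signed values
col_pos = df.columns.get_indexer(cols)
values = df.to_numpy()[np.arange(len(df)), col_pos]

result = pd.Series(values, index=df.index)
print(result.tolist())  # [5.0, -2.0, 4.0, 3.0]
```

This avoids a row-wise `apply`, so it should scale better on large frames, although that has not been benchmarked here.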

score 8 · Answer 2 · answered Nov 11 '20 at 20:12

The most straightforward and efficient way is to convert to absolute values, and then find the max. Pandas supports this with straightforward syntax (abs and max) and does not require expensive apply operations:

df.abs().max()

max() accepts an axis argument, which can be used to specify whether to calculate the max on rows or columns.
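
A short sketch of the `axis` argument on the question's data (variable names are illustrative). Note that the result contains magnitudes only, so the sign of the original value is not preserved.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [5, 0, 1, np.nan], 'b': [np.nan, 1, 4, 3], 'c': [-3, -2, 0, 0]})

# Default axis=0: one value per column
per_column = df.abs().max()
print(per_column.tolist())  # [5.0, 4.0, 3.0]

# axis=1: one value per row, which is the shape the question asks for
per_row = df.abs().max(axis=1)
print(per_row.tolist())  # [5.0, 2.0, 4.0, 3.0]
```

The second row yields 2.0 where the question expects -2.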

answered Nov 11 '20 at 20:12

Brendan

This wouldn't answer the question asked because it removes the negative values. – DataMan Apr 14 '21 at 16:27

score 4 · Answer 3 · thomas · 2015-12-07T11:12:47.787

You can use np.nanargmax on the squared data:

>>> df.values[range(df.shape[0]),np.nanargmax(df**2,axis=1)]
array([ 5., -2.,  4.,  3.])
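
One caveat: `np.nanargmax` raises a `ValueError` when a row consists only of NaN. A hedged variant that tolerates such rows is sketched below; the extra all-NaN row is added purely for illustration and is not part of the question's data. It replaces NaN in the squared data with -1, which is safe because squares are never negative, and then uses plain `argmax`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [5, 0, 1, np.nan], 'b': [np.nan, 1, 4, 3], 'c': [-3, -2, 0, 0]})

# Question's data plus one hypothetical all-NaN row
arr = np.vstack([df.to_numpy(dtype=float), [np.nan, np.nan, np.nan]])

# NaN becomes -1 so that any real square (>= 0) wins the argmax
squared = np.where(np.isnan(arr), -1.0, arr ** 2)
pos = squared.argmax(axis=1)

# For an all-NaN row, position 0 is selected and the picked value is NaN
result = arr[np.arange(arr.shape[0]), pos]
print(result)
```

Using `np.abs(arr)` instead of `arr ** 2` would select the same positions and avoids overflow for very large values.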

edited Dec 07 '15 at 11:12

answered Dec 07 '15 at 10:32

thomas


score 1 · Answer 4 · Anton Protopopov · 2015-12-07T11:06:52.993

df = df.fillna(0)
l = df.abs().values.argmax(axis=1)
pd.Series([df.values[i][l[i]] for i in range(len(df.values))])

In [532]: pd.Series([df.values[i][l[i]] for i in range(len(df.values))])
Out[532]:
0    5
1   -2
2    4
3    3
dtype: float64

One-liner:

pd.Series([df.values[i][df.fillna(0).abs().values.argmax(axis=1)[i]] for i in range(len(df.values))])

edited Dec 07 '15 at 11:06

answered Dec 07 '15 at 10:57

Anton Protopopov


score -1 · Answer 5 · answered Aug 20 '20 at 10:01

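I do not have enough reputation to comment, so I am adding this here in response to gis20's answer and Andrew Hamel's question about the minimum absolute value. Either of the following

# returns the element closest to zero, keeping its sign
minCol=lambda x: min(x, key=abs)
# returns the smallest magnitude itself (always non-negative)
minCol=lambda x: min([abs(value) for value in x])

works for my data; however, neither can cope with np.nan values.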
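
A possible NaN-tolerant variant, sketched under the assumption that NaN should simply be ignored and that an all-NaN row should give NaN. The name `min_abs_signed` is hypothetical. It also shows why swapping `max` for `min` in `min(x.min(), x.max(), key=abs)` does not work: the value closest to zero need not be the row's minimum or maximum.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [5, 0, 1, np.nan], 'b': [np.nan, 1, 4, 3], 'c': [-3, -2, 0, 0]})

def min_abs_signed(row):
    values = row.dropna()          # ignore NaN entries
    if values.empty:               # all-NaN row: nothing to compare
        return np.nan
    return min(values, key=abs)    # element closest to zero, sign kept

result = df.apply(min_abs_signed, axis=1)
print(result.tolist())  # [-3.0, 0.0, 0.0, 0.0]

# Second row is [0, 1, -2]: comparing only the extremes -2 and 1 misses the 0
extremes_only = min(-2, 1, key=abs)
print(extremes_only)  # 1
```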
