12

This was my attempt. For example:

df = pd.DataFrame({'a':[5,0,1,np.nan], 'b':[np.nan,1,4,3], 'c':[-3,-2,0,0]})
df.dropna(axis=1).max(axis=1,key=abs)

This filters out the NaN values, but it returns 0 or negative values instead of the value with the highest absolute value.

The result should be a single column containing:

5
-2
4
3
Cœur
gis20

5 Answers

14

I solved it with:

maxCol=lambda x: max(x.min(), x.max(), key=abs)
df.apply(maxCol,axis=1)
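The lambda works because the entry with the largest magnitude in a row is always either the row minimum or the row maximum. As a rough sketch of the same idea without `apply` (the names `row_min`, `row_max` and `result` are illustrative, not from the original answer), the two candidates can be computed column-wise and combined with `Series.where`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [5, 0, 1, np.nan], 'b': [np.nan, 1, 4, 3], 'c': [-3, -2, 0, 0]})

# Row-wise extremes; pandas skips NaN by default
row_min = df.min(axis=1)
row_max = df.max(axis=1)

# Keep row_max where its magnitude is at least that of row_min, otherwise take row_min.
# On ties (e.g. -2 and 2) this keeps the positive value, unlike the lambda above.
result = row_max.where(row_max.abs() >= row_min.abs(), row_min)
print(result)
```

For the sample frame this should give 5, -2, 4, 3 while keeping the original signs.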
gis20
  • this solution works, but really slow... is there a faster solution? – Alon Gouldman May 26 '20 at 17:55
  • How could I alter this so that it takes the absolute minimum value? I tried to replace max with min so min(x.min(),x.max(),key=abs) but that did not work. – Andrew Hamel Aug 05 '20 at 14:24
  • @AlonGouldman my answer below should be more efficient if you're having performance issues. – Brendan Nov 11 '20 at 20:16
  • @AndrewHamel replace `max()` with `min()` in my answer below and it should work – Brendan Nov 11 '20 at 20:17
  • @Brendan I wanted (and also the OP) to keep the negative values. your way converts them into positive – Alon Gouldman Nov 12 '20 at 12:49
  • @AlonGouldman Good clarification! In that case I'd recommend something like `df.idxmax()` to get the indices of the maxima, then use those indices to select the original values in the original df. This approach should still outperform any apply operations. – Brendan Nov 12 '20 at 18:27
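One possible reading of the `idxmax` suggestion in the last comment, as a minimal sketch: take `idxmax` on the absolute values to find the winning column per row, then look up the signed value in the original frame. The names `cols`, `col_pos` and `result` are illustrative, and the sketch assumes no row consists entirely of NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [5, 0, 1, np.nan], 'b': [np.nan, 1, 4, 3], 'c': [-3, -2, 0, 0]})

# Column label of the largest absolute value in each row (NaN is skipped)
cols = df.abs().idxmax(axis=1)

# Translate labels to integer positions and pick the signed values from the original data
col_pos = df.columns.get_indexer(cols)
result = pd.Series(df.to_numpy()[np.arange(len(df)), col_pos], index=df.index)
print(result)
```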
8

The most straightforward and efficient way is to convert to absolute values, and then find the max. Pandas supports this with straightforward syntax (abs and max) and does not require expensive apply operations:

df.abs().max()

max() accepts an axis argument, which can be used to specify whether to calculate the max on rows or columns.
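As a small illustration of the `axis` argument, using the sample frame from the question (the variable names are illustrative): note that both results are magnitudes, so the sign of the original values is not preserved.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [5, 0, 1, np.nan], 'b': [np.nan, 1, 4, 3], 'c': [-3, -2, 0, 0]})

abs_df = df.abs()
col_max = abs_df.max()        # default axis=0: one value per column
row_max = abs_df.max(axis=1)  # axis=1: one value per row
print(col_max)
print(row_max)
```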

Brendan
4

You can use np.nanargmax on the squared data:

>>> df.values[range(df.shape[0]),np.nanargmax(df**2,axis=1)]
array([ 5., -2.,  4.,  3.])
thomas
1
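# Replace NaN with 0 so it cannot win the absolute-value comparison
df = df.fillna(0)
# Column position of the largest absolute value in each row
l = df.abs().values.argmax(axis=1)
# Pick the original (signed) value at that position for every row
pd.Series([df.values[i][l[i]] for i in range(len(df.values))])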

In [532]: pd.Series([df.values[i][l[i]] for i in range(len(df.values))])
Out[532]:
0    5
1   -2
2    4
3    3
dtype: float64

One liner:

pd.Series([df.values[i][df.fillna(0).abs().values.argmax(axis=1)[i]] for i in range(len(df.values))])
Anton Protopopov
-1

Since my reputation is too low to comment, I am adding this here in response to gis20's answer and Andrew Hamel's question about the absolute minimum value:

minCol=lambda x: min(x, key=abs)  # keeps the sign of the value
minCol=lambda x: min([abs(value) for value in x])  # returns the absolute value instead

This works for my data; however, it cannot cope with np.nan values.
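Simply swapping `max` for `min` in `max(x.min(), x.max(), key=abs)` does not work, because the value closest to zero can lie anywhere between the row minimum and maximum. One way to handle NaN, sketched here under the assumption that every row has at least one non-NaN entry (an all-NaN row would raise `ValueError`), is to drop the missing values from each row before taking the minimum. The name `min_abs` is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [5, 0, 1, np.nan], 'b': [np.nan, 1, 4, 3], 'c': [-3, -2, 0, 0]})

# Drop NaN from each row, then pick the entry closest to zero (sign is kept)
min_abs = df.apply(lambda row: min(row.dropna(), key=abs), axis=1)
print(min_abs)
```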

ConZZito