1

Consider the following sample data:

import numpy as np
import pandas as pd

np.random.seed(123)
n = 200

df = pd.DataFrame({'v1': np.random.randint(0, 100, n),
                   'v2': np.random.randint(10, 90, n),
                   'v3': np.random.randint(10, 90, n),
                   'another_v': np.random.randint(10, 45, n)})

In this data, I would like to add a new column minval with the smallest value in ['v1', 'v2', 'v3'] and another one mincol with the column indicator of the column containing this minimal value.

I do this using the following code, which works fine.

df['minval'] = df[['v1', 'v2', 'v3']].min(axis='columns') # minimum value
df['mincol'] = df[['v1', 'v2', 'v3']].idxmin(axis = 'columns').str[1:]

However, how can I get the same for the second-smallest value?

S. below for expected output of a sample

    v1  v2  v3  another_v   2minval     2mincol
0   66  55  30  37          55          2
1   92  68  22  44          68          2
2   98  69  36  38          69          2
3   17  33  71  15          33          2
4   83  35  49  23          49          3
5   57  37  62  38          57          1
6   86  67  89  37          86          1
7   97  13  49  19          49          3
8   96  88  18  13          88          2
9   47  69  36  33          47          1

(Similar questions suggesting nlargest etc. only provide solutions for within-column comparisons.)

Ivo
  • 3,890
  • 5
  • 22
  • 53
  • it would be helpful if you shared a minimal dataframe with expected output. You could even add `np.random.seed(2)` to ensure the randomisation is constant – sammywemmy Dec 11 '20 at 11:31

2 Answers2

1

You can use for improve performance not loop apply solution in numpy:

np.random.seed(200)
n = 200

df = pd.DataFrame({'v1': np.random.randint(0, 100, n),
                   'v2': np.random.randint(10, 90, n),
                   'v3': np.random.randint(10, 90, n),
                   'another_v': np.random.randint(10, 45, n)})

df['minval'] = df[['v1', 'v2', 'v3']].min(axis='columns') # minimum value
df['mincol'] = df[['v1', 'v2', 'v3']].idxmin(axis = 'columns').str[1:]
print (df[:10])
   v1  v2  v3  another_v  minval mincol
0  26  31  65         35      26      1
1  16  52  54         12      16      1
2  68  14  22         10      14      2
3  42  73  67         26      42      1
4  55  87  21         36      21      3
5  76  40  16         26      16      3
6  79  37  31         19      31      3
7  89  46  78         13      46      2
8  14  64  29         17      14      1
9  91  85  73         31      73      3

Use numpy.argsort for positions of sorted values and then reorder columns names by this positions to a array with select last previous 'column':

v = ['v1', 'v2', 'v3']

arr = np.argsort(-df[v].to_numpy())
a = np.array(v)[arr]
print (a[:10])
[['v3' 'v2' 'v1']
 ['v3' 'v2' 'v1']
 ['v1' 'v3' 'v2']
 ['v2' 'v3' 'v1']
 ['v2' 'v1' 'v3']
 ['v1' 'v2' 'v3']
 ['v1' 'v2' 'v3']
 ['v1' 'v3' 'v2']
 ['v2' 'v3' 'v1']
 ['v1' 'v2' 'v3']]

Also you can reorder values - so last previous 'column' are second min values:

b = df[v].to_numpy()[np.arange(len(arr))[:,None], arr]
print (b[:10])
[[65 31 26]
 [54 52 16]
 [68 22 14]
 [73 67 42]
 [87 55 21]
 [76 40 16]
 [79 37 31]
 [89 78 46]
 [64 29 14]
 [91 85 73]]

Series is used for possible pass str[1:]

df['second_mincol'] = pd.Series(a[:, -2], index=df.index).str[1:]
df['second_minval'] = b[:, -2]

print (df)
     v1  v2  v3  another_v  minval mincol second_mincol  second_minval
0    26  31  65         35      26      1             2             31
1    16  52  54         12      16      1             2             52
2    68  14  22         10      14      2             3             22
3    42  73  67         26      42      1             3             67
4    55  87  21         36      21      3             1             55
..   ..  ..  ..        ...     ...    ...           ...            ...
195  63  81  39         12      39      3             1             63
196  85  38  31         39      31      3             2             38
197  63  17  21         40      17      2             3             21
198  24  20  31         10      20      2             1             24
199  95  55  16         26      16      3             2             55

[200 rows x 8 columns]
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
1

You could use the apply method (with axis=1 to look across the row), sort the values in descending order then return the second item (indexed to 1):

df['second_minval'] = df[["v1", "v2", "v3"]].apply(
    lambda x: x.sort_values(ascending=False)[1], axis=1
)
Phil Sheard
  • 2,102
  • 1
  • 17
  • 38