3

Need help in assigning a rank / variable based on the input value and where does that stand column values of percentiles Example:

If input value = Min column value --> Rank 1
input value between Min column value and P25 column value --> Rank 2
input value between P75 column value and Max column value --> Rank 5
input value = Max column value --> Rank 6

Here is sample data:

    date | value | Min  | P25  | P50  | P75  | Max       | output
    ---------------------------------------------------
    1-Sep| 45    | 12.0 | 28.2 | 48.9 | 85.4 | 98.0      | 3
    2-Sep| 63    | 12.0 | 28.2 | 48.9 | 85.4 | 98.0      | 4
    3-Sep| 87    | 12.0 | 28.2 | 48.9 | 85.4 | 98.0      | 5
    4-Sep| 12    | 12.0 | 28.1 | 48.9 | 85.2 | 98.0      | 1
    5-Sep| 89    | 14.2 | 28.8 | 48.9 | 85.8 | 98.0      | 5
    6-Sep| 98    | 14.2 | 28.8 | 48.9 | 85.8 | 98.0      | 6
    7-Sep| 41    | 14.2 | 28.8 | 48.9 | 85.6 | 97.9      | 3
    8-Sep| 22    | 14.2 | 28.8 | 48.9 | 85.6 | 97.9      | 2

Ranking dictionary (configuration) is like this: [Min:1, P25:2, P50:3, p75:4, Max:5, Max:6] (This can be changed if there is a better way to represent)

I have tried using sort values (while using apply function), but unable to figure out Min/Max conditions. This pandas df has 100k+ rows.

Thanks in advance.

Sharif
  • 194
  • 2
  • 12

1 Answers1

3

You can use np.select to do this:

cond1 = df['value'] <= df['Min']
cond2 = df['value'] <= df['P25']
cond3 = df['value'] <= df['P50']
cond4 = df['value'] <= df['P75']
cond5 = df['value'] < df['Max']

df['rank'] = np.select([cond1, cond2, cond3, cond4, cond5], [1,2,3,4,5], 6)

df

Output:

        date  value   Min   P25   P50   P75   Max  output  rank
1      1-Sep   45.0  12.0  28.2  48.9  85.4  98.0     3.0     3
2      2-Sep   63.0  12.0  28.2  48.9  85.4  98.0     4.0     4
3      3-Sep   87.0  12.0  28.2  48.9  85.4  98.0     5.0     5
4      4-Sep   12.0  12.0  28.1  48.9  85.2  98.0     1.0     1
5      5-Sep   89.0  14.2  28.8  48.9  85.8  98.0     5.0     5
6      6-Sep   98.0  14.2  28.8  48.9  85.8  98.0     6.0     6
7      7-Sep   41.0  14.2  28.8  48.9  85.6  97.9     3.0     3
8      8-Sep   22.0  14.2  28.8  48.9  85.6  97.9     2.0     2
Scott Boston
  • 147,308
  • 15
  • 139
  • 187
  • 1
    Thanks Scott Boston. Worked great. I have added condition and choices to a dictionary, used pd.eval along with np.select – Sharif Sep 20 '20 at 12:10
  • @Sharif I would like to see your solution, would you mind posting it. I always learn things from Stack Overflow. – Scott Boston Sep 20 '20 at 13:31
  • Here is the snippet of my code: `code levels = { "df['0.00'] == df[value]": 7, "(df['0.00'] < df[value]) & (df['0.68'] >= df[value])": 5, "(df['0.68'] < df[value]) & (df['0.99'] >= df[value])": 3, "df['1.00'] == df[value]": 1, } condlist = [] choicelist = [] for key, value in levels.items(): condlist.append(pd.eval(key)) choicelist.append(value) df[target_col] = np.select(condlist, choicelist) ` – Sharif Sep 23 '20 at 05:43