I have a pandas dataframe below:

    df

        name    value
    0   Jack        3
    1   Luke        4
    2   Mark        2
    3   Chris       1
    4   Ace        10
    5   Isaac       8

Based on the "value" column, I want to mark rows in the top 50% of values as 1 and rows in the bottom 50% as 0.

I expect to get the result below:

    Results

        name    value    percent mark
    0   Jack        3               0
    1   Luke        4               1
    2   Mark        2               0
    3   Chris       1               0
    4   Ace        10               1
    5   Isaac       8               1

Thanks in advance.

jpp
SwagZ

2 Answers

You can compare a series with its median and then convert bool to int:

    df['percent_mark'] = (df['value'] > df['value'].median()).astype(int)

For a cutoff other than the median, use pd.Series.quantile. For example, to mark values above the 25th percentile (roughly the top 75%) as 1:

    df['percent_mark'] = (df['value'] > df['value'].quantile(0.25)).astype(int)
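
As a rough end-to-end check, the sketch below rebuilds the question's frame and applies both cutoffs. It assumes Luke's value is 4, as shown in the expected-result table, and the helper name `mark_above_quantile` is hypothetical, introduced here only for illustration.

```python
import pandas as pd

# Sample data from the question; Luke's value is assumed to be 4,
# matching the expected-result table.
df = pd.DataFrame({
    "name": ["Jack", "Luke", "Mark", "Chris", "Ace", "Isaac"],
    "value": [3, 4, 2, 1, 10, 8],
})


def mark_above_quantile(values: pd.Series, q: float = 0.5) -> pd.Series:
    """Hypothetical helper: 1 where a value is strictly above the q-th quantile, else 0."""
    return (values > values.quantile(q)).astype(int)


# q=0.5 is the median split asked for in the question.
df["percent_mark"] = mark_above_quantile(df["value"])
# q=0.25 marks everything above the 25th percentile.
df["percent_mark_25"] = mark_above_quantile(df["value"], 0.25)

print(df)
```

Because the comparison is strict, a value exactly equal to the threshold is marked 0. If ties at the cutoff should count as 1, `>=` would be the comparison to use instead.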
jpp

You can also use NumPy, which may be faster:

    import numpy as np
    df['percent_mark_50'] = np.where(df.value > df.value.median(), 1, 0)

Or, to mark values above the 25th percentile:

    df['percent_mark_25'] = np.where(df.value > np.percentile(df.value, 25), 1, 0)
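
A small sketch, using the question's values (with Luke assumed to be 4, as in the expected result), suggests that the NumPy and pandas variants agree: `np.where` yields the same marks as the boolean-to-int conversion, and `np.percentile` with its default settings gives the same threshold as `Series.quantile`. It does not measure speed, which likely depends on the size of the data.

```python
import numpy as np
import pandas as pd

# Values from the question; Luke's value is assumed to be 4.
values = pd.Series([3, 4, 2, 1, 10, 8], name="value")

# Median split, computed two ways.
via_pandas = (values > values.median()).astype(int)
via_numpy = np.where(values > values.median(), 1, 0)

# 25th-percentile threshold, computed two ways.
thr_pandas = values.quantile(0.25)
thr_numpy = np.percentile(values, 25)

same_marks = bool((via_pandas.to_numpy() == via_numpy).all())
same_threshold = bool(np.isclose(thr_pandas, thr_numpy))
print(same_marks, same_threshold)
```

Note that `np.where` returns a plain NumPy array rather than a Series. Assigning it to a DataFrame column works positionally, so the row order must match the frame.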
J. Doe