I have a dataframe with all numeric columns:

import pandas as pd
import numpy as np
np.random.seed(1001)
df = pd.DataFrame(np.random.randn(10, 2), columns=['A', 'B'])

I want to compute common quantiles over all values of A and B pooled together. Both columns contain some missing values. Once the common quantiles are computed, I want to replace each value in the dataframe with a label for the quantile it falls into. I can do this column by column, but how can I do it across the whole dataframe at once?

Arc

1 Answer

I think you can first stack the DataFrame into a single Series, apply qcut, and then unstack:

import pandas as pd
import numpy as np

np.random.seed(1001)
df = pd.DataFrame(np.random.randn(10, 2), columns=['A', 'B'])
# Insert a few missing values (.ix was removed from pandas; use .loc)
df.loc[0, 'A'] = np.nan
df.loc[2, 'A'] = np.nan
df.loc[3, 'B'] = np.nan
print(df)
          A         B
0       NaN -0.896065
1 -0.306299 -1.339934
2       NaN -0.641727
3  1.307946       NaN
4  0.829115 -0.023299
5 -0.208564 -0.916620
6 -1.074743 -0.086143
7  1.175839 -1.635092
8  1.228194  1.076386
9  0.394773 -0.387701

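# Quantiles passed to qcut must lie in [0, 1]; [0, 0.5, 1] splits the
# pooled values of A and B at their common median.
# The exact left edge of the first interval label can vary by pandas version.
bins = np.linspace(0, 1, 3)
print(pd.qcut(df.stack(), bins).unstack())
                  A                 B
0               NaN  (-1.636, -0.209]
1  (-1.636, -0.209]  (-1.636, -0.209]
2               NaN  (-1.636, -0.209]
3   (-0.209, 1.308]               NaN
4   (-0.209, 1.308]   (-0.209, 1.308]
5  (-1.636, -0.209]  (-1.636, -0.209]
6  (-1.636, -0.209]   (-0.209, 1.308]
7   (-0.209, 1.308]  (-1.636, -0.209]
8   (-0.209, 1.308]   (-0.209, 1.308]
9   (-0.209, 1.308]  (-1.636, -0.209]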
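
If integer codes are more convenient than interval labels, the same stack / qcut / unstack idea can be sketched as below. This is only a minimal sketch: it relies on `labels=False`, which makes `pd.qcut` return bin numbers, and the helper name `encode_common_quantiles`, the small example frame, and the choice of two quantiles are illustrative assumptions rather than part of the original answer.

```python
import numpy as np
import pandas as pd


def encode_common_quantiles(frame, q=4):
    """Label every cell with the index of the pooled quantile bin it falls in.

    Hypothetical helper for illustration; not a pandas function.
    """
    # Pool all columns into one long Series. NaN is dropped explicitly so the
    # result does not depend on how a given pandas version's stack() treats it.
    stacked = frame.stack().dropna()
    # labels=False returns integer bin numbers (0 .. q-1) instead of intervals.
    codes = pd.qcut(stacked, q, labels=False)
    # Reshape back to the original layout; cells that were NaN stay NaN.
    return codes.unstack().reindex(index=frame.index, columns=frame.columns)


# Made-up data with one missing value per column.
df = pd.DataFrame({"A": [1.0, np.nan, 3.0, 4.0], "B": [5.0, 6.0, np.nan, 8.0]})
encoded = encode_common_quantiles(df, q=2)
print(encoded)
```

Because the bins are computed on the pooled values, a code of 0 means "at or below the common median" in both columns, which is the property the question asks for.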
jezrael
  • My data points are a little different and I am getting an error: "Bin edges must be unique: array([ 0. , 0. , 0.66666667, 1.6 , 112. ])". Anyway, I just needed to change the percentile values to get different bin edges. – Arc Jun 22 '16 at 07:10
  • I think one possible solution is [link](http://stackoverflow.com/a/36883735/2901002). – jezrael Jun 22 '16 at 07:18
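
The "Bin edges must be unique" error quoted in the comments arises when heavily repeated values make two quantile edges coincide. One possible workaround, sketched here under assumptions, is the `duplicates='drop'` argument of `pd.qcut`, which merges the repeated edges, so fewer bins than requested may come back. The data below is made up for illustration and is not the commenter's data.

```python
import numpy as np
import pandas as pd

# Made-up data with many zeros, so several quantile edges coincide at 0.
df = pd.DataFrame({"A": [0, 0, 0, 1, 2], "B": [0, 0, np.nan, 3, 112]})
stacked = df.stack().dropna()

# Without duplicates="drop" this call raises ValueError because the
# computed bin edges are not unique.
codes = pd.qcut(stacked, 4, labels=False, duplicates="drop")
print(codes.unstack())
```

Changing the requested percentiles, as the first comment describes, is the other option; `duplicates='drop'` simply avoids having to pick them by hand.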