Pandas qcut bins applied to new data result in NaN

Question

I am binning a feature for a modelling project and ran into this problem. The example below computes quartile bins from a DataFrame that does not contain 11, which results in NaN when those bins are applied to a new DataFrame that does contain 11. This is expected, since 11 falls outside the last bin, but I wonder whether there is a clever way to handle it easily (there usually is), such as some technique that turns (7.75, 10.0] into (7.75, np.inf).


import pandas as pd
a, bin = pd.qcut(pd.DataFrame({"A": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}).A, retbins=True, q=4)
pd.cut(pd.DataFrame({"A": [1, 2, 11]}).A, bins=bin, include_lowest=True)


0    (0.999, 3.25]
1    (0.999, 3.25]
2              NaN
Name: A, dtype: category
Categories (4, interval[float64]): [(0.999, 3.25] < (3.25, 5.5] < (5.5, 7.75] < (7.75, 10.0]]

score 1 · Accepted Answer · answered May 14 '20 at 11:09

Simply use np.inf in place of the largest value (10) when you create the bins, so that the last bin is open-ended:

import numpy as np
a, bin = pd.qcut(pd.DataFrame({"A": [1, 2, 3, 4, 5, 6, 7, 8, 9, np.inf]}).A, retbins=True, q=4)
pd.cut(pd.DataFrame({"A": [1, 2, 11]}).A, bins=bin, include_lowest=True)

0    (0.999, 3.25]
1    (0.999, 3.25]
2      (7.75, inf]
Name: A, dtype: category
Categories (4, interval[float64]): [(0.999, 3.25] < (3.25, 5.5] < (5.5, 7.75] < (7.75, inf]]
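
If you would rather not modify the training data, a possible alternative is to compute the bins on the original values and then widen only the outer edges before applying them. This is a minimal sketch, not part of the original answer; the names train, new and open_edges are illustrative, and opening the lower edge to -np.inf is an extra assumption so that values below the training minimum are also covered.

```python
import numpy as np
import pandas as pd

# Data used to learn the bin edges (no value above 10).
train = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], name="A")
# New data containing a value outside the training range.
new = pd.Series([1, 2, 11], name="A")

# Learn the quartile edges from the training data only.
_, edges = pd.qcut(train, q=4, retbins=True)

# Copy the edges and open both ends so that any value falls into a bin.
open_edges = np.array(edges, dtype=float)
open_edges[0] = -np.inf
open_edges[-1] = np.inf

# Apply the widened edges to the new data.
binned = pd.cut(new, bins=open_edges)
print(binned)
```

With the outer edges open, 11 lands in the last bin, (7.75, inf], instead of NaN, and the inner edges stay exactly as qcut computed them from the original data.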
