How to use KBinsDiscretizer to make continuous data into bins in Sklearn?

Question

I am working on an ML problem in which I want to convert the continuous target values into a small number of bins, both to understand the problem better and to make better predictions. My original problem is a regression, but I am converting it into a classification problem by binning the target and using the bin labels.

I did the following:

from sklearn.preprocessing import KBinsDiscretizer  
est = KBinsDiscretizer(n_bins=3, encode='ordinal', strategy='uniform')
s = est.fit(target) 
Xt = est.transform(s)

It raises the ValueError shown below. I then reshaped my data to 2D, but I still could not solve it.

ValueError: Expected 2D array, got 1D array instead:

from sklearn.preprocessing import KBinsDiscretizer

myData = pd.read_csv("train.csv", delimiter=",")
target = myData.iloc[:,-5]  # this is a continuous data which must be 
                        # converted into bins with a new column.

xx = target.values.reshape(21263,1)

est = KBinsDiscretizer(n_bins=3, encode='ordinal', strategy='uniform')
s = est.fit(xx) 
Xt = est.transform(s)

You can see my target has 21263 rows. I have to divide these into 10 equal bins and write the result into a new column in my dataframe. Thanks for the guidance.

P.S.: Max target value:185.0
Min target value:0.00021

It is tough to replicate your example without knowing what data is inside "train.csv". You don't have to give the exact data, but it would be helpful to provide sample data for `myData` or `target`. edit: I see that the target value range has been provided, but the above is still a good general guideline to follow when posting questions. — rmutalik, Dec 15 '22 at 17:52

score 9 · Accepted Answer · answered Dec 28 '18 at 20:49

Okay, I was able to solve it. I am posting the answer in case anyone else needs it in the future. I used pandas.qcut:

target['Temp_class'] = pd.qcut(target['Temeratue'], 10, labels=False)

This has solved my problem.
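One thing worth checking when using this approach: pd.qcut builds equal-frequency (quantile) bins, while pd.cut builds equal-width bins, so which one counts as "10 equal bins" depends on what is meant by equal. The sketch below compares the two on synthetic data; the column names and the exponential distribution are assumptions made for illustration, not the asker's data.

```python
import numpy as np
import pandas as pd

# Synthetic, skewed stand-in for the continuous target (an assumption,
# not the data from train.csv).
rng = np.random.default_rng(0)
df = pd.DataFrame({"target": rng.exponential(scale=30.0, size=1000)})

# qcut: equal-frequency bins, each holds roughly the same number of rows.
df["quantile_bin"] = pd.qcut(df["target"], 10, labels=False)

# cut: equal-width bins, each spans the same range of target values.
df["width_bin"] = pd.cut(df["target"], 10, labels=False)

# Rows per bin for each method.
quantile_counts = df["quantile_bin"].value_counts().sort_index()
width_counts = df["width_bin"].value_counts().sort_index()
print(quantile_counts.tolist())
print(width_counts.tolist())
```

On skewed data the equal-width bins tend to be very unevenly populated, which matters if the bins are later used as class labels.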

Mass17

You're a genius! Thanks a lot. – 许传华 Jun 07 '21 at 02:59
How do you get back the original column from the binned column? – Aditya Vartak Jan 21 '22 at 10:34
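On the question of getting the original column back: binning discards information, so the original values cannot be recovered exactly from the bin index alone. As a rough sketch, assuming KBinsDiscretizer is used instead of qcut, its inverse_transform maps each bin index to the center of its bin, which gives an approximation only. The toy array below is made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Toy data, shape (n_samples, 1); made up for illustration.
X = np.array([[0.0], [1.0], [4.0], [9.0], [10.0]])

est = KBinsDiscretizer(n_bins=2, encode="ordinal", strategy="uniform")
Xt = est.fit_transform(X)

# Bin edges learned during fit; bin_edges_ holds one array per feature.
edges = est.bin_edges_[0]

# inverse_transform returns the center of each sample's bin, so the
# result approximates the original values but does not restore them.
approx = est.inverse_transform(Xt)
print(edges)
print(approx.ravel())
```

If the exact values are needed later, the simplest option is to keep the original column alongside the binned one.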

score 7 · Answer 2 · answered Dec 29 '18 at 10:43

The mistake in your first attempt is that you pass the output of fit() into transform(). fit() returns the fitted estimator, not the input data. Either of the following is correct:

from sklearn.preprocessing import KBinsDiscretizer
est = KBinsDiscretizer(n_bins=3, encode='ordinal', strategy='uniform')
# target must be 2D here, e.g. target.values.reshape(-1, 1)
Xt = est.fit_transform(target)

or

from sklearn.preprocessing import KBinsDiscretizer
est = KBinsDiscretizer(n_bins=3, encode='ordinal', strategy='uniform')
# target must be 2D here, e.g. target.values.reshape(-1, 1)
est.fit(target)
Xt = est.transform(target)
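Putting this together with the reshape from the question, an end-to-end version might look like the sketch below. It uses synthetic data in place of train.csv, and the names my_data, target and target_bin are placeholders chosen for illustration; n_bins=10 follows the "10 equal bins" requirement in the question.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import KBinsDiscretizer

# Synthetic stand-in for the target column, drawn from the value range
# given in the question (an assumption, not the real data).
rng = np.random.default_rng(0)
my_data = pd.DataFrame({"target": rng.uniform(0.00021, 185.0, size=500)})

# reshape(-1, 1) turns the 1D column into the (n_samples, 1) array
# that scikit-learn transformers expect.
xx = my_data["target"].to_numpy().reshape(-1, 1)

est = KBinsDiscretizer(n_bins=10, encode="ordinal", strategy="uniform")
binned = est.fit_transform(xx)  # pass the data, not the return value of fit()

# Flatten back to 1D and store the bin index as a new integer column.
my_data["target_bin"] = binned.ravel().astype(int)
print(my_data.head())
```

Using reshape(-1, 1) rather than a hard-coded row count such as 21263 keeps the code working if the number of rows changes.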

score 6 · Answer 3 · answered Jan 29 '19 at 07:44

I was having a similar problem while working with the Titanic dataset. I found that one of my functions had converted my column to float, and changing it back to integer seemed to fix the problem. Also, selecting the column with double square brackets, which returns a DataFrame rather than a Series, worked for me:

from sklearn.preprocessing import KBinsDiscretizer
est = KBinsDiscretizer(n_bins=5, encode='onehot-dense', strategy='uniform')
new = est.fit_transform(dataset[['column_name']])

score 0 · Answer 4 · answered Dec 11 '22 at 20:43

What worked for me was realizing that fit_transform needs a 2D input such as a DataFrame, not a Series. The error means that a Series is a 1D object, while fit_transform needs a 2D object with one column per feature. So, for example, you can do:

import pandas as pd
from sklearn.preprocessing import KBinsDiscretizer
est = KBinsDiscretizer(n_bins=3, encode='ordinal', strategy='uniform')
Xt = est.fit_transform(pd.DataFrame(target))  # wrap the 1D Series in a one-column DataFrame
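To make the 1D versus 2D distinction concrete, here is a small sketch showing a few equivalent ways to get a 2D input from a single column. The DataFrame and the column name value are toy assumptions for illustration.

```python
import pandas as pd
from sklearn.preprocessing import KBinsDiscretizer

df = pd.DataFrame({"value": [0.5, 2.0, 3.5, 7.0, 9.5]})  # toy data

series = df["value"]                         # 1D Series, shape (5,)
frame = df[["value"]]                        # 2D DataFrame, shape (5, 1)
as_frame = series.to_frame()                 # 2D DataFrame, shape (5, 1)
as_array = series.to_numpy().reshape(-1, 1)  # 2D ndarray, shape (5, 1)

est = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="uniform")

# All three 2D forms are accepted and produce the same bin indices.
results = [est.fit_transform(x) for x in (frame, as_frame, as_array)]
print(series.shape, frame.shape, as_frame.shape, as_array.shape)
print(results[0].ravel())
```

Passing series directly to fit_transform raises the ValueError from the question, since it has no column dimension.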
