9

I am working on an ML problem in which I want to convert continuous target values into a small number of bins, both to understand the problem better and to make better predictions. The original problem is a regression, but I am turning it into a classification by binning the target and labeling the bins.

This is what I did:

from sklearn.preprocessing import KBinsDiscretizer  
est = KBinsDiscretizer(n_bins=3, encode='ordinal', strategy='uniform')
s = est.fit(target) 
Xt = est.transform(s)

This raises the ValueError shown below. I then reshaped my data into 2D, but I still could not solve it.

ValueError: Expected 2D array, got 1D array instead:

from sklearn.preprocessing import KBinsDiscretizer

myData = pd.read_csv("train.csv", delimiter=",")
target = myData.iloc[:,-5]  # this is a continuous data which must be 
                        # converted into bins with a new column.

xx = target.values.reshape(21263,1)

est = KBinsDiscretizer(n_bins=3, encode='ordinal', strategy='uniform')
s = est.fit(xx) 
Xt = est.transform(s)

You can see my target has 21263 rows. I need to divide these into 10 equal bins and write the result into a new column in my dataframe. Thanks for the guidance.

P.S.: Max target value: 185.0
Min target value: 0.00021

Venkatachalam
Mass17
  • It is tough to replicate your example without knowing what data is inside "train.csv". You don't have to give the exact data, but it would be helpful to provide sample data for `myData` or `target`. edit: I see that the target value range has been provided, but the above is still a good general guideline to follow when posting questions. – rmutalik Dec 15 '22 at 17:52

4 Answers

9

Okay, I was able to solve it. I am posting the answer in case anyone else needs it in the future. I used pandas.qcut:

target['Temp_class'] = pd.qcut(target['Temeratue'], 10, labels=False)

This has solved my problem.
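One thing worth noting for later readers: pd.qcut produces equal-frequency bins (roughly the same number of rows per class), whereas KBinsDiscretizer with strategy='uniform' and pd.cut produce equal-width bins. A minimal sketch of the difference, using synthetic data and made-up column names (`temperature`, `temp_class_qcut`, `temp_class_cut`) as stand-ins for the real ones:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the continuous target (hypothetical data).
rng = np.random.default_rng(0)
df = pd.DataFrame({"temperature": rng.uniform(0.00021, 185.0, size=1000)})

# Equal-frequency bins: each of the 10 classes gets about the same number of rows.
df["temp_class_qcut"] = pd.qcut(df["temperature"], 10, labels=False)

# Equal-width bins: each class spans the same range of target values.
df["temp_class_cut"] = pd.cut(df["temperature"], 10, labels=False)

# Rows per class for the quantile-based binning.
print(df["temp_class_qcut"].value_counts().sort_index())
```

Which of the two counts as "10 equal bins" depends on whether equal row counts or equal value ranges are wanted.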

Mass17
7

The mistake in your first attempt is that you pass the output of fit() into transform(). fit() returns the fitted estimator, not the input data. Either of the following is correct:

from sklearn.preprocessing import KBinsDiscretizer
est = KBinsDiscretizer(n_bins=3, encode='ordinal', strategy='uniform')
# target must be 2D, shape (n_samples, 1), e.g. the reshaped xx from the question
Xt = est.fit_transform(target)

or

from sklearn.preprocessing import KBinsDiscretizer
est = KBinsDiscretizer(n_bins=3, encode='ordinal', strategy='uniform')
# target must be 2D, shape (n_samples, 1), e.g. the reshaped xx from the question
est.fit(target)
Xt = est.transform(target)
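
To make both points concrete (2D input, and fit() returning the estimator), here is a small self-contained sketch. The `target` array is hypothetical data standing in for the question's column, and `X`, `fitted` and `labels` are illustrative names:

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Hypothetical 1D continuous target standing in for the question's column.
target = np.linspace(0.0, 185.0, num=12)

# scikit-learn expects shape (n_samples, n_features), so add a feature axis.
X = target.reshape(-1, 1)

est = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="uniform")

# fit() returns the estimator itself, not transformed data.
fitted = est.fit(X)
print(fitted is est)

# transform() must be given the data, not the fitted estimator.
Xt = est.transform(X)             # 2D array of shape (12, 1)
labels = Xt.ravel().astype(int)   # flat integer bin labels
print(labels)
```

Using reshape(-1, 1) instead of a hard-coded row count avoids having to know the length of the column in advance.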
Venkatachalam
6

I had a similar problem while working with the Titanic dataset. One of my functions had converted my column to float, and changing it back to integer seemed to help. Selecting the column with double square brackets, which returns a 2D DataFrame rather than a 1D Series, also worked for me:

from sklearn.preprocessing import KBinsDiscretizer
est = KBinsDiscretizer(n_bins=5, encode='onehot-dense', strategy='uniform')
new = est.fit_transform(dataset[['column_name']])
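
As a rough illustration of why the double brackets matter, the sketch below compares the shapes of the two selections. The `dataset` frame and the `fare` column name are made up for the example:

```python
import pandas as pd
from sklearn.preprocessing import KBinsDiscretizer

# Hypothetical frame; "fare" is an illustrative column name.
dataset = pd.DataFrame(
    {"fare": [7.25, 71.28, 7.92, 53.10, 8.05, 8.46, 51.86, 21.07]}
)

series_1d = dataset["fare"]    # single brackets: 1D Series
frame_2d = dataset[["fare"]]   # double brackets: 2D DataFrame with one column

est = KBinsDiscretizer(n_bins=5, encode="onehot-dense", strategy="uniform")
onehot = est.fit_transform(frame_2d)  # dense array, one column per bin

print(series_1d.shape, frame_2d.shape, onehot.shape)
```

With encode='onehot-dense' the result has one column per bin, so it cannot be assigned to a single dataframe column directly; encode='ordinal' is the more convenient choice for that.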
Doug
0

What worked for me was realizing that fit_transform needs 2D input such as a DataFrame, not a Series. The error means that a Series is a 1D object, while fit_transform needs a 2D object, e.g. a DataFrame. So, for example, you can do:

from sklearn.preprocessing import KBinsDiscretizer  
est = KBinsDiscretizer(n_bins=3, encode='ordinal', strategy='uniform')
Xt = est.fit_transform(pd.DataFrame(target))
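
Since the question also asks for the bins to end up in a new dataframe column, here is one possible way to finish the job. This is a sketch on hypothetical data: `my_data`, `target_2d` and the `target_bin` column are illustrative names, and n_bins=10 follows the question's stated goal:

```python
import pandas as pd
from sklearn.preprocessing import KBinsDiscretizer

# Hypothetical data; "target" is an illustrative column name.
my_data = pd.DataFrame(
    {"target": [0.00021, 12.5, 40.0, 77.3, 91.0, 120.4, 150.0, 185.0]}
)

target = my_data["target"]      # 1D Series
target_2d = target.to_frame()   # 2D DataFrame, equivalent to pd.DataFrame(target)

est = KBinsDiscretizer(n_bins=10, encode="ordinal", strategy="uniform")
bins = est.fit_transform(target_2d)   # array of shape (n_samples, 1)

# Flatten the (n_samples, 1) result before storing it as a column.
my_data["target_bin"] = bins.ravel().astype(int)
print(my_data)
```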
G PN