How to Apply Normalisation Using MinMaxScaler() to All Columns but Exclude the Categorical Ones?

Question

I am new to using the MinMaxScaler, so please do not bite my head off if this is a very, very simple question. I have the following dataset:

sample_df.head(2)

ID     S_LENGTH     S_WIDTH     P_LENGTH     P_WIDTH     SPECIES
-------------------------------------------------------------------
1      3.5          2.5          5.6         1.7        VIRGINICA
2      4.5          5.6          3.4         8.7         SETOSA

So how do I apply normalisation to all the columns in this dataset except the ID and SPECIES columns?

I basically want to use preprocessing.MinMaxScaler() to apply normalisation, so that all the features are in the range 0 to 1.

This is the code I am using...

from sklearn import preprocessing
min_max = preprocessing.MinMaxScaler()
min_max.fit_transform(sample_df)

...but when I execute it, I get this error:

ValueError: could not convert string to float: 'SETOSA'
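The error comes from the SPECIES column: MinMaxScaler tries to convert every column it receives to float, and a string such as 'SETOSA' cannot be converted. A minimal sketch, rebuilding the two sample rows shown above as a DataFrame (the values are copied from the table; nothing else is assumed), that lists the non-numeric columns and reproduces the failure:

```python
import pandas as pd
from sklearn import preprocessing

# Rebuild the two sample rows from the question
sample_df = pd.DataFrame({
    "ID": [1, 2],
    "S_LENGTH": [3.5, 4.5],
    "S_WIDTH": [2.5, 5.6],
    "P_LENGTH": [5.6, 3.4],
    "P_WIDTH": [1.7, 8.7],
    "SPECIES": ["VIRGINICA", "SETOSA"],
})

# Any column without a numeric dtype will break the scaler
non_numeric = sample_df.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)  # ['SPECIES']

min_max = preprocessing.MinMaxScaler()
try:
    min_max.fit_transform(sample_df)
    raised = False
except ValueError as err:
    # The scaler could not cast the SPECIES strings to float
    raised = True
    print(err)
```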

Any help on how to accomplish what I want to do is much appreciated!

Also, my sincere apologies if this is a really dumb question, but I am new to this.

Thank you!

EDIT (SHOWING ERROR):

Alternatively, if I do this...

min_max = preprocessing.MinMaxScaler()
min_max.fit_transform(sample_df[['S_LENGTH', 'S_WIDTH']])

sample_df.head(2)

...I get this error:

AttributeError: 'numpy.ndarray' object has no attribute 'sample'

sophocles · Accepted Answer · 2021-01-10T13:12:05.563


I'm not sure this will be very helpful, but you can get the numeric columns with:

num_df = df[[i for i in df.columns if df[i].dtypes != 'O']]

num_df
Out[126]: 
   ID  S_LENGTH  S_WIDTH  P_LENGTH  P_WIDTH
0   1       3.5      2.5       5.6      1.7
1   2       4.5      5.6       3.4      8.7

and then apply the MinMaxScaler on it:

min_max = preprocessing.MinMaxScaler()
min_max.fit_transform(num_df)

Out[129]:
array([[0., 0., 0., 1., 0.],
       [1., 1., 1., 0., 1.]])
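Each column is rescaled independently as (x - min) / (max - min), using that column's own minimum and maximum, which is why every column of a two-row frame ends up as 0 and 1. A sketch that computes this by hand with pandas and checks it against the scaler's output, assuming the default feature_range=(0, 1):

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

num_df = pd.DataFrame({
    "ID": [1, 2],
    "S_LENGTH": [3.5, 4.5],
    "S_WIDTH": [2.5, 5.6],
    "P_LENGTH": [5.6, 3.4],
    "P_WIDTH": [1.7, 8.7],
})

# Column-wise min-max formula: (x - min) / (max - min)
manual = (num_df - num_df.min()) / (num_df.max() - num_df.min())

scaled = preprocessing.MinMaxScaler().fit_transform(num_df)
print(np.allclose(manual.to_numpy(), scaled))  # True
```

One difference: for a constant column the hand-written formula divides by zero, whereas MinMaxScaler guards against that case itself.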

EDIT: Using your df:

df
Out[162]: 
   ID  S_LENGTH  S_WIDTH  P_LENGTH  P_WIDTH    SPECIES
0   1       3.5      2.5       5.6      1.7  VIRGINICA
1   2       4.5      5.6       3.4      8.7     SETOSA

Use the following code:

import pandas as pd
num_cols = [i for i in df.columns if df[i].dtypes != 'O']
# fit_transform returns a NumPy array, so rebuild a DataFrame with the original labels
num_df = pd.DataFrame(min_max.fit_transform(df[num_cols]), columns=num_cols, index=df.index)
cat_df = df[[i for i in df.columns if df[i].dtypes == 'O']]
res = pd.merge(num_df, cat_df, left_index=True, right_index=True)

which will give you:

print(res)

    ID  S_LENGTH  S_WIDTH  P_LENGTH  P_WIDTH    SPECIES
0  0.0       0.0      0.0       1.0      0.0  VIRGINICA
1  1.0       1.0      1.0       0.0      1.0     SETOSA

Try running the code line by line and let me know if this is what you need.
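Note that the dtype-based selection above also rescales ID, which the question wanted left alone. A shorter route, sketched here as an alternative rather than the answer's own method, is to pick the numeric columns with DataFrame.select_dtypes, drop ID from that list, and assign the scaled values straight back into the same DataFrame. This keeps the column names, the index, ID and SPECIES untouched:

```python
import pandas as pd
from sklearn import preprocessing

df = pd.DataFrame({
    "ID": [1, 2],
    "S_LENGTH": [3.5, 4.5],
    "S_WIDTH": [2.5, 5.6],
    "P_LENGTH": [5.6, 3.4],
    "P_WIDTH": [1.7, 8.7],
    "SPECIES": ["VIRGINICA", "SETOSA"],
})

# Numeric columns via select_dtypes, minus ID (the question wants ID unscaled)
feature_cols = df.select_dtypes(include="number").columns.drop("ID")
print(feature_cols.tolist())  # ['S_LENGTH', 'S_WIDTH', 'P_LENGTH', 'P_WIDTH']

min_max = preprocessing.MinMaxScaler()
# fit_transform returns a NumPy array of the same shape; assigning it to
# df[feature_cols] overwrites only those columns and keeps everything else
df[feature_cols] = min_max.fit_transform(df[feature_cols])
print(df)
```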

answered Jan 10 '21 at 12:12 by sophocles (edited Jan 10 '21 at 13:12)


Hi, thanks for the reply. But when I do that and try to re-sample my data, I get this error: ```AttributeError: 'numpy.ndarray' object has no attribute 'sample'``` – Jan 10 '21 at 12:25
Can you please show me your code so that I can see where the error comes from? – sophocles Jan 10 '21 at 12:35
I have added the code where the error shows. – Jan 10 '21 at 12:39
I think this is because ```fit_transform``` returns a NumPy array, not a DataFrame. Try changing your code to this: ```import pandas as pd```, ```sample_df = pd.DataFrame(min_max.fit_transform(sample_df[['S_LENGTH', 'S_WIDTH']]))```, ```sample_df.head(2)``` – sophocles Jan 10 '21 at 12:44
Thanks. I tried this, which prevented the error. But when re-sampling the data, I lose the column names and it only shows those two columns. – Jan 10 '21 at 13:00
Yes, that makes sense. So let me make sure I understand what you're looking for: you would like to normalise the numerical variables in a dataframe but also keep the categorical ones in the same dataframe, right? If so, I will update my answer to show how. Please just confirm. – sophocles Jan 10 '21 at 13:02

Correct. That is exactly what I want to do. Normalise the numerical columns, but retain all the columns in the Normalised dataframe. – Jan 10 '21 at 13:04
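As discussed in the comments, fit_transform returns a plain NumPy array, so pandas methods such as .sample() or .head() are not available on the result, and wrapping it with pd.DataFrame(...) alone gives default integer column labels. A sketch that passes the original column names and index back when rebuilding the frame, then re-attaches the other columns. sample_df and the two columns come from the question's EDIT; pd.concat is swapped in for the pd.merge used in the answer (both align rows on the index):

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

sample_df = pd.DataFrame({
    "ID": [1, 2],
    "S_LENGTH": [3.5, 4.5],
    "S_WIDTH": [2.5, 5.6],
    "P_LENGTH": [5.6, 3.4],
    "P_WIDTH": [1.7, 8.7],
    "SPECIES": ["VIRGINICA", "SETOSA"],
})

cols = ["S_LENGTH", "S_WIDTH"]
scaled = preprocessing.MinMaxScaler().fit_transform(sample_df[cols])
print(type(scaled))  # <class 'numpy.ndarray'>

# Rebuild a DataFrame with the original labels so pandas methods work again
scaled_df = pd.DataFrame(scaled, columns=cols, index=sample_df.index)

# Re-attach the untouched columns, then restore the original column order
result = pd.concat([sample_df.drop(columns=cols), scaled_df], axis=1)[sample_df.columns]
print(result.sample(2, random_state=0))
```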
