Why is the following Python code related to sklearn preprocessing not working, and how can I debug it?

Question

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 3].values

# Taking care of missing data
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values = np.nan, strategy = 'mean')
imputer = imputer.fit(X[:, 1:3])
X[:, 1:3] = imputer.transform(X[:, 1:3])

# Encoding categorical data
# Encoding the Independent Variable
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X = LabelEncoder()
X[:, 0] = labelencoder_X.fit_transform(X[:, 0])
onehotencoder = OneHotEncoder(categorical_features = [0])
X = onehotencoder.fit_transform(X).toarray()
# Encoding the Dependent Variable
labelencoder_y = LabelEncoder()
y = labelencoder_y.fit_transform(y)

It shows an error

runfile('D:/Python Programs/Machine Learning A-Z New/Part 1 - Data Preprocessing/Section 2 -------------------- Part 1 - Data Preprocessing --------------------/categorical_data.py', wdir='D:/Python Programs/Machine Learning A-Z New/Part 1 - Data Preprocessing/Section 2 -------------------- Part 1 - Data Preprocessing --------------------')
C:\Users\KIIT\Anaconda3\lib\site-packages\sklearn\preprocessing_encoders.py:415: FutureWarning: The handling of integer data will change in version 0.22. Currently, the categories are determined based on the range [0, max(values)], while in the future they will be determined based on the unique values. If you want the future behaviour and silence this warning, you can specify "categories='auto'". In case you used a LabelEncoder before this OneHotEncoder to convert the categories to integers, then you can now use the OneHotEncoder directly.
  warnings.warn(msg, FutureWarning)
C:\Users\KIIT\Anaconda3\lib\site-packages\sklearn\preprocessing_encoders.py:451: DeprecationWarning: The 'categorical_features' keyword is deprecated in version 0.20 and will be removed in 0.22. You can use the ColumnTransformer instead.
  "use the ColumnTransformer instead.", DeprecationWarning)

Remove the `axis = 0` parameter from `SimpleImputer(missing_values = 'NaN', strategy = 'mean', axis = 0)`; `SimpleImputer` has no such parameter. Check the [docs](https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html). – vb_rises, Oct 13 '19 at 16:08
It's a pretty long one. Check the question again; I've edited it and added the new error. – HARSHIT DANG, Oct 13 '19 at 16:18

score 0 · Answer 1 · answered Oct 13 '19 at 16:27

Your dataset may still contain NaN values. Try

dataset.isnull().any()

to check which columns contain NaN values.
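
As a minimal sketch of that check, assuming a small made-up DataFrame in place of Data.csv (the column names and values below are hypothetical, not taken from the asker's file):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Data.csv; the real file's columns may differ.
dataset = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain"],
    "Age": [44.0, 27.0, np.nan, 38.0],
    "Salary": [72000.0, np.nan, 54000.0, 61000.0],
    "Purchased": ["No", "Yes", "No", "No"],
})

# True for every column that contains at least one missing value
has_nan = dataset.isnull().any()
print(has_nan)

# Number of missing values per column, useful to see how much is missing
nan_counts = dataset.isnull().sum()
print(nan_counts)
```

Running the same two calls again after the imputation step shows whether the imputer actually filled the gaps.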

answered Oct 13 '19 at 16:27

Imanpal Singh

I think the problem was in `imputer = SimpleImputer(missing_values = 'NaN', strategy = 'mean')`. I've changed it to `imputer = SimpleImputer(missing_values = np.nan, strategy = 'mean')`. – HARSHIT DANG Oct 13 '19 at 16:29
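
The two warnings quoted in the question are separate from the NaN issue: `categorical_features` is deprecated, and the message itself points to `ColumnTransformer`. Below is a rough sketch of that replacement together with the `np.nan` fix, assuming scikit-learn 0.20 or later and a small made-up array in place of Data.csv (the values are hypothetical):

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for the feature matrix X read from Data.csv
X = np.array([
    ["France", 44.0, 72000.0],
    ["Spain", 27.0, np.nan],
    ["Germany", np.nan, 54000.0],
    ["Spain", 38.0, 61000.0],
], dtype=object)

# missing_values must be np.nan, not the string 'NaN'
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X[:, 1:3] = imputer.fit_transform(X[:, 1:3].astype(float))

# One-hot encode column 0 directly (no LabelEncoder step needed) and
# pass the remaining numeric columns through unchanged.
# sparse_threshold=0 asks ColumnTransformer for a dense array.
ct = ColumnTransformer(
    transformers=[("country", OneHotEncoder(), [0])],
    remainder="passthrough",
    sparse_threshold=0,
)
X_encoded = np.asarray(ct.fit_transform(X), dtype=float)
print(X_encoded)
```

The one-hot columns come first in the result, followed by the passthrough columns. `LabelEncoder` is still appropriate for the dependent variable `y`.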
