I've got a dataset with multiple NaN values in my dependent variable column. I've split the dataset into dependent and independent variables, and I'm now trying to replace every NaN in the dependent variable with 0. However, I get an error when I use SimpleImputer to do this.
Here's my code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
"""
----- Read Dataset and Split into Dependent and Independent Variables: -----
"""
dataset = pd.read_csv('Salary_Data.csv')
x = dataset.iloc[:, 1:-1].values
y = dataset.iloc[:, -1].values
print("\nIndependent Variables: \n%s" % x)
print("\nDependent Variables: \n%s" % y)
"""
----- Fill in Missing Values: -----
"""
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values = np.nan, strategy = 'constant', fill_value = 0)
y = imputer.fit(y)
print("\nDependent Variables After Missing Values Adjusted: \n%s" % y)
Here's the error I get:
Expected 2D array, got 1D array instead:
array=[270000. 200000. 250000. nan 425000. nan nan 252000. 231000.
nan 260000. 250000. nan 218000. nan 200000. 300000. nan
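A minimal sketch of a fix, assuming the error comes from passing the 1D `y` straight to `SimpleImputer`. The imputer expects a 2D array of shape (n_samples, n_features), and `fit()` only returns the fitted imputer, not the filled data, so `fit_transform()` is the call that produces the imputed values. The sample array below uses the first six values from the error output as a stand-in for the real `Salary_Data.csv` column, which isn't reproduced here.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Stand-in for the dependent variable: the first six values shown in the
# error output (the real column would come from Salary_Data.csv).
y = np.array([270000., 200000., 250000., np.nan, 425000., np.nan])

# Constant strategy: every NaN is replaced with fill_value.
imputer = SimpleImputer(missing_values=np.nan, strategy='constant', fill_value=0)

# reshape(-1, 1) turns the 1D target into a single-column 2D array.
# fit_transform() both fits the imputer and returns the filled array;
# ravel() flattens the (n, 1) result back to 1D.
y_filled = imputer.fit_transform(y.reshape(-1, 1)).ravel()
print(y_filled)

# Alternative without scikit-learn: fill the NaNs in pandas directly.
y_pandas = pd.Series(y).fillna(0).values
print(y_pandas)
```

With the original code, selecting the column as `dataset.iloc[:, -1:]` (a slice rather than a single index) would also keep `y` two-dimensional, and `dataset.iloc[:, -1].fillna(0).values` would avoid the imputer entirely.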