I used KNNImputer to fill in the missing values in my dataset, and I am having trouble evaluating the result. When I use MAE or MSE to compare the original and imputed datasets, I get the error: Input contains NaN, infinity or a value too large for dtype('float64'). This makes sense, since the original data still contains missing values. Cross-validation doesn't seem to help either, because it requires splitting the data, and I'm not sure that is appropriate here: my data is a time series with a timestamp column and one column per sensor.

Code for calculating MSE:

import pandas as pd
from sklearn.impute import KNNImputer
from sklearn.metrics import mean_squared_error


# create a copy of data_clean to impute missing values
df = data_clean.copy()

# apply KNN imputation
imputer = KNNImputer(n_neighbors=5)
df[df.columns[1:]] = imputer.fit_transform(df[df.columns[1:]])

# calculate mean squared error for imputed values only
mask = ~df[df.columns[1:]].isna()  # create a mask to only consider imputed values
mse = mean_squared_error(data_clean[df.columns[1:]][mask], df[df.columns[1:]][mask])
print(f"Mean Squared Error: {mse}")
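A note on why the snippet above fails, with a possible workaround. The mask is computed from df after imputation, so it is True everywhere, and data_clean still contains NaN when it reaches mean_squared_error. More fundamentally, the true values at the real gaps are unknown, so an error cannot be computed there. One common approach is to hide a share of the values that are actually observed, impute them, and score only those hidden cells. The following is a minimal sketch under stated assumptions, not a definitive evaluation: the synthetic data_clean, the 10% hide fraction, the random seed, and the names observed, hide, and corrupted are all illustrative and not from the original post.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# Hypothetical stand-in for data_clean: a timestamp column plus sensor columns.
n = 200
base = np.sin(np.linspace(0, 6 * np.pi, n))
s1 = base + rng.normal(0, 0.05, n)
s2 = 2 * base + rng.normal(0, 0.05, n)
s3 = -base + rng.normal(0, 0.05, n)
s1[::17] = np.nan   # mimic real gaps in the sensor readings
s2[5::23] = np.nan
data_clean = pd.DataFrame({
    "timestamp": pd.date_range("2023-01-01", periods=n, freq="min"),
    "s1": s1, "s2": s2, "s3": s3,
})

sensor_cols = data_clean.columns[1:]
original = data_clean[sensor_cols].to_numpy(dtype=float)

# Cells whose true value is known (taken BEFORE any imputation).
observed = ~np.isnan(original)

# Hide roughly 10% of the known cells (assumed fraction).
hide = observed & (rng.random(original.shape) < 0.10)
corrupted = original.copy()
corrupted[hide] = np.nan

# Impute the corrupted copy; the real gaps are filled as well but are not scored.
imputed = KNNImputer(n_neighbors=5).fit_transform(corrupted)

# Score only the artificially hidden cells, where the ground truth exists.
mse = mean_squared_error(original[hide], imputed[hide])
print(f"Mean Squared Error on hidden cells: {mse}")
```

Because the boolean mask selects only cells that were observed in the original data, neither argument to mean_squared_error contains NaN. For sensor time series, hiding cells at random may give an optimistic score if the real gaps occur in contiguous blocks; hiding whole time windows instead could be closer to the real situation.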
Progman
Sepide H
  • Are there NaNs in your features / dataset or also in your labels? Which function exactly is throwing the error? – RvdV May 12 '23 at 13:28
  • @RvdV The error comes from the evaluation step, which calculates the mean squared error. The original data contains missing values in almost all variables/columns except the timestamp column. – Sepide H May 12 '23 at 13:52

0 Answers