I used KNNImputer to fill in the missing values in my dataset, and I am having trouble evaluating the result. When I use MAE or MSE to compare the original and the imputed datasets, I get the error: Input contains NaN, infinity or a value too large for dtype('float64'). That makes sense, since the original data still contains the missing values. Cross-validation does not seem to help either, because it requires splitting the data, and I am not sure how to do that here: my data is timestamped, with the different sensors as columns.
Code for calculating MSE:
import pandas as pd
from sklearn.impute import KNNImputer
from sklearn.metrics import mean_squared_error
# create a copy of data_clean to impute missing values
df = data_clean.copy()
# apply KNN imputation
imputer = KNNImputer(n_neighbors=5)
df[df.columns[1:]] = imputer.fit_transform(df[df.columns[1:]])
# calculate mean squared error for imputed values only
mask = ~df[df.columns[1:]].isna()  # meant to select only the imputed values, but df is already imputed here, so this mask is True everywhere
mse = mean_squared_error(data_clean[df.columns[1:]][mask], df[df.columns[1:]][mask])
print(f"Mean Squared Error: {mse}")
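For comparison, one common way to score an imputer when the true values at the real gaps are unknown is to hide some cells that are known, impute them, and compute the error on those hidden cells only. The following is a minimal sketch of that idea, not a definitive method. The function name `evaluate_knn_imputation`, the `hide_fraction` parameter, and the synthetic `demo` frame (a timestamp column followed by three sensor columns) are assumptions made for illustration and are not part of the code above.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer
from sklearn.metrics import mean_absolute_error, mean_squared_error


def evaluate_knn_imputation(values, n_neighbors=5, hide_fraction=0.1, seed=0):
    """Hide a random share of the observed cells, impute, and score those cells only.

    Hypothetical helper for illustration; `values` is a numeric DataFrame.
    """
    rng = np.random.default_rng(seed)
    truth = values.to_numpy(dtype=float)

    # Only cells that are actually observed can serve as ground truth.
    observed = ~np.isnan(truth)
    hidden = observed & (rng.random(truth.shape) < hide_fraction)

    # Copy the data and blank out the selected cells.
    corrupted = truth.copy()
    corrupted[hidden] = np.nan

    imputed = KNNImputer(n_neighbors=n_neighbors).fit_transform(corrupted)

    # Compare true and imputed values at the artificially hidden positions,
    # so no NaN ever reaches the metric functions.
    y_true = truth[hidden]
    y_pred = imputed[hidden]
    return mean_squared_error(y_true, y_pred), mean_absolute_error(y_true, y_pred)


# Synthetic stand-in for the sensor data (assumption: first column is the timestamp).
rng = np.random.default_rng(42)
n = 200
t = np.arange(n)
demo = pd.DataFrame({
    "timestamp": pd.date_range("2023-01-01", periods=n, freq="min"),
    "sensor_a": np.sin(t / 10) + rng.normal(0, 0.05, n),
    "sensor_b": np.cos(t / 10) + rng.normal(0, 0.05, n),
    "sensor_c": 0.5 * np.sin(t / 10) + rng.normal(0, 0.05, n),
})
# Add some genuine gaps, as in the real data; these are never scored.
demo.loc[rng.choice(n, 15, replace=False), "sensor_a"] = np.nan

mse, mae = evaluate_knn_imputation(demo[demo.columns[1:]])
print(f"Mean Squared Error: {mse}")
print(f"Mean Absolute Error: {mae}")
```

Hiding cells at random ignores the time ordering. If the real gaps tend to be contiguous (for example, a sensor dropping out for a while), hiding contiguous blocks instead would probably give a more realistic estimate.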