
I tried to follow https://pypi.org/project/fancyimpute/

# print mean squared error for the four imputation methods above
ii_mse = ((X_filled_ii[missing_mask] - X[missing_mask]) ** 2).mean()
print("Iterative Imputer norm minimization MSE: %f" % ii_mse)

nnm_mse = ((X_filled_nnm[missing_mask] - X[missing_mask]) ** 2).mean()
print("Nuclear norm minimization MSE: %f" % nnm_mse)

softImpute_mse = ((X_filled_softimpute[missing_mask] - X[missing_mask]) ** 2).mean()
print("SoftImpute MSE: %f" % softImpute_mse)

knn_mse = ((X_filled_knn[missing_mask] - X[missing_mask]) ** 2).mean()
print("knnImpute MSE: %f" % knn_mse)

What is missing_mask, and how can I get it from a data frame with missing values?

Rosa

1 Answer


A missing mask is a boolean array (or a set of indices) marking where your data is missing. For example, say you have an array with some missing values:

[ 1  2]
[ 3 NA]

The missing mask is another array of the same shape, with True wherever a value is missing. In this case it is:

[False False]
[False  True]

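pandas also has a function with a related name, DataFrame.mask, but note that it replaces values where a condition is True rather than computing a missing mask: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.mask.html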

If you want to create a missing mask on an original dataset, you can use df.isna() or df.isnull().
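As a minimal sketch of that, assuming a small made-up data frame (the column names `a` and `b` are only for illustration), the mask can be converted to a NumPy boolean array so it can be used for indexing the way the fancyimpute example does:

```python
import numpy as np
import pandas as pd

# Hypothetical two-by-two data frame with one missing value
df = pd.DataFrame({"a": [1.0, 3.0], "b": [2.0, np.nan]})

# Boolean DataFrame: True wherever a value is missing
mask_df = df.isna()

# NumPy boolean array, usable for indexing like X[missing_mask]
missing_mask = mask_df.to_numpy()

print(missing_mask)
```

Here `df.isnull()` would give the same result, since it is an alias of `df.isna()`.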

In your case, though, you don't need to build a mask yourself. You already have a dataset with missing values, so just run the imputation on it. The missing mask in the fancyimpute README is only needed if you want to calculate performance metrics: there, known values are artificially removed from a complete dataset, imputed, and then compared against the true values.
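To illustrate where a mask like that comes from, here is a rough sketch under a few assumptions: the complete matrix `X` is random made-up data, about 10% of entries are hidden, and plain column-mean imputation stands in for the fancyimpute solvers so the example needs only NumPy. It is not the library's own code, just the same evaluation idea.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical complete dataset (no missing values), needed as ground truth
X = rng.normal(size=(100, 5))

# Randomly choose roughly 10% of the entries to hide
missing_mask = rng.random(X.shape) < 0.1

# Copy of the data with the chosen entries set to NaN
X_incomplete = X.copy()
X_incomplete[missing_mask] = np.nan

# Stand-in imputer: fill each missing entry with its column mean
col_means = np.nanmean(X_incomplete, axis=0)
X_filled = np.where(missing_mask, col_means, X_incomplete)

# Mean squared error on the hidden entries only
mse = ((X_filled[missing_mask] - X[missing_mask]) ** 2).mean()
print("Mean imputation MSE: %f" % mse)
```

With real data that already has gaps, there are no true values to compare against, so this kind of MSE cannot be computed on the originally missing entries.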

Hope this was helpful and good luck!

Ife A