Impute values of a vector using Cosine similarity in Python

Question

The Scenario

I have a dataset whose last column contains NaN values. These need to be imputed using only vector cosine similarity and Pearson correlation, after which the data will be used for clustering.

The Problem

It is mandatory in my case to use VECTOR COSINE and PEARSON CORRELATION.

Here is a chunk of my dataset, post_df1, which is read from a CSV file using pandas:

       uid     iid       rat
1    303.0   785.0  3.000000
2    291.0  1042.0  4.000000
3    234.0  1184.0  2.000000
4    102.0   768.0  2.000000
254  944.0   170.0  5.000000
255  944.0   171.0  5.000000
256  944.0   172.0       NaN
257  944.0   173.0       NaN
258  944.0   174.0       NaN

I then take the last column into a vector (just to make it easy; suggestions welcome) using this command:

vect_1 = post_df1.iloc[:, 2].values

sklearn.preprocessing has a class called Imputer, but it only offers the mean, median and most-frequent strategies, which won't work for my scenario.

Questions

Is there any package other than SurPRISE (by Nicolas Hug) for the vector cosine and Pearson methods?
Is it possible to pass a function / method to sklearn for cosine and Pearson?
Is there any other method / way out?

score 1 · Answer 1 · answered Oct 03 '17 at 14:20

Cosine similarity and Pearson correlation are only parameters of an imputation method, not imputation methods themselves. There are various imputation methods, such as KNN, MICE, SVD and matrix factorization. For example, cosine similarity could be used as the similarity measure in KNN imputation, but I could not find an existing implementation of that. The fancyimpute package may be helpful as the closest available implementation:

GitHub - hammerlab / fancyimpute: Multivariate imputation and matrix completion algorithms implemented in Python
https://github.com/hammerlab/fancyimpute/
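To illustrate the idea of a similarity measure being an argument of a KNN imputation method, here is a minimal sketch in plain NumPy. It is not taken from fancyimpute or any other package; the names cosine_sim, pearson_sim and knn_impute and the small matrix R are hypothetical, and it assumes the ratings have already been arranged as a user x item matrix with NaN for missing entries.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two 1-D arrays (0.0 if a norm is zero)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def pearson_sim(a, b):
    """Pearson correlation between two 1-D arrays (0.0 if undefined)."""
    if len(a) < 2 or np.std(a) == 0 or np.std(b) == 0:
        return 0.0
    return float(np.corrcoef(a, b)[0, 1])

def knn_impute(ratings, similarity, k=2):
    """Fill NaNs in a user x item matrix with a similarity-weighted
    average over the k most similar users who rated that item.
    Hypothetical helper: `similarity` is any function of two vectors."""
    filled = ratings.copy()
    n_users = ratings.shape[0]
    for u, i in zip(*np.where(np.isnan(ratings))):
        candidates = []
        for v in range(n_users):
            if v == u or np.isnan(ratings[v, i]):
                continue
            # Compare the two users only on items both have rated.
            common = ~np.isnan(ratings[u]) & ~np.isnan(ratings[v])
            if not common.any():
                continue
            s = similarity(ratings[u, common], ratings[v, common])
            if s > 0:  # keep positively similar neighbours only
                candidates.append((s, ratings[v, i]))
        if candidates:
            top = sorted(candidates, reverse=True)[:k]
            weights = np.array([s for s, _ in top])
            values = np.array([r for _, r in top])
            filled[u, i] = weights @ values / weights.sum()
        # If no usable neighbour exists, the entry is left as NaN.
    return filled

# Toy user x item matrix (made-up values, not the asker's data).
R = np.array([
    [5.0, 4.0, np.nan],
    [5.0, 4.0, 2.0],
    [1.0, 2.0, 5.0],
])

filled_cosine = knn_impute(R, cosine_sim, k=2)
filled_pearson = knn_impute(R, pearson_sim, k=2)
```

The same knn_impute call works with either measure, which is the sense in which cosine and Pearson are parameters rather than methods. For long-format data like post_df1 (uid, iid, rat), the matrix could presumably be built first with something like post_df1.pivot_table(index="uid", columns="iid", values="rat").to_numpy(); that step is an assumption about the data, not something tested here.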

Keiku


I didn't quite get this line _"only parameters in the imputation method, not imputation method"_. I'll try fancyimpute though. – T3J45 Oct 03 '17 at 15:04
I am sorry that my answer was not easy to understand. As an analogy, hierarchical clustering takes a distance as an argument. Similarly, an imputation method takes cosine similarity or Pearson correlation as an argument; the similarity measure is not itself a method. – Keiku Oct 03 '17 at 15:17
