The Scenario
I have a dataset whose last column contains NaN values. These need to be imputed using only vector cosine similarity and Pearson correlation, after which the data will be used for clustering.
The Problem
In my case it is mandatory to use vector cosine and Pearson correlation.
Here is a chunk of my dataset, post_df1, which is read from a CSV file using pandas:
uid iid rat
1 303.0 785.0 3.000000
2 291.0 1042.0 4.000000
3 234.0 1184.0 2.000000
4 102.0 768.0 2.000000
254 944.0 170.0 5.000000
255 944.0 171.0 5.000000
256 944.0 172.0 NaN
257 944.0 173.0 NaN
258 944.0 174.0 NaN
The last column is then taken into a vector (just to keep things simple; suggestions are welcome) using this command:
vect_1 = post_df1.iloc[:, 2].values
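A single vector drops the uid / iid information that cosine and Pearson similarity need, so what I have in mind is closer to the rough sketch below. It is only an illustration of the goal, not a tested solution: the function names (cosine_sim, pearson_sim, impute_ratings) are my own, and I am assuming a user-based scheme where a missing rating is filled with the similarity-weighted average of the ratings other users gave the same item, with similarity measured on co-rated items only.

```python
import numpy as np
import pandas as pd

def cosine_sim(a, b):
    """Cosine similarity of two 1-D arrays; 0.0 if either has zero norm."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def pearson_sim(a, b):
    """Pearson correlation, computed as the cosine of the mean-centred vectors."""
    return cosine_sim(a - a.mean(), b - b.mean())

def impute_ratings(df, sim=cosine_sim):
    """Return a copy of df with NaN 'rat' values filled where neighbours exist."""
    # User x item matrix built from the known ratings only.
    known = df.dropna(subset=["rat"])
    mat = known.pivot_table(index="uid", columns="iid", values="rat")
    out = df.copy()
    for idx in df.index[df["rat"].isna()]:
        u, i = df.at[idx, "uid"], df.at[idx, "iid"]
        if u not in mat.index or i not in mat.columns:
            continue  # no known ratings to work from; stays NaN
        ru = mat.loc[u]
        num = den = 0.0
        for v in mat[i].dropna().index:  # users who rated item i
            rv = mat.loc[v]
            both = ru.notna() & rv.notna()  # items rated by both users
            if both.sum() < 2:
                continue
            s = sim(ru[both].to_numpy(), rv[both].to_numpy())
            if s > 0:  # assumption: ignore dissimilar users
                num += s * rv[i]
                den += s
        if den > 0:
            out.at[idx, "rat"] = num / den
    return out
```

Calling impute_ratings(post_df1, sim=pearson_sim) would switch the measure. Rows with no usable neighbour are left as NaN, which would probably apply to most of the chunk shown above, since the users there have no items in common.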
sklearn.preprocessing has a class called Imputer, but it only offers the mean, median and most frequent strategies, which won't work for my scenario.
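To show why a column statistic does not fit, here is a small illustration of a mean fill written with plain pandas rather than sklearn. The toy values are made up, and the point is only that every missing rating receives the same number regardless of which user or item it belongs to.

```python
import numpy as np
import pandas as pd

# Toy ratings in the same (uid, iid, rat) layout; the values are made up.
toy = pd.DataFrame({
    "uid": [1.0, 1.0, 2.0, 2.0],
    "iid": [10.0, 11.0, 10.0, 11.0],
    "rat": [5.0, np.nan, 1.0, np.nan],
})

# Column-mean fill: both NaNs get the mean of the known ratings,
# even though user 1 rates high and user 2 rates low.
filled = toy["rat"].fillna(toy["rat"].mean())
print(filled.tolist())
```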
Questions
- Is there any package other than SurPRISE (by Nicolas Hug) for the vector cosine and Pearson methods?
- Is it possible to pass a custom function / method to sklearn for cosine and Pearson?
- Is there any other method / way out?