I have a Spark DataFrame like the one below, where result
is the vector for each id:
+--------------------+--------------------+
|                  id|              result|
+--------------------+--------------------+
|000ab862128e11eab...|  [-0.46, 0.31, 0.2]|
|0026f306128e11eab...|  [-0.46, 0.31, 0.2]|
|00313b10d11b11ea9...| [-0.25, 0.70, 0.36]|
|00337629128e11eab...| [-0.46, 0.31, 0.51]|
|005492e4128e11eab...|  [0.55, 0.66, 0.85]|
+--------------------+--------------------+
How can I get the top 5 most similar items efficiently? I have already defined a cosineSimility
function that takes two vectors from the "result" column as parameters.
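
To make the goal concrete, here is a minimal single-machine sketch of the result I am after, written in plain Python rather than Spark. It assumes "top 5" means the 5 nearest neighbours of each id. The cosine_similarity function is only a stand-in for my cosineSimility, and the short ids "a" to "e" are placeholders for the truncated ids in the table. It compares every pair of rows, so it is O(n^2) and shows the expected output only, not the efficient approach I am looking for.

```python
import heapq
import math


def cosine_similarity(u, v):
    """Hypothetical stand-in for the cosineSimility function from the question."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: treat zero vectors as similar to nothing
    return dot / (norm_u * norm_v)


def top_k_similar(rows, k=5):
    """Brute force: for each id, return up to k (other_id, score) pairs, best first."""
    result = {}
    for id_a, vec_a in rows:
        scored = (
            (cosine_similarity(vec_a, vec_b), id_b)
            for id_b, vec_b in rows
            if id_b != id_a  # skip the row itself
        )
        best = heapq.nlargest(k, scored)
        result[id_a] = [(id_b, score) for score, id_b in best]
    return result


# Vectors copied from the table; the ids are placeholders, not the real ones.
rows = [
    ("a", [-0.46, 0.31, 0.2]),
    ("b", [-0.46, 0.31, 0.2]),
    ("c", [-0.25, 0.70, 0.36]),
    ("d", [-0.46, 0.31, 0.51]),
    ("e", [0.55, 0.66, 0.85]),
]

top = top_k_similar(rows, k=5)
for row_id, neighbours in top.items():
    print(row_id, neighbours)
```

With only five rows each id gets at most four neighbours. On the real DataFrame the same pairwise comparison would mean a full self-join, which is the part I would like to avoid.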