I want to use word2vec with PySpark to process some data.
I was previously using Google's pretrained model GoogleNews-vectors-negative300.bin with gensim in Python.
Is there a way I can load this bin file with mllib.word2vec?
Or does it make sense to export the data from Python as a dictionary {word: [vector]} (or as a .csv file) and then load it in PySpark?
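To make the second option concrete, here is a rough sketch of what I have in mind. It is only a sketch under assumptions: the function names are placeholders I made up, the gensim part assumes gensim 4.x (where the vocabulary list is called index_to_key), and the CSV is assumed to sit somewhere Spark can read it. I have not checked how well this scales to the full 3-million-word model.

```python
import csv


def export_vectors_csv(vectors, path, limit=None):
    """Write one row per word: word, v1, ..., vN.

    `vectors` is any iterable of (word, vector) pairs.
    Returns the number of rows written.
    """
    count = 0
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for word, vec in vectors:
            if limit is not None and count >= limit:
                break
            # Convert numpy floats to plain Python floats before writing.
            writer.writerow([word] + [float(x) for x in vec])
            count += 1
    return count


def iter_gensim_vectors(bin_path):
    """Yield (word, vector) pairs from a word2vec .bin file (hypothetical helper)."""
    # Imported lazily so the CSV helper above works without gensim installed.
    from gensim.models import KeyedVectors

    kv = KeyedVectors.load_word2vec_format(bin_path, binary=True)
    # Assumption: gensim >= 4.0. Older versions expose kv.index2word instead.
    for word in kv.index_to_key:
        yield word, kv[word]


def load_vectors_in_spark(spark, path):
    """Read the CSV into a DataFrame with columns word and vector (hypothetical helper)."""
    from pyspark.sql import functions as F

    # No header, so Spark names the columns _c0, _c1, ... and reads them as strings.
    # escape='"' lets Spark parse the doubled quotes that Python's csv module writes.
    raw = spark.read.csv(path, quote='"', escape='"')
    vec_cols = [F.col(c).cast("double") for c in raw.columns[1:]]
    return raw.select(
        F.col("_c0").alias("word"),
        F.array(*vec_cols).alias("vector"),
    )
```

The idea would be to call export_vectors_csv(iter_gensim_vectors("GoogleNews-vectors-negative300.bin"), "vectors.csv") once on the driver, then load_vectors_in_spark(spark, "vectors.csv") and join the result against my tokenized data.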
Thanks