I have a Spark cluster set up and would like to integrate Spark NLP to compute word embeddings. I downloaded the GloVe Embeddings 6B 100 model from the Spark NLP models download page and unzipped it into a directory named `glove`. When I run the following code:
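```python
from sparknlp.annotator import WordEmbeddingsModel

# Load the unzipped GloVe model from a directory relative to the driver's working directory
word_embeddings = WordEmbeddingsModel.load("./glove") \
    .setInputCols(["document", "normal"]) \
    .setOutputCol("embeddings")
```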
it works fine in local mode, but I don't know how to make it work when the job is submitted to the cluster with `spark-submit`.
I tried the following:
```shell
spark-submit --master spark://remote-host:remote-port \
  --files pyspark_pex_env.pex \
  --jars spark-nlp_2.12-4.1.0.jar \
  --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.1.0 \
  example.py
```
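Written out as an argument list, the submission can be sketched as below. `build_submit_args` is a hypothetical helper used only for illustration. It leaves out the `--jars` entry because `--packages` already resolves the Spark NLP jar and its dependencies from Maven and puts them on the driver and executor classpaths, so passing the same jar again should be redundant.

```python
from typing import List, Sequence


def build_submit_args(master: str, app: str, files: Sequence[str] = (),
                      spark_nlp_version: str = "4.1.0",
                      scala_version: str = "2.12") -> List[str]:
    """Hypothetical helper: assemble a spark-submit argument list."""
    args = ["spark-submit", "--master", master]
    if files:
        # --files places each listed file in the working directory of every executor
        args += ["--files", ",".join(files)]
    # --packages pulls the Spark NLP jar (and its dependencies) from Maven,
    # so the same jar is not also passed with --jars
    args += ["--packages",
             f"com.johnsnowlabs.nlp:spark-nlp_{scala_version}:{spark_nlp_version}"]
    args.append(app)
    return args


print(" ".join(build_submit_args("spark://remote-host:remote-port", "example.py",
                                 files=["pyspark_pex_env.pex"])))
```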
I also served the local `glove` directory over HTTP (the workers can reach this server) and loaded the model with:
```python
word_embeddings = WordEmbeddingsModel.load("http://local-host:port/glove") \
    .setInputCols(["document", "normal"]) \
    .setOutputCol("embeddings")
```
Neither approach works, and I don't know what to try next. How should the model be loaded when the job runs through `spark-submit`?
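My current guess, which may be wrong, is that `WordEmbeddingsModel.load` reads the model directory through Spark's Hadoop filesystem layer. If so, the path has to be one that every executor can list and read: for example an `hdfs://` or `s3a://` URI, or a `file://` path that exists at the same location on every node. A plain HTTP server would not qualify. The sketch below only encodes that assumption; `check_model_path` and `SHARED_SCHEMES` are hypothetical and not part of Spark or Spark NLP.

```python
from urllib.parse import urlparse

# Assumption: URI schemes backed by Hadoop filesystem connectors that all nodes can share
# (the matching connector must be on the classpath). Illustrative subset, not official.
SHARED_SCHEMES = {"hdfs", "s3a", "gs", "abfs", "abfss"}


def check_model_path(path: str) -> str:
    """Hypothetical helper: guess whether every executor could read `path`."""
    scheme = urlparse(path).scheme.lower()
    if scheme == "":
        # A bare path such as "./glove" is qualified against fs.defaultFS; on a
        # standalone cluster without HDFS that is each machine's own local disk
        return "ambiguous"
    if scheme == "file":
        # Only works if the directory exists at this same path on every node
        return "local"
    if scheme in SHARED_SCHEMES:
        return "shared"
    # e.g. "http": assumed not to support the directory listing that loading needs
    return "unsupported"


for p in ["./glove", "file:///opt/models/glove",
          "hdfs:///models/glove", "http://local-host:port/glove"]:
    print(p, "->", check_model_path(p))
```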