I have a working PySpark installation running through Jupyter on an Ubuntu VM.
There is only one Java version installed (openjdk version "1.8.0_265"), and I can run a local Spark (v2.4.4) session like this without problems:
import pyspark
from pyspark.sql import SparkSession
memory_gb = 24
conf = (
    pyspark.SparkConf()
    .setMaster('local[*]')
    .set('spark.driver.memory', '{}g'.format(memory_gb))
)
spark = SparkSession \
    .builder \
    .appName("My Name") \
    .config(conf=conf) \
    .getOrCreate()
Now I want to use spark-nlp. I've installed spark-nlp using pip install spark-nlp in the same virtual environment my PySpark is in.
However, when I try to use it, I get the error Exception: Java gateway process exited before sending its port number.
I've tried to follow the instructions in the documentation here, but without success.
So doing
spark = SparkSession \
    .builder \
    .appName("RevDNS Stats") \
    .config(conf=conf) \
    .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.11:2.5.5") \
    .getOrCreate()
only results in the error mentioned above.
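In case it helps with diagnosis, here is a minimal sketch for dumping the environment settings that launching the Java gateway may depend on. The helper name gateway_env_report is hypothetical (my own), and I'm only assuming that JAVA_HOME, SPARK_HOME and PYSPARK_SUBMIT_ARGS are the relevant variables; it does not start Spark, and there may be other causes it does not cover.

```python
import os
import shutil


def gateway_env_report():
    """Collect settings that may affect launching the JVM behind PySpark.

    Hypothetical helper for illustration; the list of variables is an
    assumption, not an exhaustive list of what PySpark consults.
    """
    names = ("JAVA_HOME", "SPARK_HOME", "PYSPARK_SUBMIT_ARGS")
    # Unset variables are reported as None.
    report = {name: os.environ.get(name) for name in names}
    # Full path of the `java` executable found on PATH, or None if absent.
    report["java_on_path"] = shutil.which("java")
    return report


for key, value in gateway_env_report().items():
    print("{} = {}".format(key, value))
```

Running this in the same Jupyter kernel that raises the error should show whether the notebook sees the same Java and Spark settings as my shell does.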
How do I fix this?