I have a working PySpark installation running through Jupyter on an Ubuntu VM.
Only one Java version is installed (openjdk version "1.8.0_265"), and I can run a local Spark (v2.4.4) session like this without problems:

import pyspark
from pyspark.sql import SparkSession

memory_gb = 24
conf = (
    pyspark.SparkConf()
        .setMaster('local[*]')
        .set('spark.driver.memory', '{}g'.format(memory_gb))
)

spark = SparkSession \
    .builder \
    .appName("My Name") \
    .config(conf=conf) \
    .getOrCreate()

Now I want to use spark-nlp. I've installed it with pip install spark-nlp in the same virtual environment that PySpark is in.

However, when I try to use it, I get the error Exception: Java gateway process exited before sending its port number.

I've tried to follow the instructions in the documentation here, but without success.

So doing

spark = SparkSession \
    .builder \
    .appName("RevDNS Stats") \
    .config(conf=conf) \
    .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.11:2.5.5")\
    .getOrCreate()

only results in the error mentioned above.
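
This error is raised when the JVM that PySpark launches exits before the Python side can connect to it, so a first diagnostic step could be to print the environment that the launch depends on. The sketch below is only an assumption about what is worth checking, not a confirmed fix; gateway_environment is a hypothetical helper name, and it uses nothing but the standard library.

```python
import os
import shutil


def gateway_environment():
    """Collect settings that influence how PySpark launches its JVM gateway."""
    # Environment variables read when PySpark starts spark-submit.
    # Which of these matters here is an assumption, not a diagnosis.
    names = ("JAVA_HOME", "SPARK_HOME", "PYSPARK_SUBMIT_ARGS", "PYSPARK_PYTHON")
    report = {name: os.environ.get(name) for name in names}
    # Path of the `java` executable found on PATH, or None if there is none.
    report["java_on_path"] = shutil.which("java")
    return report


if __name__ == "__main__":
    for key, value in gateway_environment().items():
        print("{}: {}".format(key, value))
```

Running this in the same Jupyter kernel that raises the exception shows whether the kernel sees the same Java and Spark settings as a terminal session does; a value of None means the variable is unset.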

How do I fix this?

LukasKawerau