
I have followed the steps to set up PySpark in IntelliJ from this question:

Write and run pyspark in IntelliJ IDEA

Here is the simple code I attempted to run:

#!/usr/bin/env python
import numpy as np
from pyspark import SparkConf, SparkContext

def p(msg):
    print("%s\n" % repr(msg))

# Build a small numpy array and print it locally.
a = np.array([[1, 2, 3], [4, 5, 6]])
p(a)

# Create a local SparkContext.
sc = SparkContext("local", "ptest", conf=SparkConf().setAppName("x"))

# Distribute the array as an RDD and collect it back to the driver.
ardd = sc.parallelize(a)
p(ardd.collect())

Here is the result of running the code:

NOTE: SPARK_PREPEND_CLASSES is set, placing locally compiled Spark classes ahead of assembly.
Error: Must specify a primary resource (JAR or Python or R file)
Run with --help for usage help or --verbose for debug output
Traceback (most recent call last):
  File "/git/misc/python/ptest.py", line 14, in <module>
    sc = SparkContext("local","ptest",SparkConf().setAppName("x"))
  File "/shared/spark16/python/pyspark/conf.py", line 104, in __init__
    SparkContext._ensure_initialized()
  File "/shared/spark16/python/pyspark/context.py", line 245, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway()
  File "/shared/spark16/python/pyspark/java_gateway.py", line 94, in launch_gateway
    raise Exception("Java gateway process exited before sending the driver its port number")
Exception: Java gateway process exited before sending the driver its port number

However, I really do not understand how this could be expected to work: in order to run on Spark, the code needs to be bundled up and submitted via spark-submit.

So I doubt that the other question actually addressed submitting pyspark code through IntelliJ to Spark.

Is there a way to submit pyspark code to Spark from IntelliJ? It would effectively be

  spark-submit myPysparkCode.py

The pyspark executable itself has been deprecated since Spark 1.0. Does anyone have this working?

  • Could you add your run configuration? – zero323 Feb 26 '17 at 22:48
  • @zero323 Ah - I had found the missing link here: you need to add `PYSPARK_SUBMIT_ARGS` set to `pyspark-shell` in the `run configuration`; the other settings for PYTHONPATH and SPARK_HOME are as shown in that other question. I added an answer for this now. If you have other info, please feel free to add your own answer. BTW: I am still looking for how to run `pyspark` in the IntelliJ python console. – WestCoastProjects Feb 26 '17 at 22:52
  • Me too: the pyspark shell and spark-submit work fine for me, but I get the "Exception: Java gateway process" when I try to run it in IntelliJ. – Martin Klosi Mar 10 '17 at 06:41

1 Answer


In my case, the variable settings from the other Q&A (Write and run pyspark in IntelliJ IDEA) covered most, but not all, of the required settings; I tried them many times without success.

Only after adding

  PYSPARK_SUBMIT_ARGS=pyspark-shell

to the run configuration did pyspark finally quiet down and succeed.
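
For completeness, the same fix can also be applied in code rather than in the IDE. Below is a minimal sketch, assuming SPARK_HOME and PYTHONPATH are already configured as in the linked question: launch_gateway() reads PYSPARK_SUBMIT_ARGS from the environment when the SparkContext starts the JVM gateway, so setting it before creating the context has the same effect as the run-configuration entry.

  import os

  # Set PYSPARK_SUBMIT_ARGS before any SparkContext is created; the
  # pyspark launcher reads it from the environment when it spawns the
  # Java gateway process.
  os.environ.setdefault("PYSPARK_SUBMIT_ARGS", "pyspark-shell")

  from pyspark import SparkConf, SparkContext

  sc = SparkContext("local", "ptest", conf=SparkConf().setAppName("x"))
  print(sc.parallelize([1, 2, 3]).collect())  # sanity check: [1, 2, 3]
  sc.stop()

The value pyspark-shell is the special primary-resource name that spark-submit accepts, which is why its absence produced the "Error: Must specify a primary resource" message in the output above.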
