I have followed the steps to set up pyspark in IntelliJ IDEA from this question:
Write and run pyspark in IntelliJ IDEA
Here is the simple code I attempted to run:
#!/usr/bin/env python
from pyspark import SparkContext, SparkConf
import numpy as np


def p(msg):
    print("%s\n" % repr(msg))


# A small 2x3 numpy array to distribute.
a = np.array([[1, 2, 3], [4, 5, 6]])
p(a)

sc = SparkContext("local", "ptest", conf=SparkConf().setAppName("x"))
ardd = sc.parallelize(a)  # parallelize the rows of the array as an RDD
p(ardd.collect())
Here is the result of running the code:
NOTE: SPARK_PREPEND_CLASSES is set, placing locally compiled Spark classes ahead of assembly.
Error: Must specify a primary resource (JAR or Python or R file)
Run with --help for usage help or --verbose for debug output
Traceback (most recent call last):
  File "/git/misc/python/ptest.py", line 14, in <module>
    sc = SparkContext("local","ptest",SparkConf().setAppName("x"))
  File "/shared/spark16/python/pyspark/conf.py", line 104, in __init__
    SparkContext._ensure_initialized()
  File "/shared/spark16/python/pyspark/context.py", line 245, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway()
  File "/shared/spark16/python/pyspark/java_gateway.py", line 94, in launch_gateway
    raise Exception("Java gateway process exited before sending the driver its port number")
Exception: Java gateway process exited before sending the driver its port number
However, I really do not understand how this could be expected to work: in order to run on Spark, the code needs to be bundled up and submitted via spark-submit. So I doubt that the other question actually addresses submitting pyspark code through IntelliJ to Spark.
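As far as I can tell from java_gateway.py (the last frame in the traceback), the launcher builds a spark-submit command from the PYSPARK_SUBMIT_ARGS environment variable, so the only way I can picture a bare python run working is something along these lines. The values below are guesses for my layout and I have not verified that they fix the gateway error:

import os

# Guesses for my setup, not verified: point at the Spark install seen in the
# traceback and tell the gateway launcher to start a pyspark-shell backend.
os.environ["SPARK_HOME"] = "/shared/spark16"
os.environ["PYSPARK_SUBMIT_ARGS"] = "--master local[*] pyspark-shell"

from pyspark import SparkContext, SparkConf

sc = SparkContext("local", "ptest", conf=SparkConf().setAppName("x"))
print(sc.parallelize([1, 2, 3]).collect())
sc.stop()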
Is there a way to submit pyspark code to Spark from IntelliJ? The command would actually have to be spark-submit myPysparkCode.py, since the pyspark executable itself has been deprecated since Spark 1.0.
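In other words, I would expect a working IntelliJ setup to reduce to driving spark-submit itself, for example through a small wrapper like this that a Run Configuration could point at (the paths are just placeholders taken from my layout):

#!/usr/bin/env python
# Hypothetical wrapper: shell out to spark-submit instead of running pyspark directly.
import subprocess

SPARK_SUBMIT = "/shared/spark16/bin/spark-submit"  # assumed location under my Spark install
SCRIPT = "/git/misc/python/ptest.py"               # the script shown above

subprocess.check_call([SPARK_SUBMIT, "--master", "local[*]", SCRIPT])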
Does anyone have this working?