
I have a single cluster deployed with Cloudera Manager and the Spark parcel installed. Typing pyspark in the shell works, yet running the code below in Jupyter throws an exception.

Code

import sys
import py4j
from pyspark.sql import SparkSession
from pyspark import SparkContext, SparkConf
conf = SparkConf()
conf.setMaster('yarn-client')
conf.setAppName('SPARK APP')
sc = SparkContext(conf=conf)
# sc= SparkContext.getOrCreate()
# sc.stop()

def mod(x):
    import numpy as np
    return (x, np.mod(x, 2))

rdd = sc.parallelize(range(1000)).map(mod).take(10)
print (rdd)

Exception

/usr/lib/python3.6/site-packages/pyspark/context.py in _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, jsc, profiler_cls)
    187         self._accumulatorServer = accumulators._start_update_server(auth_token)
    188         (host, port) = self._accumulatorServer.server_address
--> 189         self._javaAccumulator = self._jvm.PythonAccumulatorV2(host, port, auth_token)
    190         self._jsc.sc().register(self._javaAccumulator)
    191 

TypeError: 'JavaPackage' object is not callable
    At first glance you have a Spark version mismatch, though it might be a Java CLASSPATH problem as well. You can check [my answer](https://stackoverflow.com/a/53457308/10465355) to https://stackoverflow.com/q/53455489/10465355 to determine if the former is the issue. – 10465355 Feb 14 '19 at 10:32
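
One way to narrow this down (a sketch prompted by that comment, not part of the original thread; the version numbers are only examples) is to compare the pyspark package the notebook kernel imports with the Spark version shipped by the Cloudera parcel:

import pyspark

# Version of the Python package that Jupyter actually imports
print(pyspark.__version__)

# Compare it with the parcel's Spark version, e.g. from a shell on the node:
#   spark-submit --version
# If the pip-installed pyspark is newer (say 2.x) than the parcel (1.6), the
# Python side asks the JVM for classes such as PythonAccumulatorV2 that the
# old Spark jars do not contain, and the lookup falls back to a JavaPackage
# object, which produces "TypeError: 'JavaPackage' object is not callable".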

1 Answer


After searching a bit: the Spark version in use (1.6) is not compatible with Python 3.7, so I had to run it using Python 2.7.
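
For anyone hitting the same thing, a minimal sketch of how the job can be run under Python 2.7 (the interpreter path is an assumption, adjust it to your installation, and the notebook kernel itself must also run Python 2.7):

import os
from pyspark import SparkContext, SparkConf

# Make the YARN executors use the same Python major version as the driver;
# this must be set before the SparkContext is created.
os.environ['PYSPARK_PYTHON'] = '/usr/bin/python2.7'

conf = SparkConf().setMaster('yarn-client').setAppName('SPARK APP')
sc = SparkContext(conf=conf)

# Quick smoke test that work is actually distributed to the executors
print(sc.parallelize(range(10)).map(lambda x: x * x).collect())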
