
I'm trying to run a simple MLlib function (FPGrowth) from Java, from a remote computer, against CDH 6 Community Edition.

By default, I tried to connect like this:

`SparkConf conf = new SparkConf().setAppName("FPGrowth").setMaster("spark://some ip:7077").set("spark.cores.max", "10");`

but the connection fails. I also checked `netstat -plnt` on the server, and no program is listening on port 7077.

Is there a new way to create a SparkContext on CDH 6? I guess Spark is now integrated with YARN, but how am I supposed to connect to it and create a SparkContext?

Thanks

m scorpion

1 Answer


Switching from local mode to cluster mode in Spark is unfortunately not as easy, but it is a well-documented process. You will also have to make sure that your files (if you use any) are accessible from every execution node of the cluster, probably by putting them on HDFS.

You will first have to make sure that the Hadoop client is configured on the machine where you run the code; then you can execute it.
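Concretely, Spark's YARN support reads the cluster configuration from the directory pointed to by `HADOOP_CONF_DIR` (or `YARN_CONF_DIR`). A minimal sketch, assuming the usual CDH client layout (the path is a typical default and may differ on your machine):

```shell
# Tell Spark where the Hadoop/YARN client configuration files live
# (/etc/hadoop/conf is the usual CDH location — adjust for your install)
export HADOOP_CONF_DIR=/etc/hadoop/conf
export YARN_CONF_DIR=/etc/hadoop/conf
```

With these variables set, a `--master yarn` submission (or a programmatic `.master("yarn")`) can locate the ResourceManager, which is why nothing needs to listen on port 7077: that port belongs to Spark's standalone master, which CDH 6 does not run by default.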

Typically, you will use `spark-submit`, as in:

$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode cluster \
    --driver-memory 4g \
    --executor-memory 2g \
    --executor-cores 1 \
    --queue thequeue \
    examples/jars/spark-examples*.jar \
    10

But you should also be able to start it directly from your Java code, like:

import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder()
    .appName("app")
    .master("yarn")
    .getOrCreate();
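Tying this back to your FPGrowth use case, a minimal driver could look like the sketch below. This uses the DataFrame-based `org.apache.spark.ml.fpm.FPGrowth`; the inline toy transactions and the parameter values (`minSupport`, `minConfidence`) are illustrative — in practice you would read your transactions from HDFS and tune the thresholds:

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.ml.fpm.FPGrowth;
import org.apache.spark.ml.fpm.FPGrowthModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class FPGrowthOnYarn {
  public static void main(String[] args) {
    // "yarn" as master only works if HADOOP_CONF_DIR points at the
    // cluster's client configuration on this machine
    SparkSession spark = SparkSession.builder()
        .appName("FPGrowth")
        .master("yarn")
        .getOrCreate();

    // Toy transactions; in practice, load them from HDFS so that
    // every executor can reach the data
    List<Row> data = Arrays.asList(
        RowFactory.create(Arrays.asList("a", "b", "c")),
        RowFactory.create(Arrays.asList("a", "b")),
        RowFactory.create(Arrays.asList("a")));
    StructType schema = new StructType(new StructField[]{
        new StructField("items",
            DataTypes.createArrayType(DataTypes.StringType),
            false, Metadata.empty())});
    Dataset<Row> transactions = spark.createDataFrame(data, schema);

    // Mine frequent itemsets and association rules
    FPGrowthModel model = new FPGrowth()
        .setItemsCol("items")
        .setMinSupport(0.5)
        .setMinConfidence(0.6)
        .fit(transactions);

    model.freqItemsets().show();
    model.associationRules().show();

    spark.stop();
  }
}
```

Note that when you package this into a jar and run it through `spark-submit --master yarn`, you can drop the `.master("yarn")` call and let `spark-submit` supply it.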

You will find more details at: https://spark.apache.org/docs/latest/running-on-yarn.html.

jgp