I have a Spark cluster running on 10 machines (1-10), with the master on machine 1. All of them run CentOS 6.4.

I am trying to connect a JupyterHub installation (which runs inside an Ubuntu Docker container because of issues with installing it on CentOS) to the cluster using SparkR, and to get the Spark context.

The code I am using is

Sys.setenv(SPARK_HOME="/usr/local/spark-1.4.1-bin-hadoop2.4") 
library(SparkR)
sc <- sparkR.init(master="spark://<master-ip>:7077")

The output I get is

attaching package: ‘SparkR’
The following object is masked from ‘package:stats’:
filter
The following objects are masked from ‘package:base’:
intersect, sample, table
Launching java with spark-submit command spark-submit sparkr-shell /tmp/Rtmpzo6esw/backend_port29e74b83c7b3
Error in sparkR.init(master = "spark://10.10.5.51:7077"): JVM is not ready after 10 seconds

Error in sparkRSQL.init(sc): object 'sc' not found

I am using Spark 1.4.1. The Spark cluster is also running CDH 5.

The JupyterHub installation can connect to the cluster via pyspark, and I have Python notebooks that use pyspark.

Can someone tell me what I am doing wrong?

user3612324

1 Answer

I have a similar problem and have been searching all around, but have found no solution. Can you please tell me what you mean by "jupyterhub installation (which is running inside a ubuntu docker because of issues with installing on CentOS)"?

We have 4 clusters too, on CentOS 6.4. Another problem of mine is: how do I use an IDE like IPython or RStudio to interact with these 4 servers? Do I use my laptop to connect to these servers remotely (if yes, then how?), and if not, what other solution is there?

Now, to answer your question, I can give it a try. I think you have to use the --yarn-cluster option, as stated here. I hope this helps you solve the problem.
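
It may also be worth ruling out a local setup problem first. As far as I understand it, the "JVM is not ready after 10 seconds" message appears when sparkR.init launches the backend through spark-submit and that process does not come up in time, which can happen if SPARK_HOME or Java is missing inside the container. Below is a minimal sketch using only base R; check_spark_env is a hypothetical helper name (it is not part of SparkR), and the path is the one from your question, assumed to exist inside the Docker container.

```r
# Hypothetical helper (not part of SparkR): checks the local prerequisites
# that sparkR.init() needs before it can launch the backend JVM.
check_spark_env <- function(spark_home = Sys.getenv("SPARK_HOME")) {
  home_set <- nzchar(spark_home)
  spark_submit <- file.path(spark_home, "bin", "spark-submit")
  # Java counts as available if it is on PATH or JAVA_HOME is set.
  java_found <- nzchar(unname(Sys.which("java"))) ||
    nzchar(Sys.getenv("JAVA_HOME"))
  c(
    spark_home_set     = home_set,
    spark_home_exists  = home_set && file.exists(spark_home),
    spark_submit_found = home_set && file.exists(spark_submit),
    java_available     = java_found
  )
}

# Path taken from the question; adjust it to where Spark lives in the container.
Sys.setenv(SPARK_HOME = "/usr/local/spark-1.4.1-bin-hadoop2.4")
status <- check_spark_env()
print(status)
if (!all(status)) {
  message("Failed checks: ", paste(names(status)[!status], collapse = ", "))
}
```

If every check comes back TRUE, the local setup is probably fine and the problem is more likely on the cluster side.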

Cheers, Ashish

mnm
  • You can have IPython expose its notebooks as a webserver for working on another machine. – Techrocket9 Jul 29 '15 at 20:40
  • Many thanks for your response. Can you please elaborate on your answer? Should IPython be installed on my laptop, on the server, or both? How do I expose it as a webserver? – mnm Jul 30 '15 at 00:07
  • IPython notebooks expose a webpage that you can browse to in a regular web browser. You only need IPython set up on the server. The install can be difficult if you don't use a prebuilt IPython distribution, as noted here: http://ipython.org/install.html – Techrocket9 Jul 30 '15 at 03:59
  • Thank you again for the explanation. It is very helpful. And I think I am in trouble now because the server was not installed using a pre-built IPython distribution. – mnm Jul 30 '15 at 05:24