
Since updating to Spark 2.3.0, tests that run in my CI (Semaphore) fail due to an allegedly invalid Spark URL when creating the (local) Spark context:

18/03/07 03:07:11 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Invalid Spark URL: spark://HeartbeatReceiver@LXC_trusty_1802-d57a40eb:44610
    at org.apache.spark.rpc.RpcEndpointAddress$.apply(RpcEndpointAddress.scala:66)
    at org.apache.spark.rpc.netty.NettyRpcEnv.asyncSetupEndpointRefByURI(NettyRpcEnv.scala:134)
    at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
    at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:109)
    at org.apache.spark.util.RpcUtils$.makeDriverRef(RpcUtils.scala:32)
    at org.apache.spark.executor.Executor.<init>(Executor.scala:155)
    at org.apache.spark.scheduler.local.LocalEndpoint.<init>(LocalSchedulerBackend.scala:59)
    at org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:126)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2486)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:930)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:921)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)

The Spark session is created as follows:

val sparkSession: SparkSession = SparkSession
  .builder
  .appName("LocalTestSparkSession")
  .config("spark.broadcast.compress", "false")
  .config("spark.shuffle.compress", "false")
  .config("spark.shuffle.spill.compress", "false")
  .master("local[3]")
  .getOrCreate

Before updating to Spark 2.3.0, no problems were encountered with versions 2.2.1 and 2.1.0. Running the tests locally also works fine.

– Lorenz Bernauer
  • How do you run the application: `sbt run` or `spark-submit`? – Sandeep Purohit Mar 07 '18 at 11:18
  • Neither, the code is executed within unit tests during the maven test phase. – Lorenz Bernauer Mar 07 '18 at 14:02
  • I tried to run the code with `sbt run` and it was working fine. If it gives you Invalid Spark URL: spark://HeartbeatReceiver@LXC_trusty_1802-d57a40eb:44610, then it is not picking up local as the master. – Sandeep Purohit Mar 07 '18 at 14:04
  • Are your tests running inside a Docker container? – Sandeep Purohit Mar 07 '18 at 14:13
  • The platform is an Ubuntu 14.04 LTS v1802, where it didn't work. On the local machine (Windows), there was no problem. However, thanks to your comment, I checked the platform settings in Semaphore and switched to "Ubuntu 14.04 LTS v1802 (native Docker 17.12 support)". I don't know why, but now I can execute all tests again without any problem. – Lorenz Bernauer Mar 08 '18 at 02:28
  • I'm getting the same error since I upgraded spark-core and spark-sql to 2.3.0. Previous dependencies were org.apache.spark:spark-sql_2.11:2.2.1 & org.apache.spark:spark-core_2.11:2.2.1, and current are org.apache.spark:spark-sql_2.11:2.3.0 & org.apache.spark:spark-core_2.11:2.3.0 – Tal Barda May 06 '18 at 13:54
  • In what kind of environment does the error occur in your case? – Lorenz Bernauer May 06 '18 at 18:30
  • *lsb_release -a* output: No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 14.04.5 LTS Release: 14.04 Codename: trusty *uname -a* output is: Linux railsonfire_be98de61-c2bc-4afa-af27-3fe1058e603d_6cc13d6a6b9f 4.4.0-121-generic #145~14.04.1-Ubuntu SMP Mon Apr 16 18:40:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux – Tal Barda May 09 '18 at 11:02
  • @LorenzBernauer - any ideas? – Tal Barda May 21 '18 at 14:28
  • My apologies, I didn't see your last reply. In my case I switched the platform which runs my tests at my CI provider to Ubuntu with native Docker support. Somehow it solved the problem, but to be honest, I don't understand why. – Lorenz Bernauer May 22 '18 at 08:03
  • @LorenzBernauer - it's weird because the `mvn clean install` command isn't run within Docker – Tal Barda May 23 '18 at 06:55
  • I actually also didn't use Docker... that is the part which I don't understand. – Lorenz Bernauer May 23 '18 at 10:29

9 Answers


Set the SPARK_LOCAL_HOSTNAME environment variable to localhost and try again:

export SPARK_LOCAL_HOSTNAME=localhost
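
If the variable needs to be set for CI test runs rather than an interactive shell, one option is to set it in the build itself. A minimal sketch, assuming an sbt build with forked tests (sbt's envVars only applies to forked JVMs; Maven users can achieve the same with the surefire plugin's environmentVariables configuration):

// build.sbt sketch (assumption: tests run in a forked JVM; envVars has
// no effect on in-process test runs)
Test / fork := true
Test / envVars := Map("SPARK_LOCAL_HOSTNAME" -> "localhost")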
– Tim Diekmann

This has been resolved by setting the SparkSession config "spark.driver.host" to the IP address.

It seems that this setting is required from 2.3 onwards.
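
For example, applied to the builder from the question, this would be a minimal sketch (the answers below use "localhost"; a concrete IP address of the driver machine works the same way):

import org.apache.spark.sql.SparkSession

val sparkSession: SparkSession = SparkSession
  .builder
  .appName("LocalTestSparkSession")
  .master("local[3]")
  // Pin the driver host explicitly so the RPC address does not depend
  // on the machine's hostname:
  .config("spark.driver.host", "localhost")
  .getOrCreate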

– Nagireddy Hanisha

If you don't want to change the environment variable, you can add the config to the SparkSession builder in code instead (as Hanisha said above).

In PySpark:

spark = SparkSession.builder.config("spark.driver.host", "localhost").getOrCreate()

As mentioned in the above answers, you need to change SPARK_LOCAL_HOSTNAME to localhost. On Windows, you have to use the SET command: SET SPARK_LOCAL_HOSTNAME=localhost

However, this SET command is temporary: you would have to run it again in every new terminal. Instead, you can use the SETX command, which is permanent.

SETX SPARK_LOCAL_HOSTNAME localhost

You can run the above command anywhere; just open a command prompt and execute it. Note that, unlike the SET command, SETX does not allow an equals sign; you separate the environment variable and its value with a space.

If it succeeds, you will see a message like "SUCCESS: Specified value was saved".

You can also verify that the variable was added by typing SET in a different command prompt (or SET S, which lists the variables starting with the letter 'S'). You will see SPARK_LOCAL_HOSTNAME=localhost in the results, which will not happen if you use the SET command instead of SETX.

– Rajitha Fernando

Change your hostname so that it contains NO underscore.

spark://HeartbeatReceiver@LXC_trusty_1802-d57a40eb:44610 becomes spark://HeartbeatReceiver@LXCtrusty1802d57a40eb:44610

On Ubuntu, as root:

#hostnamectl status
#hostnamectl --static set-hostname LXCtrusty1802d57a40eb

#nano /etc/hosts
    127.0.0.1   LXCtrusty1802d57a40eb
#reboot 
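
For what it's worth, the underscore is most likely the culprit: java.net.URI cannot extract a host from a name containing an underscore, and Spark's RPC address parsing appears to rely on it (the stack trace above points at RpcEndpointAddress). A quick check with nothing but the JDK, using the URL from the question:

// Hostnames with underscores are not valid per the URI spec, so getHost
// returns null and Spark treats the whole URL as invalid.
val broken = new java.net.URI("spark://HeartbeatReceiver@LXC_trusty_1802-d57a40eb:44610")
println(broken.getHost) // null

val fixed = new java.net.URI("spark://HeartbeatReceiver@LXCtrusty1802d57a40eb:44610")
println(fixed.getHost) // LXCtrusty1802d57a40eb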
– user3008410

For anyone working in a Jupyter Notebook: adding %env SPARK_LOCAL_HOSTNAME=localhost at the very beginning of the cell (before the SparkContext is created) solved it for me, like so:

%env SPARK_LOCAL_HOSTNAME=localhost

import findspark
findspark.init()

from pyspark import SparkConf, SparkContext
conf = SparkConf().setMaster("local").setAppName("Test")
sc = SparkContext(conf=conf)
– AaronDT

Setting .config("spark.driver.host", "localhost") fixed the issue for me.

SparkSession spark = SparkSession
    .builder()
    .config("spark.master", "local")
    .config("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
    .config("spark.hadoop.fs.s3a.buffer.dir", "/tmp")
    .config("spark.driver.memory", "2048m")
    .config("spark.executor.memory", "2048m")
    .config("spark.driver.bindAddress", "127.0.0.1")
    .config("spark.driver.host", "localhost")
    .getOrCreate();
– deepb1ue
  • This is a duplicate of an existing answer, by @Felipe. When answering older questions that already have answers, please make sure you provide either a novel solution or a significantly better explanation than the existing answers. Remember to review all existing answers first. – tjheslin1 Mar 31 '22 at 06:33

Try running Spark locally, with as many worker threads as logical cores on your machine:

.master("local[*]")
– YohanT

I would like to complement @Prakash Annadurai's answer by saying:

If you want the variable setting to last after exiting the terminal, add it to your shell profile (e.g. ~/.bash_profile) with the same command:

export SPARK_LOCAL_HOSTNAME=localhost
– RaphaëlR