
Since updating to Spark 2.3.0, tests that run in my CI (Semaphore) fail due to an allegedly invalid Spark URL when creating the (local) Spark context:

18/03/07 03:07:11 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Invalid Spark URL: spark://HeartbeatReceiver@LXC_trusty_1802-d57a40eb:44610
    at org.apache.spark.rpc.RpcEndpointAddress$.apply(RpcEndpointAddress.scala:66)
    at org.apache.spark.rpc.netty.NettyRpcEnv.asyncSetupEndpointRefByURI(NettyRpcEnv.scala:134)
    at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
    at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:109)
    at org.apache.spark.util.RpcUtils$.makeDriverRef(RpcUtils.scala:32)
    at org.apache.spark.executor.Executor.<init>(Executor.scala:155)
    at org.apache.spark.scheduler.local.LocalEndpoint.<init>(LocalSchedulerBackend.scala:59)
    at org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:126)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2486)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:930)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:921)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)

The Spark session is created as follows:

val sparkSession: SparkSession = SparkSession
  .builder
  .appName("LocalTestSparkSession")
  .config("spark.broadcast.compress", "false")
  .config("spark.shuffle.compress", "false")
  .config("spark.shuffle.spill.compress", "false")
  .master("local[3]")
  .getOrCreate

Before updating to Spark 2.3.0, no problems were encountered with versions 2.2.1 and 2.1.0. Running the tests locally also works fine.

– Lorenz Bernauer
  • How do you run the application: `sbt run` or `spark-submit`? – Sandeep Purohit Mar 07 '18 at 11:18
  • Neither, the code is executed within unit tests during the maven test phase. – Lorenz Bernauer Mar 07 '18 at 14:02
  • I tried to run the code with `sbt run` and it was working fine. If it gives you Invalid Spark URL: spark://HeartbeatReceiver@LXC_trusty_1802-d57a40eb:44610, then it is not picking up local as the master. – Sandeep Purohit Mar 07 '18 at 14:04
  • Are your tests running inside a Docker container? – Sandeep Purohit Mar 07 '18 at 14:13
  • The platform is an Ubuntu 14.04 LTS v1802, where it didn't work. On the local machine (Windows), there was no problem. However, thanks to your comment, I checked the platform settings in Semaphore and switched to "Ubuntu 14.04 LTS v1802 (native Docker 17.12 support)". I don't know why, but now I can execute all tests again without any problem. – Lorenz Bernauer Mar 08 '18 at 02:28
  • I'm getting the same error since I upgraded spark-core and spark-sql to 2.3.0. Previous dependencies were org.apache.spark:spark-sql_2.11:2.2.1 & org.apache.spark:spark-core_2.11:2.2.1, and current are org.apache.spark:spark-sql_2.11:2.3.0 & org.apache.spark:spark-core_2.11:2.3.0 – Tal Barda May 06 '18 at 13:54
  • In what kind of environment does the error occur in your case? – Lorenz Bernauer May 06 '18 at 18:30
  • *lsb_release -a* output: No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 14.04.5 LTS Release: 14.04 Codename: trusty *uname -a* output is: Linux railsonfire_be98de61-c2bc-4afa-af27-3fe1058e603d_6cc13d6a6b9f 4.4.0-121-generic #145~14.04.1-Ubuntu SMP Mon Apr 16 18:40:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux – Tal Barda May 09 '18 at 11:02
  • @LorenzBernauer - any ideas? – Tal Barda May 21 '18 at 14:28
  • My apologies, I didn't see your last reply. In my case I switched the platform which runs my tests at my CI provider to Ubuntu with native Docker support. Somehow it solved the problem, but to be honest, I don't understand why. – Lorenz Bernauer May 22 '18 at 08:03
  • @LorenzBernauer - it's weird because the `mvn clean install` command isn't run within Docker – Tal Barda May 23 '18 at 06:55
  • I actually also didn't use Docker... that is the part which I don't understand. – Lorenz Bernauer May 23 '18 at 10:29

9 Answers


Set the SPARK_LOCAL_HOSTNAME environment variable to localhost and try again:

export SPARK_LOCAL_HOSTNAME=localhost
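
If the variable needs to be set for CI test runs rather than an interactive shell, one option is to set it in the build itself. A minimal sketch, assuming an sbt build with forked tests (sbt's envVars only applies to forked JVMs; Maven users can achieve the same with the surefire plugin's environmentVariables configuration):

// build.sbt sketch (assumption: tests run in a forked JVM; envVars has
// no effect on in-process test runs)
Test / fork := true
Test / envVars := Map("SPARK_LOCAL_HOSTNAME" -> "localhost")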
– Tim Diekmann

This has been resolved by setting the SparkSession config "spark.driver.host" to the IP address.

It seems that this setting is required from 2.3 onwards.
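
For example, applied to the builder from the question, this would be a minimal sketch (the answers below use "localhost"; a concrete IP address of the driver machine works the same way):

import org.apache.spark.sql.SparkSession

val sparkSession: SparkSession = SparkSession
  .builder
  .appName("LocalTestSparkSession")
  .master("local[3]")
  // Pin the driver host explicitly so the RPC address does not depend
  // on the machine's hostname:
  .config("spark.driver.host", "localhost")
  .getOrCreate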

– Nagireddy Hanisha

If you don't want to change the environment variable, you can add the config to the SparkSession builder in code instead (as Hanisha said above).

In PySpark:

spark = SparkSession.builder.config("spark.driver.host", "localhost").getOrCreate()

As mentioned in the above answers, you need to change SPARK_LOCAL_HOSTNAME to localhost. On Windows, you have to use the SET command: SET SPARK_LOCAL_HOSTNAME=localhost

However, this SET command is temporary: you would have to run it again in every new terminal. Instead, you can use the SETX command, which is permanent.

SETX SPARK_LOCAL_HOSTNAME localhost

You can run the above command anywhere; just open a command prompt and execute it. Note that, unlike the SET command, SETX does not allow an equals sign; you separate the environment variable and its value with a space.

If it succeeds, you will see a message like "SUCCESS: Specified value was saved".

You can also verify that the variable was added by typing SET in a different command prompt (or SET S, which lists the variables starting with the letter 'S'). You will see SPARK_LOCAL_HOSTNAME=localhost in the results, which will not happen if you use the SET command instead of SETX.

– Rajitha Fernando

Change your hostname so that it contains NO underscore.

spark://HeartbeatReceiver@LXC_trusty_1802-d57a40eb:44610 becomes spark://HeartbeatReceiver@LXCtrusty1802d57a40eb:44610

On Ubuntu, as root:

#hostnamectl status
#hostnamectl --static set-hostname LXCtrusty1802d57a40eb

#nano /etc/hosts
    127.0.0.1   LXCtrusty1802d57a40eb
#reboot 
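
For what it's worth, the underscore is most likely the culprit: java.net.URI cannot extract a host from a name containing an underscore, and Spark's RPC address parsing appears to rely on it (the stack trace above points at RpcEndpointAddress). A quick check with nothing but the JDK, using the URL from the question:

// Hostnames with underscores are not valid per the URI spec, so getHost
// returns null and Spark treats the whole URL as invalid.
val broken = new java.net.URI("spark://HeartbeatReceiver@LXC_trusty_1802-d57a40eb:44610")
println(broken.getHost) // null

val fixed = new java.net.URI("spark://HeartbeatReceiver@LXCtrusty1802d57a40eb:44610")
println(fixed.getHost) // LXCtrusty1802d57a40eb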
– user3008410

For anyone working in a Jupyter Notebook: adding %env SPARK_LOCAL_HOSTNAME=localhost at the very beginning of the cell (before the SparkContext is created) solved it for me, like so:

%env SPARK_LOCAL_HOSTNAME=localhost

import findspark
findspark.init()

from pyspark import SparkConf, SparkContext
conf = SparkConf().setMaster("local").setAppName("Test")
sc = SparkContext(conf=conf)
– AaronDT

Setting .config("spark.driver.host", "localhost") fixed the issue for me.

SparkSession spark = SparkSession
    .builder()
    .config("spark.master", "local")
    .config("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
    .config("spark.hadoop.fs.s3a.buffer.dir", "/tmp")
    .config("spark.driver.memory", "2048m")
    .config("spark.executor.memory", "2048m")
    .config("spark.driver.bindAddress", "127.0.0.1")
    .config("spark.driver.host", "localhost")
    .getOrCreate();
– deepb1ue
  • This is a duplicate of an existing answer, by @Felipe. When answering older questions that already have answers, please make sure you provide either a novel solution or a significantly better explanation than the existing answers. Remember to review all existing answers first. – tjheslin1 Mar 31 '22 at 06:33

Try running Spark locally, with as many worker threads as logical cores on your machine:

.master("local[*]")
– YohanT

I would like to complement @Prakash Annadurai's answer by saying:

If you want the variable setting to last after exiting the terminal, add it to your shell profile (e.g. ~/.bash_profile) with the same command:

export SPARK_LOCAL_HOSTNAME=localhost
– RaphaëlR