I have a long-running EMR step that executes spark-submit in EMR client mode. Between job executions, I manually restart the Spark context before the next run if any configuration changes, such as --executor-memory.

When I try to restart the context with the new configuration using

currentSparkSession.close();
return SparkSession.builder().config(newConfig).getOrCreate();

I run into the following exception:

19/05/23 15:52:35 ERROR SparkContext: Error initializing SparkContext.
java.lang.IllegalStateException: Spark context stopped while waiting for backend
    at org.apache.spark.scheduler.TaskSchedulerImpl.waitBackendReady(TaskSchedulerImpl.scala:689)
    at org.apache.spark.scheduler.TaskSchedulerImpl.postStartHook(TaskSchedulerImpl.scala:186)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:567)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2516)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:923)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:915)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:915)
.
.
.
19/05/23 15:52:35 INFO SparkContext: SparkContext already stopped.
19/05/23 15:52:35 WARN TransportChannelHandler: Exception in connection from /172.31.0.165:42556
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:192)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
    at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:221)
    at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:899)
    at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:275)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:643)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:566)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:480)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:442)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
    at java.lang.Thread.run(Thread.java:748)

I tried making the thread sleep for a bit, in case some time is needed between the stop and the start:

currentSparkSession.close();
Thread.sleep(5000); // Sleep 5 seconds
return SparkSession.builder().config(newConfig).getOrCreate();

but that doesn't work either. I looked at the Spark source code, and it looks like currentSparkSession.close() won't return until the context has actually stopped anyway, so making the thread sleep doesn't accomplish anything.
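
For reference, here is a trimmed-down sketch of the restart logic I'm describing (the class and method names are just placeholders, and the isStopped check is only there to illustrate that the context is already gone by the time close() returns):

import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

public final class SparkSessionRestarter {

    // Placeholder helper; newConfig carries the updated settings, e.g. a new spark.executor.memory.
    public static SparkSession restartSession(SparkSession currentSparkSession, SparkConf newConfig) {
        // close() delegates to SparkContext.stop(), which blocks until the context is torn down,
        // so there is nothing left to wait for afterwards.
        currentSparkSession.close();

        // The underlying context already reports itself as stopped at this point.
        assert currentSparkSession.sparkContext().isStopped();

        // Build a fresh session with the updated configuration.
        return SparkSession.builder().config(newConfig).getOrCreate();
    }
}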

I also see this in the container logs:

Error occurred during initialization of VM
Initial heap size set to a larger value than the maximum heap size
End of LogType:stdout

which confuses me, because the only configuration I changed between executions was --executor-memory, and I actually DECREASED it rather than increasing it.

I've found similar questions on this site, like Apache Spark running spark-shell on YARN error, but the suggestions there essentially amount to turning off some resource manager validation checks, which doesn't look very safe to me. Any suggestions?

1 Answer

This happened because I sent a request with a lower --executor-memory (which sets Xmx, the max heap size) than the Xms (initial heap size) configured on the initial spark-submit. The exception was thrown because the max heap size can never be smaller than the initial heap size.
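
To avoid tripping over this again, I now validate the requested executor memory against the configured -Xms before rebuilding the session. A minimal sketch, assuming the -Xms value arrives via spark.executor.extraJavaOptions as it did in my setup (the class and method names are illustrative, and sizes are compared in MiB for simplicity):

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.spark.SparkConf;

public final class ExecutorMemoryGuard {

    // Matches e.g. "-Xms4g" or "-Xms4096m" inside the executor JVM options.
    private static final Pattern XMS = Pattern.compile("-Xms(\\d+)([kKmMgG]?)");

    // Convert a JVM heap size to MiB; a bare number really means bytes to the JVM,
    // but it is treated as MiB here to keep the sketch short.
    private static long toMiB(long value, String unit) {
        switch (unit.toLowerCase()) {
            case "k": return value / 1024;
            case "g": return value * 1024;
            default:  return value;
        }
    }

    // Throws if the requested spark.executor.memory (which becomes Xmx) is below the configured -Xms.
    public static void checkAgainstXms(SparkConf newConfig) {
        String javaOpts = newConfig.get("spark.executor.extraJavaOptions", "");
        Matcher m = XMS.matcher(javaOpts);
        if (!m.find()) {
            return; // no explicit -Xms, nothing to validate
        }
        long xmsMiB = toMiB(Long.parseLong(m.group(1)), m.group(2));
        long xmxMiB = newConfig.getSizeAsMb("spark.executor.memory", "1g");

        if (xmxMiB < xmsMiB) {
            throw new IllegalArgumentException("Requested executor memory (" + xmxMiB
                + " MiB) is below the configured -Xms (" + xmsMiB
                + " MiB); the executor JVM would fail to start.");
        }
    }
}

With a check like this in place, a bad request fails fast in my own code instead of taking down the executor containers.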
