  • Spark 3.0.1
  • hadoop-aws 3.2.0

I have a simple Spark Structured Streaming application that reads messages from a Kafka topic, aggregates them, and writes the results into Elasticsearch. I am using checkpointing, with an S3 bucket to store the checkpoints.
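
For context, the job is roughly shaped like the sketch below; the broker address, topic, bucket, index, and the aggregation itself are placeholders, not the actual production code:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.window

val spark = SparkSession.builder().appName("kafka-to-es").getOrCreate()
import spark.implicits._

val messages = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092") // hypothetical brokers
  .option("subscribe", "events")                    // hypothetical topic
  .load()

// Windowed count per message value; the real aggregation differs.
val counts = messages
  .selectExpr("CAST(value AS STRING) AS value", "timestamp")
  .withWatermark("timestamp", "10 minutes")
  .groupBy(window($"timestamp", "5 minutes"), $"value")
  .count()

counts.writeStream
  .outputMode("append")
  .format("es") // elasticsearch-hadoop sink; needs es.nodes etc. configured
  .option("checkpointLocation", "s3a://my-bucket/checkpoints/app1") // hypothetical bucket
  .start("events-index") // hypothetical index
  .awaitTermination()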

After some time, the application started to fail with the following exception:

[476.099s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
Error in TaskCompletionListener
java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
at java.base/java.lang.Thread.start0(Native Method)
at java.base/java.lang.Thread.start(Thread.java:801)
at java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:939)
at java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1345)
at com.google.common.util.concurrent.MoreExecutors$ListeningDecorator.execute(MoreExecutors.java:480)
at com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:61)
at com.google.common.util.concurrent.ForwardingListeningExecutorService.submit(ForwardingListeningExecutorService.java:40)
at org.apache.hadoop.util.SemaphoredDelegatingExecutor.submit(SemaphoredDelegatingExecutor.java:112)
at com.google.common.util.concurrent.ForwardingListeningExecutorService.submit(ForwardingListeningExecutorService.java:40)
at org.apache.hadoop.util.SemaphoredDelegatingExecutor.submit(SemaphoredDelegatingExecutor.java:112)
at org.apache.hadoop.fs.s3a.S3ABlockOutputStream.putObject(S3ABlockOutputStream.java:434)
at org.apache.hadoop.fs.s3a.S3ABlockOutputStream.close(S3ABlockOutputStream.java:365)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
at org.apache.spark.sql.execution.streaming.CheckpointFileManager$RenameBasedFSDataOutputStream.cancel(CheckpointFileManager.scala:163)
at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$cancelDeltaFile(HDFSBackedStateStoreProvider.scala:507)
at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$HDFSBackedStateStore.abort(HDFSBackedStateStoreProvider.scala:150)
at org.apache.spark.sql.execution.streaming.state.package$StateStoreOps.$anonfun$mapPartitionsWithStateStore$2(package.scala:65)
at org.apache.spark.sql.execution.streaming.state.package$StateStoreOps.$anonfun$mapPartitionsWithStateStore$2$adapted(package.scala:64)
at org.apache.spark.TaskContext$$anon$1.onTaskCompletion(TaskContext.scala:125)
at org.apache.spark.TaskContextImpl.$anonfun$markTaskCompleted$1(TaskContextImpl.scala:124)
at org.apache.spark.TaskContextImpl.$anonfun$markTaskCompleted$1$adapted(TaskContextImpl.scala:124)
at org.apache.spark.TaskContextImpl.$anonfun$invokeListeners$1(TaskContextImpl.scala:137)
at org.apache.spark.TaskContextImpl.$anonfun$invokeListeners$1$adapted(TaskContextImpl.scala:135)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:135)
at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:124)
at org.apache.spark.scheduler.Task.run(Task.scala:143)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
at java.base/java.lang.Thread.run(Thread.java:832)

VisualVM shows that the number of threads keeps rising from startup until it reaches the maximum (~4.8K):

And the majority of them are:

  • s3a-transfer-unbounded-poolXXX-tXX
  • s3a-transfer-shared-poolXXX-tXX
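
(A quick way I confirm this from inside the JVM; this diagnostic snippet is my own, not part of the job:)

import scala.collection.JavaConverters._

// Count live threads belonging to the S3A transfer pools.
val s3aThreads = Thread.getAllStackTraces.keySet.asScala
  .count(_.getName.startsWith("s3a-transfer"))
println(s"s3a transfer threads: $s3aThreads")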

As far as I understand, the only place where these thread pools are created is

org.apache.hadoop.fs.s3a.S3AFileSystem#initialize

and Spark creates a new filesystem instance every time

org.apache.spark.sql.execution.streaming.StreamMetadata#write

is called.
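
Resolving an s3a:// path boils down to a Hadoop call like the sketch below; whether each call builds a fresh S3AFileSystem (with fresh transfer pools) depends on the FileSystem cache. The bucket name is hypothetical:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

val hadoopConf = new Configuration()
// getFileSystem consults the FileSystem cache, keyed by scheme and
// authority, unless fs.s3a.impl.disable.cache=true, in which case a new
// S3AFileSystem (with new s3a-transfer-* pools) is created on every call.
val fs1 = new Path("s3a://my-bucket/checkpoints").getFileSystem(hadoopConf) // hypothetical bucket
val fs2 = new Path("s3a://my-bucket/checkpoints").getFileSystem(hadoopConf)
println(fs1 eq fs2) // true while caching is on (the default)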

Why is that? How can I prevent these threads from being created?

1 Answer


You can't stop those threads being created, as the thread pool is needed by the AWS transfer manager, which is in the AWS library. When S3A's close() method is called, it shuts down the transfer manager and the thread pool. Which means: the problem is that Spark isn't closing down the FS instances.

Make sure you don't have caching of the FS instances disabled, i.e. fs.s3a.impl.disable.cache MUST be false. That is the default, so work out where it's being changed and stop it.

spark.hadoop.fs.s3a.impl.disable.cache false
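
For example, as a sketch (the spark.hadoop.* prefix is the standard Spark-to-Hadoop config passthrough):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("kafka-to-es") // hypothetical app name
  // Keep FileSystem caching enabled so S3AFileSystem instances are reused.
  .config("spark.hadoop.fs.s3a.impl.disable.cache", "false")
  .getOrCreate()

// Check the value the Hadoop layer actually sees.
println(spark.sparkContext.hadoopConfiguration.get("fs.s3a.impl.disable.cache"))
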
stevel