
I am running Spark code on an EC2 instance and keep hitting a "Too many open files" error (log below). From searching online, it seems I need to raise the ulimit. Since I am running the Spark job on AWS and don't know where the relevant config file is, how can I pass that value from my Spark code?

Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 255 in stage 19.1 failed 4 times, most recent failure: Lost task 255.3 in stage 19.1 (TID 749786, 172.31.20.34, executor 207): java.io.FileNotFoundException: /media/ebs0/yarn/local/usercache/data-platform/appcache/application_1559339304634_2088/blockmgr-90a63e4a-dace-4246-a158-270b0b34c1f9/20/broadcast_13 (Too many open files)
daydayup
  • possible duplicate of [this](https://stackoverflow.com/questions/25707629/why-does-spark-job-fail-with-too-many-open-files) – Ram Ghadiyaram Jul 10 '19 at 20:36
  • The ulimit is a property of the system and user. https://unix.stackexchange.com/questions/8945/how-can-i-increase-open-files-limit-for-all-processes should show you how to change it. – tk421 Jul 11 '19 at 20:22
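
As the comments point out, the open-files limit is an operating-system setting for the user that runs the executor processes (on YARN, typically the yarn/hadoop user), so it cannot be raised from inside the Spark job itself; on an EC2/EMR cluster it is normally raised in /etc/security/limits.conf (or via a bootstrap action) and picked up after the NodeManagers restart. What you *can* do from Spark code is verify the limit the executors actually see. A minimal sketch, assuming Linux executors (the object name and partition count are arbitrary):

```scala
import java.net.InetAddress
import scala.io.Source
import org.apache.spark.sql.SparkSession

object CheckOpenFileLimit {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("check-open-file-limit").getOrCreate()
    val sc = spark.sparkContext

    // Run enough tasks that every executor gets at least one; each task reads
    // the effective "Max open files" limit of its executor JVM from
    // /proc/self/limits (Linux only).
    val limits = sc.parallelize(1 to 200, 200).map { _ =>
      val host = InetAddress.getLocalHost.getHostName
      val src = Source.fromFile("/proc/self/limits")
      try {
        val line = src.getLines().find(_.startsWith("Max open files")).getOrElse("limit not found")
        (host, line)
      } finally {
        src.close() // do not leak the handle we just opened
      }
    }.distinct().collect()

    limits.foreach { case (host, limit) => println(s"$host: $limit") }
    spark.stop()
  }
}
```

If the executors already report a high limit, the error is more likely caused by descriptor leaks, which the answer below addresses.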

1 Answer


Apart from raising the ulimit, you should also look for connection leaks. For example, check whether your I/O connections (streams, sockets, file handles) are properly closed. We saw the "Too many open files" exception even with a 655k ulimit on every node; it later turned out to be connection leaks in the code.
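
A sketch of what such a leak typically looks like in a Spark job, and the fix, assuming reads through the Hadoop FileSystem API (the object and method names and the per-record file reads are illustrative, not taken from the original code):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.rdd.RDD

object OpenFileHygiene {
  // Leaky: the stream is never closed, so every record pins a file descriptor
  // on the executor until the JVM happens to reclaim it -- under load this is
  // exactly what produces "Too many open files".
  def bytesPerFileLeaky(paths: RDD[String]): RDD[Long] =
    paths.map { p =>
      val fs = FileSystem.get(new Configuration())
      val in = fs.open(new Path(p))
      Iterator.continually(in.read()).takeWhile(_ != -1).size.toLong
    }

  // Fixed: close the stream deterministically in a finally block, so the
  // descriptor is released as soon as the record has been processed.
  def bytesPerFileClosed(paths: RDD[String]): RDD[Long] =
    paths.map { p =>
      val fs = FileSystem.get(new Configuration())
      val in = fs.open(new Path(p))
      try Iterator.continually(in.read()).takeWhile(_ != -1).size.toLong
      finally in.close()
    }
}
```

The same pattern applies to sockets, JDBC connections, and anything else opened inside map or foreachPartition closures.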

voldy