
I have a Spark application in Java running on AWS EMR. I have implemented an auto-scaling policy based on the available YARN memory. For jobs that require more memory, EMR scales the cluster up to 1 + 8 nodes.
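To give an idea of what I mean by an auto-scaling policy on available YARN memory: a scale-out rule of roughly this shape can be attached with the AWS SDK for Java v1. This is only a sketch; the cluster ID, instance-group ID, threshold, and cooldown below are placeholders, not my exact values.

import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduce;
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClientBuilder;
import com.amazonaws.services.elasticmapreduce.model.*;

public class AttachMemoryScalingPolicy {
    public static void main(String[] args) {
        AmazonElasticMapReduce emr = AmazonElasticMapReduceClientBuilder.defaultClient();

        // Add one core/task node whenever available YARN memory drops below a threshold.
        ScalingRule scaleOutOnLowMemory = new ScalingRule()
                .withName("scale-out-on-low-yarn-memory")
                .withAction(new ScalingAction()
                        .withSimpleScalingPolicyConfiguration(new SimpleScalingPolicyConfiguration()
                                .withAdjustmentType(AdjustmentType.CHANGE_IN_CAPACITY)
                                .withScalingAdjustment(1)
                                .withCoolDown(300)))
                .withTrigger(new ScalingTrigger()
                        .withCloudWatchAlarmDefinition(new CloudWatchAlarmDefinition()
                                .withMetricName("YARNMemoryAvailablePercentage")
                                .withComparisonOperator(ComparisonOperator.LESS_THAN)
                                .withThreshold(15.0)          // placeholder threshold
                                .withEvaluationPeriods(1)
                                .withPeriod(300)
                                .withStatistic(Statistic.AVERAGE)
                                .withUnit(Unit.PERCENT)));

        // Cap the instance group at 8 nodes (plus the master), as described above.
        AutoScalingPolicy policy = new AutoScalingPolicy()
                .withConstraints(new ScalingConstraints().withMinCapacity(1).withMaxCapacity(8))
                .withRules(scaleOutOnLowMemory);

        emr.putAutoScalingPolicy(new PutAutoScalingPolicyRequest()
                .withClusterId("j-XXXXXXXXXXXX")        // placeholder cluster id
                .withInstanceGroupId("ig-XXXXXXXXXXXX") // placeholder core instance group id
                .withAutoScalingPolicy(policy));
    }
}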

After a certain point in my job, I keep getting the error below; it repeats for hours until I terminate the cluster manually.

java.io.IOException: All datanodes [DatanodeInfoWithStorage[<i.p>:50010,DS-4e7690c7-5946-49c5-b203-b5166c2ff58d,DISK]] are bad. Aborting...
at org.apache.hadoop.hdfs.DataStreamer.handleBadDatanode(DataStreamer.java:1531)
at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1465)
at org.apache.hadoop.hdfs.DataStreamer.processDatanodeError(DataStreamer.java:1237)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:657)

This error occurs on the very first worker node that was spawned. After some digging, I found that it might be caused by the ulimit. Increasing the ulimit is easy to do manually on any Linux or EC2 machine, but I cannot figure out how to do it automatically for every EMR cluster that is spawned.
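One approach I am considering is to attach a bootstrap action that raises the limit on every node when the cluster is created, since bootstrap actions also run on nodes that are added later by scaling. Below is a rough sketch using the AWS SDK for Java v1; the S3 script path, instance types, and roles are placeholders, and I have not verified that this actually fixes the error.

import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduce;
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClientBuilder;
import com.amazonaws.services.elasticmapreduce.model.*;

public class LaunchClusterWithUlimitBootstrap {
    public static void main(String[] args) {
        AmazonElasticMapReduce emr = AmazonElasticMapReduceClientBuilder.defaultClient();

        // Bootstrap action that runs on every node before applications start.
        // The script (placeholder path) would raise the open-file limit,
        // e.g. by writing to /etc/security/limits.conf.
        BootstrapActionConfig raiseUlimit = new BootstrapActionConfig()
                .withName("raise-ulimit")
                .withScriptBootstrapAction(new ScriptBootstrapActionConfig()
                        .withPath("s3://my-bucket/bootstrap/raise-ulimit.sh"));

        RunJobFlowRequest request = new RunJobFlowRequest()
                .withName("spark-cluster")
                .withReleaseLabel("emr-5.30.0")        // placeholder release label
                .withApplications(new Application().withName("Spark"))
                .withBootstrapActions(raiseUlimit)
                .withInstances(new JobFlowInstancesConfig()
                        .withMasterInstanceType("m5.xlarge")   // placeholder instance types
                        .withSlaveInstanceType("m5.xlarge")
                        .withInstanceCount(3)
                        .withKeepJobFlowAliveWhenNoSteps(true))
                .withServiceRole("EMR_DefaultRole")
                .withJobFlowRole("EMR_EC2_DefaultRole");

        RunJobFlowResult result = emr.runJobFlow(request);
        System.out.println("Started cluster: " + result.getJobFlowId());
    }
}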

Furthermore, I am not even 100% sure that the ulimit is causing this particular issue; it might be something else entirely. I can only confirm once I change the ulimit and check.

Mehaboob Khan

0 Answers