I've built an Apache Spark map-reduce application in Java. It reads data from a MongoDB cluster and performs a map-reduce over roughly 8 million records. Spark runs embedded in the Java application itself (local mode), not against an external cluster.
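For context, here's a stripped-down sketch of the setup (the connection URI, database/collection, and field name are placeholders, and the read goes through the mongo-spark-connector; my real job is more involved):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.bson.Document;
    import scala.Tuple2;
    import com.mongodb.spark.MongoSpark;
    import com.mongodb.spark.rdd.api.java.JavaMongoRDD;

    public class MongoMapReduce {
        public static void main(String[] args) {
            // Spark embedded in the driver JVM, no external cluster.
            SparkConf conf = new SparkConf()
                    .setAppName("mongo-mapreduce")
                    .setMaster("local[*]")
                    // Placeholder connection string.
                    .set("spark.mongodb.input.uri",
                         "mongodb://mongo1:27017/mydb.mycollection");

            JavaSparkContext jsc = new JavaSparkContext(conf);

            // Load the collection as an RDD of BSON documents.
            JavaMongoRDD<Document> docs = MongoSpark.load(jsc);

            // Toy map-reduce: count records per key ("someField" is made up).
            long keys = docs
                    .mapToPair(d -> new Tuple2<>(d.getString("someField"), 1L))
                    .reduceByKey(Long::sum)
                    .count();

            System.out.println("distinct keys: " + keys);
            jsc.stop();
        }
    }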
The import and map stages complete fine, but about halfway through the reduce stage it throws "java.lang.OutOfMemoryError: unable to create new native thread".
I've given the JVM 30 GB of the machine's 32 GB. The open-file limit is set fairly high, and the allowed thread count looks generous as well:
$ cat /proc/sys/kernel/threads-max
257073
$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 128536
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 500000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 128536
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
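In case it helps with debugging: the JVM's own thread counts can be polled from inside the process. A minimal sketch (the class name and poll interval are mine):

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadMXBean;

    // Logs live/peak/daemon thread counts; run it on a daemon thread alongside the job.
    public class ThreadWatch implements Runnable {
        @Override
        public void run() {
            ThreadMXBean mx = ManagementFactory.getThreadMXBean();
            while (!Thread.currentThread().isInterrupted()) {
                System.out.printf("threads: live=%d peak=%d daemon=%d%n",
                        mx.getThreadCount(),
                        mx.getPeakThreadCount(),
                        mx.getDaemonThreadCount());
                try {
                    Thread.sleep(5_000);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }
    }

From outside the JVM, ps -o nlwp= -p <pid> reports the same live thread count.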
Can someone help me figure out how to get rid of this exception?
Edit:
Setting up a real Spark cluster and connecting my driver program to it, instead of running Spark inside the driver process itself, fixed the problem. Reducing the partition count from 15k to 2k also helped, presumably because fewer partitions mean fewer concurrent task and shuffle threads, and each thread needs its own native stack (8 MB here, per ulimit -s) outside the 30 GB heap.
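Roughly what the two changes look like in code (the master URL and Mongo URI are placeholders; coalesce() drops the partition count without a full shuffle):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.bson.Document;
    import com.mongodb.spark.MongoSpark;

    public class ClusterMode {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf()
                    .setAppName("mongo-mapreduce")
                    // Point the driver at the external cluster instead of local mode.
                    .setMaster("spark://spark-master:7077")
                    .set("spark.mongodb.input.uri",
                         "mongodb://mongo1:27017/mydb.mycollection");

            JavaSparkContext jsc = new JavaSparkContext(conf);

            // 15k input partitions -> 2k, so far fewer tasks (and their threads,
            // shuffle files, and buffers) are in flight at any one time.
            JavaRDD<Document> docs = MongoSpark.load(jsc).coalesce(2000);

            System.out.println("partitions: " + docs.getNumPartitions());
            jsc.stop();
        }
    }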
I'm leaving this open in case anyone knows the exact root cause; it might help someone who stumbles upon the same error.