
I've created an Apache Spark map-reduce application in Java. It reads data from a MongoDB cluster and performs a map-reduce over it. The data set is around 8 million records. Spark is running in standalone mode inside the Java application.

The import and map stages work, but about halfway through the reduce stage the job throws "java.lang.OutOfMemoryError: unable to create new native thread".

I've assigned 30GB of the available 32GB to the program. The open-file limit is set to 500000, and the allowed number of threads looks high as well:

$ cat /proc/sys/kernel/threads-max
257073
$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 128536
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 500000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 128536
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
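
Since this error is typically raised when the operating system refuses to create another thread, and not when the Java heap is full, it may help to log how many threads the JVM actually has while the reduce is running and compare that against the limits above. The following is a minimal sketch that uses only the standard java.lang.management API; the class and method names (ThreadProbe, snapshot, startLogging) are made up for illustration and are not part of Spark or of my application.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: periodically logs the JVM's thread counters so they
// can be compared against `ulimit -u` and /proc/sys/kernel/threads-max.
class ThreadProbe {

    // Returns a one-line snapshot of the current thread counters.
    static String snapshot() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        return String.format("live=%d peak=%d daemon=%d started=%d",
                threads.getThreadCount(),
                threads.getPeakThreadCount(),
                threads.getDaemonThreadCount(),
                threads.getTotalStartedThreadCount());
    }

    // Starts a daemon thread that prints a snapshot every intervalMillis.
    // Interrupt the returned thread to stop the logging.
    static Thread startLogging(long intervalMillis) {
        Thread probe = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    System.err.println("[thread-probe] " + snapshot());
                    Thread.sleep(intervalMillis);
                }
            } catch (InterruptedException e) {
                // Interrupted: fall through and let the probe thread end.
            }
        }, "thread-probe");
        probe.setDaemon(true); // must not keep the JVM alive after the job ends
        probe.start();
        return probe;
    }

    public static void main(String[] args) throws InterruptedException {
        startLogging(1000);
        Thread.sleep(3000); // stand-in for the actual Spark job
    }
}
```

Calling ThreadProbe.startLogging(1000) once before the job starts would show whether the live thread count climbs steadily during the reduce. If it stays far below the limits above, the cause is more likely native memory left outside the 30GB heap than a thread-count limit.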

How can I get rid of this exception?

Edit:

Setting up a real Spark cluster and connecting my driver program to it, instead of running Spark embedded inside the driver program, fixed my problem. Reducing the partition count from 15k to 2k also helped.

I'm leaving the question open in case anybody knows the underlying cause, and for anyone else who stumbles upon this.

Jochen Ullrich
  • You can try the Hadoop MongoDB Connector for this: http://stackoverflow.com/questions/32469951/reading-huge-mongodb-collection-from-spark-with-help-of-worker/32487357#32487357 – Ajay Gupta Oct 06 '15 at 10:14
  • I am already using that. The whole job does work; I tried it on a smaller data set. It just crashes with the exception I described. – Jochen Ullrich Oct 06 '15 at 10:14
  • Can you post some part of the code? – Ajay Gupta Oct 06 '15 at 10:26
  • It's a lot of files and I don't really think the code will help. I've pretty much posted all the settings I used. If you need any more information I can post the configuration I've used, but it's not a lot of configuration, pretty much all defaults. – Jochen Ullrich Oct 06 '15 at 11:30
