
Recently I observed "java.lang.OutOfMemoryError: unable to create new native thread" for two standalone Java webapps on the same machine when the total number of threads across both of them reached 1024.

The command I used to show the number of threads for a process is: ps huH p $pid | wc -l

[root@vm119 ~]# ps huH p 11294 | wc -l
378
[root@vm119 ~]# ps huH p 11052 | wc -l
646

The Java webapps in my case are actually Java daemons spawned from two copies of the same jar file.

At the time of this incident, there was still plenty of free RAM shown by vmstat. I also started another Java program (which keeps creating new threads until it gets an OutOfMemoryError and then prints the total number of threads it has created) to see how many threads it could create. As expected, it reported that it could still create 31051 threads. This means the OS still had the native resources required to create native threads at that time.
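
For reference, a minimal sketch of that kind of test program (the class name and details here are illustrative, not the exact code I ran; each thread just blocks forever so it stays alive, and the count is printed once the OutOfMemoryError is thrown):

import java.util.concurrent.CountDownLatch;

public class ThreadLimitTest {
    public static void main(String[] args) {
        final CountDownLatch block = new CountDownLatch(1); // never counted down
        int created = 0;
        try {
            while (true) {
                Thread t = new Thread(() -> {
                    try {
                        block.await(); // park the thread forever so it keeps its native thread
                    } catch (InterruptedException ignored) {
                    }
                });
                t.setDaemon(true);
                t.start();
                created++;
            }
        } catch (OutOfMemoryError e) {
            // "unable to create new native thread" lands here
            System.out.println("Created " + created + " threads before: " + e.getMessage());
        }
    }
}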

Both of the Java webapps are started with the following JVM options: -Xmx4096m -Xms512m -Xss256k
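
(As a rough sanity check on memory: with -Xss256k, even 1024 threads only account for about 1024 × 256 KB ≈ 256 MB of thread stacks, so stack memory by itself should not have been the limiting factor.)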

The ulimit -a on the machine:

[root@vm119 ~]# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 62810
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 100000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 62810
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

I researched and followed the usual investigation routine for the "java.lang.OutOfMemoryError: unable to create new native thread" problem, but could not really find anything useful.

The total of 1024 threads across both webapps seems very suspicious... Was it an accident that they added up to 1024? Or is there some limitation in the OS or the JVM that I missed?

GJ.
  • which user runs the two daemons? – Aris2World Jul 12 '16 at 16:20
  • @Aris2World the daemons were started by two different non-su users and the command to start the daemon is `start-stop-daemon -c${user} ...` – GJ. Jul 12 '16 at 17:05
  • 1
    @Aris2World I made a mistake.The two webapps were started by the same non-su user. – GJ. Jul 12 '16 at 17:18
  • Maybe this article could be useful for you https://plumbr.eu/outofmemoryerror/unable-to-create-new-native-thread – lenach87 Jul 12 '16 at 17:38
  • @lenach87 thank you for bringing that up. However, that article was one of the many that I read during my research yesterday. Among the articles I read, I found http://javaeesupportpatterns.blogspot.ca/2012/09/outofmemoryerror-unable-to-create-new.html more illustrative. However, it still does not help me with my current problem. – GJ. Jul 12 '16 at 17:41
  • Probably that's a stupid question - but could it be that the user which started the java processes has some other limits? I've found this information: "the default number of processes per user is 1024 by default"... – lenach87 Jul 12 '16 at 17:51
  • @lenach87 Hi Lenach, can you show me a link to the finding you mentioned? Thanks! – GJ. Jul 12 '16 at 17:58
  • Not sure if the author is right about the default 1024, but who knows. That's where I found this wording (last line of text) http://www.mastertheboss.com/jboss-server/jboss-monitoring/how-to-solve-javalangoutofmemoryerror-unable-to-create-new-native-thread – lenach87 Jul 12 '16 at 18:05
  • One other situation I found in comments "Your post led me in the right direction. I found a file in /etc/security/limits.d that was limiting the number of processes for all non-root users to 1024, which is not enough for Jenkins. By the way that is the "correct" place to set the limits in Linux (at least in RHEL/CentOS). " link http://www.devgrok.com/2012/03/resolving-outofmemoryerror-unable-to.html – lenach87 Jul 12 '16 at 18:41

1 Answer


With help from Aris2World and lenach87, I managed to find the answer to my own question.

The root cause is the max user processes (nproc) limit that Linux enforces against the executing user of a process. Since every Java thread is backed by a native Linux thread, and Linux counts those threads towards this per-user limit, the webapps could not create any more threads once the limit was reached.

I logged in as root during my investigation, hence the result of ulimit -a was for root:

[root@vm119 ~]# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 62810
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 100000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 62810
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

However, what I should have checked was the limit for the executing user of the webapps:

[root@vm119 ~]# su - user -c "ulimit -a"
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 62810
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 100000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

In order to change the limit for my executing user, I manually added the following two lines to /etc/security/limits.conf:

[root@vm119 ~]# cat /etc/security/limits.conf | grep user
user            soft    nproc   4096
user            hard    nproc   4096
GJ.