On shared hosting, I have several Ruby on Rails applications (5.0 and 5.2, with Ruby 2.2 or 2.4). Some of them stop working after I start them and open a few pages, and a 504 Gateway Time-out error occurs. Nothing shows up in the log file when this happens, nor in cPanel's Metrics - Errors section. If I keep trying to load more pages, I keep getting the 504 error and the Physical Memory Usage on the server climbs to 99%.
However, only some of the apps hang; the others don't. The ones that hang do nothing special that could consume memory.
All of them use version 3.12 of the puma gem. I tried switching to version 4, but nothing changed.
When I log in over SSH and run ps -aux, something like this is displayed:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
truesoft 17085 0.1 0.0 366116 22308 ? S 11:29 0:00 lsphp
truesoft 25413 0.0 0.0 11228 1988 pts/0 Ss 08:11 0:00 -bash
truesoft 18492 0.0 0.0 51020 1664 pts/0 R+ 11:30 0:00 ps -aux
truesoft 3525 0.5 0.2 268288 94924 ? SNl 11:19 0:03 RACK: /home/truesoft/app1_which_works_fine/ (production)
truesoft 12526 1.2 0.2 268116 96808 ? SNl 11:26 0:03 RACK: /home/truesoft/app2_which_works_fine/ (production)
truesoft 13773 1.5 0.2 250988 94000 ? SNl 11:26 0:03 RACK: /home/truesoft/app3_which_works_fine/ (production)
truesoft 18029 4.1 0.3 287600 108072 ? SN 11:30 0:01 RACK: /home/truesoft/app2_which_works_fine/ (production)
truesoft 18088 11.9 0.3 295860 118684 ? SN 11:30 0:02 RACK: /home/truesoft/app2_which_works_fine/ (production)
truesoft 18266 0.0 0.2 294768 75004 ? SN 11:30 0:00 RACK: /home/truesoft/app2_with_error/ (production)
truesoft 12413 0.0 0.2 359060 96356 ? SNl Dec18 0:49 RACK: /home/truesoft/app1_with_error/ (production)
truesoft 12444 0.0 0.2 359060 93172 ? SN Dec18 0:00 RACK: /home/truesoft/app1_with_error/ (production)
truesoft 24104 0.0 0.2 314648 82472 ? SN 09:30 0:00 RACK: /home/truesoft/app3_with_error/ (production)
truesoft 25743 0.0 0.2 314648 82472 ? SN 09:31 0:00 RACK: /home/truesoft/app3_with_error/ (production)
truesoft 31330 0.0 0.2 312456 83812 ? SNl 09:34 0:03 RACK: /home/truesoft/app3_with_error/ (production)
Some processes have been hanging there for a few days (see Dec18 in the START column) or even longer, and I have to kill them with kill -9.
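To spot such long-lived processes without reading the listing by eye, something like the following Ruby sketch could help. It assumes the ps -aux column layout shown above, and that START shows a clock time (HH:MM) for processes younger than a day and a date such as Dec18 for older ones. The names PsRow, parse_ps and stale_rack_processes are made up for illustration.

```ruby
# Sketch: pick out "RACK:" processes that were started more than a day ago.
# PsRow, parse_ps and stale_rack_processes are hypothetical helper names.
PsRow = Struct.new(:pid, :stat, :start, :command)

# Parse the text printed by `ps -aux` (header line first).
def parse_ps(text)
  text.each_line.drop(1).map do |line|
    # The first 10 columns are whitespace separated; COMMAND may contain spaces.
    cols = line.strip.split(/\s+/, 11)
    next if cols.size < 11
    PsRow.new(cols[1].to_i, cols[7], cols[8], cols[10])
  end.compact
end

# Assumption: a START value that is not HH:MM means the process is older than a day.
def stale_rack_processes(rows)
  rows.select do |row|
    row.command.start_with?("RACK:") && row.start !~ /\A\d{1,2}:\d{2}\z/
  end
end

# Sample rows copied from the listing above.
SAMPLE = <<-PS
  USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
  truesoft 3525 0.5 0.2 268288 94924 ? SNl 11:19 0:03 RACK: /home/truesoft/app1_which_works_fine/ (production)
  truesoft 12413 0.0 0.2 359060 96356 ? SNl Dec18 0:49 RACK: /home/truesoft/app1_with_error/ (production)
  truesoft 12444 0.0 0.2 359060 93172 ? SN Dec18 0:00 RACK: /home/truesoft/app1_with_error/ (production)
PS

stale = stale_rack_processes(parse_ps(SAMPLE))
stale.each { |row| puts "#{row.pid} #{row.start} #{row.command}" }
```

On the server, the live listing could be fed in with parse_ps(`ps -aux`) instead of SAMPLE.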
I also ran strace -tt -p 31330 on one of the processes while the page in the browser kept loading for a long time without result. Here is the output:
strace: Process 31330 attached
12:58:42.475820 select(8, [7], NULL, NULL, {0, 668936}) = 0 (Timeout)
12:58:43.148229 select(8, [7], NULL, NULL, {1, 0}) = 0 (Timeout)
12:58:44.153079 select(8, [7], NULL, NULL, {1, 0}) = 0 (Timeout)
12:58:45.158192 select(8, [7], NULL, NULL, {1, 0}) = 0 (Timeout)
12:58:46.163095 select(8, [7], NULL, NULL, {1, 0}) = 0 (Timeout)
12:58:47.168427 select(8, [7], NULL, NULL, {1, 0}) = 0 (Timeout)
12:58:48.169276 select(8, [7], NULL, NULL, {1, 0}) = 0 (Timeout)
12:58:49.174225 select(8, [7], NULL, NULL, {1, 0}) = 0 (Timeout)
12:58:50.175233 select(8, [7], NULL, NULL, {1, 0}) = 0 (Timeout)
12:58:51.180778 select(8, [7], NULL, NULL, {1, 0}) = 0 (Timeout)
I noticed that the line select(8, [7], NULL, NULL, {1, 0}) repeats forever, once per second.
This select appears to be a Linux system call (see its man page).
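For context, a select call with a timeout that returns 0 means nothing arrived on the watched file descriptor (7 here) before the timeout expired, so the traced thread looks idle rather than busy. Note also that strace -p without -f attaches to a single thread only. The Ruby sketch below reproduces the same wait-with-timeout pattern using IO.select on a pipe. The helper wait_for_input and the short 0.1-second timeout are illustrative assumptions, and the exact system call Ruby issues depends on the Ruby version and platform.

```ruby
# Sketch: wait for input on a pipe with a timeout, retrying a few times.
# wait_for_input is a hypothetical helper, not part of any library.
reader, writer = IO.pipe

def wait_for_input(io, timeout:, attempts:)
  attempts.times do |i|
    # IO.select returns nil when the timeout expires with nothing to read.
    ready = IO.select([io], nil, nil, timeout)
    return [:ready, i + 1] if ready
  end
  [:timed_out, attempts]
end

# Nothing has been written yet, so every attempt times out.
result = wait_for_input(reader, timeout: 0.1, attempts: 3)
puts result.inspect

# Once data is available, the first attempt returns immediately.
writer.write("x")
result_after_write = wait_for_input(reader, timeout: 0.1, attempts: 3)
puts result_after_write.inspect
```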
I've installed the same application on another host, and it works fine there. One difference is that in the ps -aux output there, the COMMAND column displays:
Passenger AppPreloader: /home/other_domain/app3_which_had_error_on_other_server (forking...)
I'm a beginner with Linux and I don't know where to check what is wrong. Could somebody give me some hints on where to look? On my local machine, all the applications work fine, even in production mode.
Edit: After further investigation, I found that this is the deadlock issue described in puma/puma#1184. However, changing eager_load or setting thread_count to 1 doesn't help.
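One way to check whether the threads are really deadlocked might be to make the process print the backtrace of every Ruby thread on demand. The sketch below is only an assumption about how that could be wired up: the helper dump_thread_backtraces and the choice of the TTIN signal are hypothetical, and whether the hosting's application server passes the signal and stderr through would need to be verified.

```ruby
require "stringio"

# Sketch: write the status and backtrace of every live Ruby thread to `out`.
# dump_thread_backtraces is a hypothetical helper name.
def dump_thread_backtraces(out = $stderr)
  Thread.list.each do |thread|
    out.puts "Thread #{thread.object_id} status=#{thread.status.inspect}"
    (thread.backtrace || ["(no backtrace)"]).each { |frame| out.puts "  #{frame}" }
  end
end

# Assumed wiring: dump all threads when the process receives SIGTTIN.
Signal.trap("TTIN") { dump_thread_backtraces }

# Small demonstration with one sleeping thread, captured into a string.
sleeper = Thread.new { sleep }
sleep 0.05 until sleeper.status == "sleep"
buffer = StringIO.new
dump_thread_backtraces(buffer)
report = buffer.string
sleeper.kill
puts report
```

With this loaded in the app (for example from an initializer), running kill -TTIN followed by the PID from the SSH console would ask a hung process for its thread backtraces, which may show where each thread is waiting.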