On a shared hosting account, I have several Ruby on Rails applications (5.0 and 5.2, with Ruby 2.2 or 2.4). After I start them and access a few pages, some of them stop working and return a 504 Gateway Time-out error. When this happens, nothing appears in the log file or in cPanel's Metrics - Errors section. If I keep trying to load more pages, I keep getting the 504 error and the Physical Memory Usage on the server rises to 99%.
However, only some of the apps hang; the others do not, and the failing ones do nothing special that could consume memory. All of them use version 3.12 of the puma gem; I tried switching to version 4, but nothing changed.

If I log in over SSH and run ps -aux, I get something like this:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
truesoft 17085  0.1  0.0 366116 22308 ?        S    11:29   0:00 lsphp
truesoft 25413  0.0  0.0  11228  1988 pts/0    Ss   08:11   0:00 -bash
truesoft 18492  0.0  0.0  51020  1664 pts/0    R+   11:30   0:00 ps -aux
truesoft  3525  0.5  0.2 268288 94924 ?        SNl  11:19   0:03 RACK: /home/truesoft/app1_which_works_fine/ (production)
truesoft 12526  1.2  0.2 268116 96808 ?        SNl  11:26   0:03 RACK: /home/truesoft/app2_which_works_fine/ (production)
truesoft 13773  1.5  0.2 250988 94000 ?        SNl  11:26   0:03 RACK: /home/truesoft/app3_which_works_fine/ (production)
truesoft 18029  4.1  0.3 287600 108072 ?       SN   11:30   0:01 RACK: /home/truesoft/app2_which_works_fine/ (production)
truesoft 18088 11.9  0.3 295860 118684 ?       SN   11:30   0:02 RACK: /home/truesoft/app2_which_works_fine/ (production)
truesoft 18266  0.0  0.2 294768 75004 ?        SN   11:30   0:00 RACK: /home/truesoft/app2_with_error/ (production)
truesoft 12413  0.0  0.2 359060 96356 ?        SNl  Dec18   0:49 RACK: /home/truesoft/app1_with_error/ (production)
truesoft 12444  0.0  0.2 359060 93172 ?        SN   Dec18   0:00 RACK: /home/truesoft/app1_with_error/ (production)
truesoft 24104  0.0  0.2 314648 82472 ?        SN   09:30   0:00 RACK: /home/truesoft/app3_with_error/ (production)
truesoft 25743  0.0  0.2 314648 82472 ?        SN   09:31   0:00 RACK: /home/truesoft/app3_with_error/ (production)
truesoft 31330  0.0  0.2 312456 83812 ?        SNl  09:34   0:03 RACK: /home/truesoft/app3_with_error/ (production)

Some processes have been hanging there for a few days (see Dec18) or even longer, and I have to kill them with kill -9.
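
To spot such long-lived workers without reading the listing by eye, the ps output can be filtered with a short script. This is only a sketch: the helper name stale_rack_processes is made up for illustration, and it assumes that the START column shows a clock time (HH:MM) for recently started processes and a date such as Dec18 for older ones, as in the listing above.

```ruby
# Hypothetical helper: returns the PIDs of "RACK:" workers whose START column
# is not a clock time (HH:MM), i.e. workers that were not started recently.
def stale_rack_processes(ps_output)
  ps_output.each_line.map(&:split).select { |cols|
    cols.length >= 11 &&
      cols[10] == "RACK:" &&            # first word of the COMMAND column
      cols[8] !~ /\A\d{1,2}:\d{2}\z/    # START column is a date, not a time
  }.map { |cols| cols[1].to_i }         # PID column
end

# Two lines copied from the listing above, plus the header.
sample = <<~PS
  USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
  truesoft 12413  0.0  0.2 359060 96356 ?        SNl  Dec18   0:49 RACK: /home/truesoft/app1_with_error/ (production)
  truesoft 31330  0.0  0.2 312456 83812 ?        SNl  09:34   0:03 RACK: /home/truesoft/app3_with_error/ (production)
PS

p stale_rack_processes(sample)
```

On the server itself, the same helper could be given the live output, for example stale_rack_processes(`ps aux`), and the returned PIDs reviewed before killing anything.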

I also ran strace -tt -p 31330 on one of the processes while the page in the browser kept loading for a long time without a result. Here is the output:

strace: Process 31330 attached
12:58:42.475820 select(8, [7], NULL, NULL, {0, 668936}) = 0 (Timeout)
12:58:43.148229 select(8, [7], NULL, NULL, {1, 0}) = 0 (Timeout)
12:58:44.153079 select(8, [7], NULL, NULL, {1, 0}) = 0 (Timeout)
12:58:45.158192 select(8, [7], NULL, NULL, {1, 0}) = 0 (Timeout)
12:58:46.163095 select(8, [7], NULL, NULL, {1, 0}) = 0 (Timeout)
12:58:47.168427 select(8, [7], NULL, NULL, {1, 0}) = 0 (Timeout)
12:58:48.169276 select(8, [7], NULL, NULL, {1, 0}) = 0 (Timeout)
12:58:49.174225 select(8, [7], NULL, NULL, {1, 0}) = 0 (Timeout)
12:58:50.175233 select(8, [7], NULL, NULL, {1, 0}) = 0 (Timeout)
12:58:51.180778 select(8, [7], NULL, NULL, {1, 0}) = 0 (Timeout)

I noticed that the line select(8, [7], NULL, NULL, {1, 0}) repeats forever, once per second.

select appears to be a Linux system call (see man select).
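
For context, a trace like this is what a process produces while it waits on a single file descriptor with a one-second timeout and nothing ever arrives. The sketch below reproduces that pattern in Ruby with a pipe that is never written to. It is an illustration only, not the code the hung worker is actually running; wait_for_input is a hypothetical name, and depending on the Ruby version the underlying system call may be something other than select.

```ruby
# Wait up to `timeout` seconds for `io` to become readable, `attempts` times.
# Returns the number of attempts that timed out.
def wait_for_input(io, timeout: 1, attempts: 3)
  timeouts = 0
  attempts.times do
    ready = IO.select([io], nil, nil, timeout) # nil means the wait timed out
    timeouts += 1 if ready.nil?
  end
  timeouts
end

reader, writer = IO.pipe # nothing is written to `writer`, so `reader` never becomes readable
puts wait_for_input(reader, timeout: 0.1, attempts: 3)
```

Every attempt times out here because no data ever reaches the reader, which mirrors the repeated "= 0 (Timeout)" lines: the process is alive but whatever it is waiting for never shows up.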

I've installed the same application on another host, and it works fine there. One difference is that there, the COMMAND column of ps -aux displays:

Passenger AppPreloader: /home/other_domain/app3_which_had_error_on_other_server (forking...)

I'm a beginner with Linux and I don't know where to check what is wrong. Could somebody give me some hints on where to look? On my local machine, all the applications work fine, even in production mode.

Edit: After further investigation, I found that this is a deadlock issue, described in puma/puma#1184. However, neither setting eager_load nor setting thread_count to 1 helps.

True Soft

1 Answer

The hosting server had LiteSpeed installed, and it seems that when the app's Rails version is 5.2, a deadlock occurs in the Linux processes and causes the 504 error.

There is an old post here: https://www.litespeedtech.com/support/forum/threads/rails-2-2-cache_classes-problem.2493/ where version 2.2 is mentioned.

Without LiteSpeed, everything works fine.

True Soft