0

I am fairly new to managing my own servers and have a problem where some Apache2 processes hog 100% of the CPU. This lasts for about half an hour and a couple of hours later, it will start again. It will even happen right after restarting Apache or rebooting the server.

The server is used to serve two very low traffic websites. It uses a CMS called Pimcore, which itself is based off of Symfony.

DigitalOcean Droplet 1GB of RAM

Ubuntu 18.04

PHP 7.2 fpm

MySql 14.14

Pimcore CMS (Symfony)

I ran the exact same setup on Ubuntu 17.10 before, and everything went fine. Since I moved my setup to a new server with Ubuntu 18.04 (fresh install of everything) I started seeing these issues.

I suspect it has something to do with some PHP script executing, but I can figure out the exact source of it.

Does anybody have an idea what could be going on?

Apache2 Config:

<IfModule mpm_prefork_module>
        StartServers            2
        MinSpareServers           3
        MaxSpareServers           5
        MaxRequestWorkers         20
        MaxConnectionsPerChild   3000
        MaxClients              15
</IfModule>

Apache Modules:

Loaded Modules:
 core_module (static)
 so_module (static)
 watchdog_module (static)
 http_module (static)
 log_config_module (static)
 logio_module (static)
 version_module (static)
 unixd_module (static)
 access_compat_module (shared)
 alias_module (shared)
 auth_basic_module (shared)
 authn_core_module (shared)
 authn_file_module (shared)
 authz_core_module (shared)
 authz_host_module (shared)
 authz_user_module (shared)
 autoindex_module (shared)
 deflate_module (shared)
 dir_module (shared)
 env_module (shared)
 filter_module (shared)
 mime_module (shared)
 mpm_prefork_module (shared)
 negotiation_module (shared)
 php7_module (shared)
 proxy_module (shared)
 proxy_fcgi_module (shared)
 reqtimeout_module (shared)
 rewrite_module (shared)
 setenvif_module (shared)
 socache_shmcb_module (shared)
 ssl_module (shared)
 status_module (shared)
 wsgi_module (shared)

error.log

Fri May  1 11:01:48 2020 (1309): Error Cannot kill process 1069: Success!
[Fri May 01 11:01:49.207718 2020] [core:notice] [pid 923] AH00051: child pid 1309 exit signal Segmentation fault (11), possible coredump in /etc/apache2
[Fri May 01 11:13:15.899518 2020] [core:notice] [pid 923] AH00051: child pid 1333 exit signal Segmentation fault (11), possible coredump in /etc/apache2
[Fri May 01 11:13:15.899963 2020] [core:notice] [pid 923] AH00051: child pid 1383 exit signal Segmentation fault (11), possible coredump in /etc/apache2
[Fri May 01 11:13:15.899975 2020] [core:notice] [pid 923] AH00051: child pid 1406 exit signal Segmentation fault (11), possible coredump in /etc/apache2
[Fri May 01 11:13:15.900004 2020] [core:notice] [pid 923] AH00051: child pid 1305 exit signal Segmentation fault (11), possible coredump in /etc/apache2
[Fri May 01 11:13:15.900059 2020] [mpm_prefork:notice] [pid 923] AH00169: caught SIGTERM, shutting down
[Fri May 01 11:13:16.073253 2020] [mpm_prefork:notice] [pid 1605] AH00163: Apache/2.4.29 (Ubuntu) OpenSSL/1.1.1 mod_wsgi/4.5.17 Python/3.6 configured -- resuming normal operations
[Fri May 01 11:13:16.073329 2020] [core:notice] [pid 1605] AH00094: Command line: '/usr/sbin/apache2'
[Fri May 01 11:14:17.466068 2020] [core:notice] [pid 1605] AH00051: child pid 1613 exit signal Segmentation fault (11), possible coredump in /etc/apache2
[Fri May 01 11:14:17.466137 2020] [core:notice] [pid 1605] AH00051: child pid 1636 exit signal Segmentation fault (11), possible coredump in /etc/apache2
[Fri May 01 11:14:17.466181 2020] [mpm_prefork:notice] [pid 1605] AH00169: caught SIGTERM, shutting down
[Fri May 01 11:14:17.608696 2020] [mpm_prefork:notice] [pid 1685] AH00163: Apache/2.4.29 (Ubuntu) OpenSSL/1.1.1 mod_wsgi/4.5.17 Python/3.6 configured -- resuming normal operations
[Fri May 01 11:14:17.608770 2020] [core:notice] [pid 1685] AH00094: Command line: '/usr/sbin/apache2'
[Fri May 01 11:16:49.360625 2020] [core:notice] [pid 1685] AH00052: child pid 1696 exit signal Segmentation fault (11)
[Fri May 01 11:16:49.360697 2020] [core:notice] [pid 1685] AH00052: child pid 1717 exit signal Segmentation fault (11)
[Fri May 01 11:16:49.360708 2020] [core:notice] [pid 1685] AH00052: child pid 1719 exit signal Segmentation fault (11)
[Fri May 01 11:16:49.360724 2020] [core:notice] [pid 1685] AH00052: child pid 1722 exit signal Segmentation fault (11)
[Fri May 01 11:16:49.360780 2020] [mpm_prefork:notice] [pid 1685] AH00169: caught SIGTERM, shutting down
[Fri May 01 11:17:05.637473 2020] [mpm_prefork:notice] [pid 924] AH00163: Apache/2.4.29 (Ubuntu) OpenSSL/1.1.1 mod_wsgi/4.5.17 Python/3.6 configured -- resuming normal operations
[Fri May 01 11:17:05.651236 2020] [core:notice] [pid 924] AH00094: Command line: '/usr/sbin/apache2'
[Fri May 01 11:25:37.817879 2020] [core:notice] [pid 924] AH00051: child pid 946 exit signal Segmentation fault (11), possible coredump in /etc/apache2
[Fri May 01 11:25:37.817946 2020] [core:notice] [pid 924] AH00051: child pid 948 exit signal Segmentation fault (11), possible coredump in /etc/apache2
[Fri May 01 11:25:37.817960 2020] [core:notice] [pid 924] AH00051: child pid 1022 exit signal Segmentation fault (11), possible coredump in /etc/apache2
[Fri May 01 11:25:37.817972 2020] [core:notice] [pid 924] AH00051: child pid 1055 exit signal Segmentation fault (11), possible coredump in /etc/apache2
[Fri May 01 11:25:37.817984 2020] [core:notice] [pid 924] AH00051: child pid 1056 exit signal Segmentation fault (11), possible coredump in /etc/apache2
[Fri May 01 11:25:37.818020 2020] [mpm_prefork:notice] [pid 924] AH00169: caught SIGTERM, shutting down
[Fri May 01 11:25:37.957502 2020] [mpm_prefork:notice] [pid 1394] AH00163: Apache/2.4.29 (Ubuntu) OpenSSL/1.1.1 mod_wsgi/4.5.17 Python/3.6 configured -- resuming normal operations
[Fri May 01 11:25:37.957577 2020] [core:notice] [pid 1394] AH00094: Command line: '/usr/sbin/apache2'

top Output:

top - 11:34:04 up 17 min,  1 user,  load average: 4.00, 3.92, 2.79
Tasks:  91 total,   5 running,  49 sleeping,   0 stopped,   0 zombie
%Cpu(s): 99.7 us,  0.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  1008848 total,   112120 free,   483364 used,   413364 buff/cache
KiB Swap:        0 total,        0 free,        0 used.   334772 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                                                                                                                        
 1405 www-data  20   0  561180  94488  43856 R 24.9  9.4   2:20.77 /usr/sbin/apache2 -k start                                                                                                     
 1427 www-data  20   0  550904  75304  36568 R 24.9  7.5   1:56.12 /usr/sbin/apache2 -k start                                                                                                     
 1429 www-data  20   0  552952  76684  36432 R 24.9  7.6   2:13.89 /usr/sbin/apache2 -k start                                                                                                     
 1437 www-data  20   0  550904  74748  36568 R 24.9  7.4   1:48.41 /usr/sbin/apache2 -k start                                                                                                     
  916 mysql     20   0 1410004 206444  17124 S  0.3 20.5   0:01.96 /usr/sbin/mysqld --daemonize --pid-file=/run/mysqld/mysqld.pid                                                                 
    1 root      20   0  159764   8756   6472 S  0.0  0.9   0:01.37 /sbin/init                                                                                                                     
    2 root      20   0       0      0      0 S  0.0  0.0   0:00.00 [kthreadd]                                                                                                                     
    3 root      20   0       0      0      0 I  0.0  0.0   0:00.00 [kworker/0:0]                                                                                                                  
    4 root       0 -20       0      0      0 I  0.0  0.0   0:00.00 [kworker/0:0H]                                                                                                                 
    6 root       0 -20       0      0      0 I  0.0  0.0   0:00.00 [mm_percpu_wq]                                                                                                                 
    7 root      20   0       0      0      0 S  0.0  0.0   0:00.07 [ksoftirqd/0]                                                                                                                  
    8 root      20   0       0      0      0 I  0.0  0.0   0:00.08 [rcu_sched]                                                                                                                    
    9 root      20   0       0      0      0 I  0.0  0.0   0:00.00 [rcu_bh]                                                                                                                       
   10 root      rt   0       0      0      0 S  0.0  0.0   0:00.00 [migration/0]                                                                                                                  
   11 root      rt   0       0      0      0 S  0.0  0.0   0:00.00 [watchdog/0]                                                                                                                   
   12 root      20   0       0      0      0 S  0.0  0.0   0:00.00 [cpuhp/0]                                                                                                                      
   13 root      20   0       0      0      0 S  0.0  0.0   0:00.00 [kdevtmpfs]                                                                                                                    
   14 root       0 -20       0      0      0 I  0.0  0.0   0:00.00 [netns]                                                                                                                        
   15 root      20   0       0      0      0 S  0.0  0.0   0:00.00 [rcu_tasks_kthre]                                                                                                              
   16 root      20   0       0      0      0 S  0.0  0.0   0:00.00 [kauditd]                                                                                                                      
   17 root      20   0       0      0      0 S  0.0  0.0   0:00.00 [khungtaskd]                                                                                                                   
   18 root      20   0       0      0      0 S  0.0  0.0   0:00.00 [oom_reaper]                                                                                                                   
   19 root       0 -20       0      0      0 I  0.0  0.0   0:00.00 [writeback]                                                                                                                    
   20 root      20   0       0      0      0 S  0.0  0.0   0:00.00 [kcompactd0]                  

When tracing one of this PIDs with high CPU usage using strace, there's what seems like an endless stream of this:

stat("/var/www/XYZ/XYZ/web/static/css/print.css", {st_mode=S_IFREG|0664, st_size=250, ...}) = 0
sendto(24, "\264\0\0\0\3SELECT id, \n          (CASE"..., 184, MSG_DONTWAIT, NULL, 0) = 184
poll([{fd=24, events=POLLIN|POLLERR|POLLHUP}], 1, 86400000) = 1 ([{fd=24, revents=POLLIN}])
recvfrom(24, "\1\0\0\1\2/\0\0\2\3def\vXYZ\5cache\5"..., 3174, MSG_DONTWAIT, NULL, NULL) = 265
sendto(24, "<\0\0\0\3SELECT sourceId FROM docume"..., 64, MSG_DONTWAIT, NULL, 0) = 64
poll([{fd=24, events=POLLIN|POLLERR|POLLHUP}], 1, 86400000) = 1 ([{fd=24, revents=POLLIN}])
recvfrom(24, "\1\0\0\1\1]\0\0\2\3def\vXYZ\26docume"..., 2909, MSG_DONTWAIT, NULL, NULL) = 120
sendto(24, "\236\0\0\0\3SELECT id,language FROM doc"..., 162, MSG_DONTWAIT, NULL, 0) = 162
poll([{fd=24, events=POLLIN|POLLERR|POLLHUP}], 1, 86400000) = 1 ([{fd=24, revents=POLLIN}])
recvfrom(24, "\1\0\0\1\2\32\0\0\2\3def\0\0\0\2id\2id\f?\0\v\0\0\0\3!\0"..., 2789, MSG_DONTWAIT, NULL, NULL) = 95
sendto(24, "\264\0\0\0\3SELECT id, \n          (CASE"..., 184, MSG_DONTWAIT, NULL, 0) = 184
poll([{fd=24, events=POLLIN|POLLERR|POLLHUP}], 1, 86400000) = 1 ([{fd=24, revents=POLLIN}])
recvfrom(24, "\1\0\0\1\2/\0\0\2\3def\vXYZ\5cache\5"..., 2694, MSG_DONTWAIT, NULL, NULL) = 265
sendto(24, "\271\0\0\0\3SELECT id, \n          (CASE"..., 189, MSG_DONTWAIT, NULL, 0) = 189
poll([{fd=24, events=POLLIN|POLLERR|POLLHUP}], 1, 86400000) = 1 ([{fd=24, revents=POLLIN}])
recvfrom(24, "\1\0\0\1\2/\0\0\2\3def\vXYZ\5cache\5"..., 2429, MSG_DONTWAIT, NULL, NULL) = 104
access("/var/www/XYZ/XYZ/app/views/content/default.html.php", F_OK) = -1 ENOENT (No such file or directory)
access("/var/www/XYZ/XYZ/app/Resources/views/content/default.html.php", F_OK) = -1 ENOENT (No such file or directory)
access("/var/www/XYZ/XYZ/src/AppBundle/Resources/public/areas/blockquote/icon.png", F_OK) = -1 ENOENT (No such file or directory)
access("/var/www/XYZ/XYZ/src/AppBundle/Resources/public/areas/horizontal-line/icon.png", F_OK) = -1 ENOENT (No such file or directory)
access("/var/www/XYZ/XYZ/src/AppBundle/Resources/public/areas/gallery-single-images/icon.png", F_OK) = -1 ENOENT (No such file or directory)
...
Nik Isschick
  • 31
  • 1
  • 3
  • Make and enable a 4GB swap space for your OS. Better to run slow for a while than be 'broken'. Do you know why files are missing in the STRACE? Can you get them back on line? Max connections per child would probably be fine at 30 rather than 3000. From MySQL command prompt, SHOW GLOBAL STATUS LIKE 'max_used_connections'; multiply the result by 3 and use for Apache max clients. – Wilson Hauck May 02 '20 at 00:48
  • Additional information request. Any SSD or NVME devices on MySQL Host server? Post on pastebin.com and share the links. From your SSH login root, Text results of: B) SHOW GLOBAL STATUS; after minimum 24 hours UPTIME C) SHOW GLOBAL VARIABLES; D) SHOW FULL PROCESSLIST; E) complete MySQLTuner report AND Optional very helpful information, if available includes - htop OR top for most active apps, ulimit -a for a Linux/Unix list of limits, iostat -xm 5 3 for IOPS by device and core/cpu count, for server workload tuning analysis to provide suggestions. – Wilson Hauck May 02 '20 at 00:54
  • Have you verified with SHOW CREATE TABLE tbl_name; your indexes are on your tables on the new server, like they were on the old server? – Wilson Hauck May 02 '20 at 00:59
  • 1
    Hi Wilson, thanks for your help! Here are the requested text results: B) [global status](https://pastebin.com/KWgtzejJ) C) [global variables](https://pastebin.com/8dscmsEQ) D) [proccess list](https://pastebin.com/hTUNivSg) E) [mysqltuner + optional requests](https://pastebin.com/ykpFXrZd) – Nik Isschick May 02 '20 at 07:59
  • Thanks, Suggestions were posted in Answer on May 2, 20 about 16:30. – Wilson Hauck May 13 '20 at 16:07

3 Answers3

1

Ok, so in the end I was able to track down the problem.

Some crawlers for popular SEO tools were requesting non-existent URLs at a very high frequency, resulting in some processes being triggered within Pimcore. That's what was driving up CPU and RAM consumption.

After blocking those crawlers via .htaccess, everything is back to normal.

Nik Isschick
  • 31
  • 1
  • 3
0

Thanks for posting your data. Rate Per Second = RPS

Suggestions to consider for your Digital Ocean my.cnf [mysqld] section

query_cache_type=OFF  # from ON to conserve your 1G of RAM
query_cache_size=0  # to ensure QC is not in use
innodb_io_capacity=1900  # from 200 to enable more use of SSD capacity
innodb_lru_scan_depth=100  # from 1024 to conserve 90% of CPU cycles used for function
tmp_table_size=10M  # from ~500M for 1% of RAM
max_heap_table_size=10M  # from ~500M for 1% of RAM
innodb_log_file_size=64M  # from 8M - size should always be GT innodb_log_buffer_size of 16M for you
innodb_thread_concurrency=0  # from 8 to allow auto calc of concurrency limit

with these changes, CPU use should be stabilized. Let us know your progress in a few days, please.

Wilson Hauck
  • 2,094
  • 1
  • 11
  • 19
  • @nik-isschick Have you had an opportunity to implement any of the suggestions? Please consider upvote or accept when time permits. Comment on positive or negative would be useful to the community. – Wilson Hauck May 13 '20 at 16:06
  • in the end, none of the suggestions did the trick. The problem were some crawlers very frequently requesting non-existent URLs. Somehow that led the CMS to overreact and drive up CPU and RAM consumption. Thanks again for your input though! – Nik Isschick May 14 '20 at 18:25
0

Some processes may occasionally hang using 100% CPU for some reason. You can inspect process CPU usage with tools like htop or atop (not installed by default):

sudo htop

In the case of Apache you can try to restart the service and see if that solves the problem:

restart Apache on Ubuntu

sudo service apache2 restart

In my case this didn't work, the problem process was not responding to service stop or start. I could see this by shutting down apache using:

sudo service apache2 stop

and then inspecting CPU usage with htop. The apache process was still running. Htop displays a process ID (PID). Take note of the PID of the problem process and kill the process with:

sudo kill -9 pid

where pid is the process id (a number between 2 and 65535) of the problem process.

After this, start apache:

sudo service apache2 start
anneb
  • 5,510
  • 2
  • 26
  • 26