We're running Linux on an AWS EC2 instance (with an S3 bucket) and executing a bunch of PHP scripts. I noticed that our CPU usage was pretty low but many processes were sleeping, so I started investigating.

I can't seem to find any bottleneck, but maybe I'm interpreting the data wrong. Judging from the output posted below, things seem to be running properly (at least the active threads look OK), but a lot of the processes are still in the S state. They eventually get taken care of, but we would like everything to run faster and take full advantage of the available resources.

Any help would be greatly appreciated. Even just pointers in a specific direction. Thanks!!

Number of CPUs (nproc):

36

Load Averages (w):

18:42:32 up 106 days,  4:46,  2 users,  load average: 6.26, 7.63, 8.42
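
For scale, here is a rough sketch of the load per CPU, using the two numbers above. The helper name load_per_cpu is made up for illustration. Keep in mind that on Linux the load average counts processes in uninterruptible (D) sleep as well as runnable ones, so this is only a coarse indicator.

```shell
# Sketch: divide a load average by the CPU count.
# load_per_cpu is a hypothetical helper, not an existing tool.
load_per_cpu() {
  # $1 = load average, $2 = number of CPUs
  awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.2f\n", load / cpus }'
}

# 1-minute load average and nproc value from this post
load_per_cpu 6.26 36
```

On a live box the same helper could presumably be fed from /proc/loadavg and nproc instead of hard-coded values.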

Memory (cat /proc/meminfo):

MemTotal:       61837284 kB
MemFree:         3982024 kB
Buffers:           10328 kB
Cached:         32626956 kB
SwapCached:         9460 kB
Active:         42867976 kB
Inactive:       13606444 kB
Active(anon):   22581328 kB
Inactive(anon):  1259760 kB
Active(file):   20286648 kB
Inactive(file): 12346684 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       6775804 kB
SwapFree:        2165592 kB
Dirty:             16952 kB
Writeback:            60 kB
AnonPages:      23829904 kB
Mapped:            24808 kB
Shmem:              3236 kB
Slab:             749544 kB
SReclaimable:     541536 kB
SUnreclaim:       208008 kB
KernelStack:        8792 kB
PageTables:       160316 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    37694444 kB
Committed_AS:   39910116 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      131876 kB
VmallocChunk:   34328119820 kB
HardwareCorrupted:     0 kB
AnonHugePages:    550912 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       92160 kB
DirectMap2M:     4102144 kB
DirectMap1G:    58720256 kB

Listing the processes (ps -ax -o pid,s,cmd,wchan=WIDE-WCHAN-COLUMN | grep php):

2136 S /bin/sh -c php /var/www/dom wait
  2156 S php /var/www/domains/vps.de poll_schedule_timeout
  5831 S /bin/sh -c php /var/www/dom wait
  5878 S php sync.php 110004255 --ca poll_schedule_timeout
  5888 S php sync.php 11001587 --cal hrtimer_nanosleep
  9138 S /bin/sh -c php /var/www/dom wait
  9174 S php /var/www/builds/product poll_schedule_timeout
  9243 R php sync.php 11001795 --cal -
  9253 S php sync.php 110005751 --ca poll_schedule_timeout
 13480 S /bin/sh -c php /var/www/dom wait
 13684 S php sync.php 18003496 --cal poll_schedule_timeout
 14825 S /bin/sh -c php /var/www/dom wait
 17323 S /bin/sh -c php /var/www/dom wait
 17385 S php sync.php 110005518 --ca poll_schedule_timeout
 17391 S php sync.php 110004168 --ca pipe_wait
 17393 S php sync.php 110006890 --ca poll_schedule_timeout
 18479 S /bin/sh -c php /var/www/dom wait
 18491 S php /var/www/domains/vps.de poll_schedule_timeout
 19563 S php cron-new.php --auto-syn poll_schedule_timeout
 19957 S /bin/sh -c php /var/www/dom wait
 20004 S php sync.php 11001211 --cal poll_schedule_timeout
 20006 R php sync.php 110004925 --ca -
 20024 S php sync.php 11001046 --cal poll_schedule_timeout
 20030 S php sync.php 11001517 --cal poll_schedule_timeout
 21901 S /bin/sh -c php /var/www/dom wait
 22004 S php sync.php 11002052 --cal poll_schedule_timeout
 22006 S php sync.php 11001088 --cal pipe_wait
 22008 S php sync.php 18002964 --cal hrtimer_nanosleep
 22010 S php sync.php 11001069 --cal pipe_wait

That isn't the full list. On average, 5-10 processes are in the R state and the rest are in the S state, for a total of about 80 processes.
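
To get those counts without eyeballing the listing, something like the following sketch could tally processes by state and wait channel. The function name tally_states is made up for illustration, and it assumes each line has the shape "PID STATE CMD... WCHAN" as in the ps output above.

```shell
# Sketch: count processes per (state, wait channel) pair.
# Assumes each input line looks like "PID STATE CMD... WCHAN",
# i.e. the state is the 2nd field and the wait channel is the last one.
tally_states() {
  awk '{ count[$2 " " $NF]++ }
       END { for (key in count) print count[key], key }' |
    sort -k1,1nr -k2
}

# Demo on a few lines copied from the listing above.
tally_states <<'EOF'
 5878 S php sync.php 110004255 --ca poll_schedule_timeout
 5888 S php sync.php 11001587 --cal hrtimer_nanosleep
 9243 R php sync.php 11001795 --cal -
 9253 S php sync.php 110005751 --ca poll_schedule_timeout
EOF
```

Against the live system, the ps command from above would be piped into it: ps -ax -o pid,s,cmd,wchan=WIDE-WCHAN-COLUMN | grep php | tally_states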

Checking active threads (psn -p "php" -a -G syscall,wchan,kstack) :

=== Active Threads ==================================================================================================================================================================================================================================================================================================================================================================================

 samples | avg_threads | comm  | state                  | syscall   | wchan                 | filenamesum | kstack
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    2910 |       29.10 | (php) | Sleep (Interruptible)  | nanosleep | hrtimer_nanosleep     |             | system_call_fastpath()->SyS_nanosleep()->hrtimer_nanosleep()
    2207 |       22.07 | (php) | Sleep (Interruptible)  | select    | poll_schedule_timeout |             | system_call_fastpath()->SyS_select()->core_sys_select()->do_select()->poll_schedule_timeout()
     646 |        6.46 | (php) | Running (ON CPU)       | [running] | 0                     |             | -
     350 |        3.50 | (php) | Sleep (Interruptible)  | poll      | poll_schedule_timeout |             | system_call_fastpath()->SyS_poll()->do_sys_poll()->poll_schedule_timeout()
       7 |        0.07 | (php) | Running (ON CPU)       | [running] | 0                     |             | system_call_fastpath()->SyS_read()
       6 |        0.06 | (php) | Running (ON CPU)       | [running] | 0                     |             | retint_careful()
       2 |        0.02 | (php) | Disk (Uninterruptible) | rename    | sleep_on_page         |             | system_call_fastpath()->SyS_rename()->SYSC_renameat()->vfs_rename()->nfs_rename()->nfs4_inode_return_delegation()->nfs_wb_all()->sync_inode()->writeback_single_inode()->__writeback_single_inode()->filemap_fdatawait()->filemap_fdatawait_range()->wait_on_page_bit()->sleep_on_page()
       2 |        0.02 | (php) | Running (ON CPU)       | [running] | 0                     |             | page_remove_rmap()->mem_cgroup_uncharge_page()
       1 |        0.01 | (php) | Disk (Uninterruptible) | open      | rpc_wait_bit_killable |             | system_call_fastpath()->SyS_open()->do_sys_open()->do_filp_open()->path_openat()->do_last()->nfs_atomic_open()->nfs4_atomic_open()->nfs4_do_open()->nfs4_run_open_task()->__rpc_wait_for_completion_task()->rpc_wait_bit_killable()
       1 |        0.01 | (php) | Running (ON CPU)       | [running] | poll_schedule_timeout |             | -
       1 |        0.01 | (php) | Running (ON CPU)       | poll      | poll_schedule_timeout |             | system_call_fastpath()->SyS_poll()->do_sys_poll()->poll_schedule_timeout()
       1 |        0.01 | (php) | Running (ON CPU)       | select    | 0oll_schedule_timeout |             | system_call_fastpath()->SyS_select()->core_sys_select()->do_select()->poll_schedule_timeout()
       1 |        0.01 | (php) | Running (ON CPU)       | select    | poll_schedule_timeout |             | system_call_fastpath()->SyS_select()->core_sys_select()->do_select()->poll_schedule_timeout()
       1 |        0.01 | (php) | Sleep (Interruptible)  | [running] | poll_schedule_timeout |             | -
       1 |        0.01 | (php) | Sleep (Interruptible)  | [running] | poll_schedule_timeout |             | system_call_fastpath()->SyS_poll()->do_sys_poll()->poll_schedule_timeout()
D.Mill
  • Nothing looks out of place here except for the 106 days uptime, which is probably too long and suggests you may not be keeping up with security updates. – Michael Hampton Feb 25 '21 at 19:10
  • @MichaelHampton thanks for the prompt response. I added the memory info to my post on the off chance that it helps. Are you saying that it's normal for all of these processes to be in the S state? Or that nothing here points to a reason why they would be? – D.Mill Feb 25 '21 at 19:16
  • Any time a process is waiting for something other than uninterruptible I/O such as disk, it could be in S state. It's normal that most processes on any given system will be in this state. – Michael Hampton Feb 25 '21 at 19:26
  • @MichaelHampton I can clearly see these processes not requiring any CPU though. They're just sitting there doing nothing until the 5 running ones finish. I'm trying to figure out why it isn't running all of them in parallel, since the resources seem to be there and available. – D.Mill Feb 25 '21 at 20:03
  • Is there some mutex in your code that you haven't mentioned yet? – Michael Hampton Feb 25 '21 at 22:40

0 Answers