We are running a large number of PHP scripts on Linux (an AWS EC2 instance with an S3 bucket). I noticed that our CPU usage was fairly low while many processes were sleeping, so I started investigating.
I can't seem to find any bottleneck, but maybe I'm interpreting the data wrong. Judging by the output posted below, things seem to be running properly (at least the active threads look OK), yet a lot of the processes are still in the S state. They eventually get taken care of, but we would like everything to run faster and take full advantage of the available resources.
Any help would be greatly appreciated, even just pointers in a specific direction. Thanks!
Number of CPUs (nproc):
36
Load Averages (w):
18:42:32 up 106 days, 4:46, 2 users, load average: 6.26, 7.63, 8.42
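To show how I am reading these numbers, here is a minimal Python sketch that divides each load average by the CPU count. The function name load_per_cpu is just something I made up for illustration; the input line and the CPU count are copied from the w and nproc output above.

```python
import re


def load_per_cpu(uptime_line, cpu_count):
    """Divide the 1-, 5- and 15-minute load averages by the CPU count."""
    match = re.search(r"load average: ([\d.]+), ([\d.]+), ([\d.]+)", uptime_line)
    if match is None:
        raise ValueError("no load averages found in line")
    return [float(value) / cpu_count for value in match.groups()]


# Line copied from the `w` output above; 36 is the nproc result.
line = "18:42:32 up 106 days, 4:46, 2 users, load average: 6.26, 7.63, 8.42"
ratios = load_per_cpu(line, 36)
print([round(ratio, 2) for ratio in ratios])
```

All three ratios come out well below 1 per CPU, which is why I describe the CPU usage as low.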
Memory (cat /proc/meminfo):
MemTotal: 61837284 kB
MemFree: 3982024 kB
Buffers: 10328 kB
Cached: 32626956 kB
SwapCached: 9460 kB
Active: 42867976 kB
Inactive: 13606444 kB
Active(anon): 22581328 kB
Inactive(anon): 1259760 kB
Active(file): 20286648 kB
Inactive(file): 12346684 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 6775804 kB
SwapFree: 2165592 kB
Dirty: 16952 kB
Writeback: 60 kB
AnonPages: 23829904 kB
Mapped: 24808 kB
Shmem: 3236 kB
Slab: 749544 kB
SReclaimable: 541536 kB
SUnreclaim: 208008 kB
KernelStack: 8792 kB
PageTables: 160316 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 37694444 kB
Committed_AS: 39910116 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 131876 kB
VmallocChunk: 34328119820 kB
HardwareCorrupted: 0 kB
AnonHugePages: 550912 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 92160 kB
DirectMap2M: 4102144 kB
DirectMap1G: 58720256 kB
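In case it helps, this is a rough Python sketch of how I pulled two derived figures out of that dump: swap in use (SwapTotal minus SwapFree) and how far Committed_AS exceeds CommitLimit. The helper name parse_meminfo is my own, and the sample text only contains the fields I used, copied from above. As far as I understand, CommitLimit is only enforced when strict overcommit (vm.overcommit_memory=2) is enabled, so I am not sure the second figure matters here.

```python
def parse_meminfo(text):
    """Parse 'Key: value kB' lines into a dict of integers (kB or counts)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


# Subset of the /proc/meminfo output above; on the host itself this could
# be replaced with open("/proc/meminfo").read().
SAMPLE = """\
MemTotal: 61837284 kB
MemFree: 3982024 kB
Cached: 32626956 kB
SwapTotal: 6775804 kB
SwapFree: 2165592 kB
CommitLimit: 37694444 kB
Committed_AS: 39910116 kB
"""

info = parse_meminfo(SAMPLE)
swap_used_kb = info["SwapTotal"] - info["SwapFree"]
over_commit_kb = info["Committed_AS"] - info["CommitLimit"]
print("swap in use (kB):", swap_used_kb)
print("Committed_AS above CommitLimit (kB):", over_commit_kb)
```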
Listing the processes (ps -ax -o pid,s,cmd,wchan=WIDE-WCHAN-COLUMN | grep php):
2136 S /bin/sh -c php /var/www/dom wait
2156 S php /var/www/domains/vps.de poll_schedule_timeout
5831 S /bin/sh -c php /var/www/dom wait
5878 S php sync.php 110004255 --ca poll_schedule_timeout
5888 S php sync.php 11001587 --cal hrtimer_nanosleep
9138 S /bin/sh -c php /var/www/dom wait
9174 S php /var/www/builds/product poll_schedule_timeout
9243 R php sync.php 11001795 --cal -
9253 S php sync.php 110005751 --ca poll_schedule_timeout
13480 S /bin/sh -c php /var/www/dom wait
13684 S php sync.php 18003496 --cal poll_schedule_timeout
14825 S /bin/sh -c php /var/www/dom wait
17323 S /bin/sh -c php /var/www/dom wait
17385 S php sync.php 110005518 --ca poll_schedule_timeout
17391 S php sync.php 110004168 --ca pipe_wait
17393 S php sync.php 110006890 --ca poll_schedule_timeout
18479 S /bin/sh -c php /var/www/dom wait
18491 S php /var/www/domains/vps.de poll_schedule_timeout
19563 S php cron-new.php --auto-syn poll_schedule_timeout
19957 S /bin/sh -c php /var/www/dom wait
20004 S php sync.php 11001211 --cal poll_schedule_timeout
20006 R php sync.php 110004925 --ca -
20024 S php sync.php 11001046 --cal poll_schedule_timeout
20030 S php sync.php 11001517 --cal poll_schedule_timeout
21901 S /bin/sh -c php /var/www/dom wait
22004 S php sync.php 11002052 --cal poll_schedule_timeout
22006 S php sync.php 11001088 --cal pipe_wait
22008 S php sync.php 18002964 --cal hrtimer_nanosleep
22010 S php sync.php 11001069 --cal pipe_wait
That isn't the full list. On average, 5-10 processes are in the R state and the rest are in the S state, for a total of about 80 processes.
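To get counts rather than eyeballing the list, the ps output can be tallied by state and wait channel. Below is a small Python sketch of that idea; the function name tally is hypothetical, it assumes wchan is the last whitespace-separated column (as in the ps command above), and the sample is only a five-line excerpt of the listing, not the full output.

```python
from collections import Counter


def tally(ps_output):
    """Count processes per state (2nd column) and per wchan (last column)."""
    states, wchans = Counter(), Counter()
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        states[fields[1]] += 1
        wchans[fields[-1]] += 1
    return states, wchans


# Five lines copied from the listing above (excerpt only).
EXCERPT = """\
2136 S /bin/sh -c php /var/www/dom wait
2156 S php /var/www/domains/vps.de poll_schedule_timeout
5888 S php sync.php 11001587 --cal hrtimer_nanosleep
9243 R php sync.php 11001795 --cal -
17391 S php sync.php 110004168 --ca pipe_wait
"""

states, wchans = tally(EXCERPT)
print(dict(states))
print(dict(wchans))
```

Piping the real ps output into something like this would give the exact split between R and S and show which wait channels dominate.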
Checking active threads (psn -p "php" -a -G syscall,wchan,kstack):
=== Active Threads ==================================================================================================================================================================================================================================================================================================================================================================================
samples | avg_threads | comm | state | syscall | wchan | filenamesum | kstack
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2910 | 29.10 | (php) | Sleep (Interruptible) | nanosleep | hrtimer_nanosleep | | system_call_fastpath()->SyS_nanosleep()->hrtimer_nanosleep()
2207 | 22.07 | (php) | Sleep (Interruptible) | select | poll_schedule_timeout | | system_call_fastpath()->SyS_select()->core_sys_select()->do_select()->poll_schedule_timeout()
646 | 6.46 | (php) | Running (ON CPU) | [running] | 0 | | -
350 | 3.50 | (php) | Sleep (Interruptible) | poll | poll_schedule_timeout | | system_call_fastpath()->SyS_poll()->do_sys_poll()->poll_schedule_timeout()
7 | 0.07 | (php) | Running (ON CPU) | [running] | 0 | | system_call_fastpath()->SyS_read()
6 | 0.06 | (php) | Running (ON CPU) | [running] | 0 | | retint_careful()
2 | 0.02 | (php) | Disk (Uninterruptible) | rename | sleep_on_page | | system_call_fastpath()->SyS_rename()->SYSC_renameat()->vfs_rename()->nfs_rename()->nfs4_inode_return_delegation()->nfs_wb_all()->sync_inode()->writeback_single_inode()->__writeback_single_inode()->filemap_fdatawait()->filemap_fdatawait_range()->wait_on_page_bit()->sleep_on_page()
2 | 0.02 | (php) | Running (ON CPU) | [running] | 0 | | page_remove_rmap()->mem_cgroup_uncharge_page()
1 | 0.01 | (php) | Disk (Uninterruptible) | open | rpc_wait_bit_killable | | system_call_fastpath()->SyS_open()->do_sys_open()->do_filp_open()->path_openat()->do_last()->nfs_atomic_open()->nfs4_atomic_open()->nfs4_do_open()->nfs4_run_open_task()->__rpc_wait_for_completion_task()->rpc_wait_bit_killable()
1 | 0.01 | (php) | Running (ON CPU) | [running] | poll_schedule_timeout | | -
1 | 0.01 | (php) | Running (ON CPU) | poll | poll_schedule_timeout | | system_call_fastpath()->SyS_poll()->do_sys_poll()->poll_schedule_timeout()
1 | 0.01 | (php) | Running (ON CPU) | select | 0oll_schedule_timeout | | system_call_fastpath()->SyS_select()->core_sys_select()->do_select()->poll_schedule_timeout()
1 | 0.01 | (php) | Running (ON CPU) | select | poll_schedule_timeout | | system_call_fastpath()->SyS_select()->core_sys_select()->do_select()->poll_schedule_timeout()
1 | 0.01 | (php) | Sleep (Interruptible) | [running] | poll_schedule_timeout | | -
1 | 0.01 | (php) | Sleep (Interruptible) | [running] | poll_schedule_timeout | | system_call_fastpath()->SyS_poll()->do_sys_poll()->poll_schedule_timeout()
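To compare the states at a glance, the avg_threads column can be summed per state. This is a quick Python sketch with the (avg_threads, state) pairs copied by hand from the table above; the name threads_by_state is my own and nothing here comes from psn itself.

```python
from collections import defaultdict

# (avg_threads, state) pairs copied row by row from the psn table above.
ROWS = [
    (29.10, "Sleep (Interruptible)"),
    (22.07, "Sleep (Interruptible)"),
    (6.46, "Running (ON CPU)"),
    (3.50, "Sleep (Interruptible)"),
    (0.07, "Running (ON CPU)"),
    (0.06, "Running (ON CPU)"),
    (0.02, "Disk (Uninterruptible)"),
    (0.02, "Running (ON CPU)"),
    (0.01, "Disk (Uninterruptible)"),
    (0.01, "Running (ON CPU)"),
    (0.01, "Running (ON CPU)"),
    (0.01, "Running (ON CPU)"),
    (0.01, "Running (ON CPU)"),
    (0.01, "Sleep (Interruptible)"),
    (0.01, "Sleep (Interruptible)"),
]


def threads_by_state(rows):
    """Sum avg_threads per state, rounded to two decimals."""
    totals = defaultdict(float)
    for avg_threads, state in rows:
        totals[state] += avg_threads
    return {state: round(total, 2) for state, total in totals.items()}


totals = threads_by_state(ROWS)
for state, total in sorted(totals.items(), key=lambda item: -item[1]):
    print(f"{state}: {total:.2f}")
```

On these samples roughly 55 of about 61 threads are in interruptible sleep (about 29 of them in nanosleep) and fewer than 7 are on CPU, which matches what I see in ps.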