
We upgraded our kernel from 2.6.32 to 3.8.7 on a Debian system. Apache2 gets its data from an NFS share, and Nginx acts as a proxy that serves only static files. Since we installed kernel 3.8.7, the load average sometimes climbs to 40 or more, and errors appear in '/var/log/messages'.

This is the common error:

Apr 17 06:07:44 node1 kernel: [116569.387483] INFO: task apache2:18604 blocked for more than 120 seconds.
Apr 17 06:07:44 node1 kernel: [116569.387527] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 17 06:07:44 node1 kernel: [116569.387598] apache2         D 0000000000000002     0 18604  17528 0x00000000
Apr 17 06:07:44 node1 kernel: [116569.387602]  ffff8802338f1d98 0000000000000082 ffff8802338f0010 0000000000013940
Apr 17 06:07:44 node1 kernel: [116569.387605]  ffff880222454a40 0000000000013940 ffff8802338f1fd8 0000000000013940
Apr 17 06:07:44 node1 kernel: [116569.387608]  ffff8802338f1fd8 0000000000013940 ffff880236543180 ffff880222454a40
Apr 17 06:07:44 node1 kernel: [116569.387614] Call Trace:
Apr 17 06:07:44 node1 kernel: [116569.387622]  [<ffffffff815681de>] schedule+0x64/0x66
Apr 17 06:07:44 node1 kernel: [116569.387625]  [<ffffffff81568426>] schedule_preempt_disabled+0xe/0x10
Apr 17 06:07:44 node1 kernel: [116569.387628]  [<ffffffff81567178>] __mutex_lock_common+0x11d/0x18b
Apr 17 06:07:44 node1 kernel: [116569.387633]  [<ffffffff8113dc17>] ? filename_lookup+0x74/0x84
Apr 17 06:07:44 node1 kernel: [116569.387636]  [<ffffffff81567201>] __mutex_lock_slowpath+0x1b/0x1d
Apr 17 06:07:44 node1 kernel: [116569.387639]  [<ffffffff81566fbd>] mutex_lock+0x1b/0x2c
Apr 17 06:07:44 node1 kernel: [116569.387642]  [<ffffffff8113e75d>] do_unlinkat+0x92/0x231
Apr 17 06:07:44 node1 kernel: [116569.387645]  [<ffffffff81131d6e>] ? fsnotify_access+0x5d/0x65
Apr 17 06:07:44 node1 kernel: [116569.387648]  [<ffffffff81132f13>] ? sys_read+0x81/0x8e
Apr 17 06:07:44 node1 kernel: [116569.387651]  [<ffffffff8113e912>] sys_unlink+0x16/0x18
Apr 17 06:07:44 node1 kernel: [116569.387655]  [<ffffffff815701d9>] system_call_fastpath+0x16/0x1b

So we switched to kernel 3.8.8 (released today), and the same errors appear in the logs of every node running Apache2. This time, however, the INFO line is missing.

Apr 17 14:39:48 node4 kernel: [ 4074.194315] apache2         D 0000000000000002     0  3042   2144 0x00000000
Apr 17 14:39:48 node4 kernel: [ 4074.194319]  ffff880227833d98 0000000000000086 ffff880227832010 0000000000013940
Apr 17 14:39:48 node4 kernel: [ 4074.194323]  ffff880227838000 0000000000013940 ffff880227833fd8 0000000000013940
Apr 17 14:39:48 node4 kernel: [ 4074.194326]  ffff880227833fd8 0000000000013940 ffff8802365418c0 ffff880227838000
Apr 17 14:39:48 node4 kernel: [ 4074.194329] Call Trace:
Apr 17 14:39:48 node4 kernel: [ 4074.194338]  [<ffffffff815681f6>] schedule+0x64/0x66
Apr 17 14:39:48 node4 kernel: [ 4074.194341]  [<ffffffff8156843e>] schedule_preempt_disabled+0xe/0x10
Apr 17 14:39:48 node4 kernel: [ 4074.194345]  [<ffffffff81567190>] __mutex_lock_common+0x11d/0x18b
Apr 17 14:39:48 node4 kernel: [ 4074.194350]  [<ffffffff8113dc27>] ? filename_lookup+0x74/0x84
Apr 17 14:39:48 node4 kernel: [ 4074.194353]  [<ffffffff81567219>] __mutex_lock_slowpath+0x1b/0x1d
Apr 17 14:39:48 node4 kernel: [ 4074.194356]  [<ffffffff81566fd5>] mutex_lock+0x1b/0x2c
Apr 17 14:39:48 node4 kernel: [ 4074.194360]  [<ffffffff8113e76d>] do_unlinkat+0x92/0x231
Apr 17 14:39:48 node4 kernel: [ 4074.194364]  [<ffffffff81131d7e>] ? fsnotify_access+0x5d/0x65
Apr 17 14:39:48 node4 kernel: [ 4074.194367]  [<ffffffff81132f23>] ? sys_read+0x81/0x8e
Apr 17 14:39:48 node4 kernel: [ 4074.194370]  [<ffffffff8113e922>] sys_unlink+0x16/0x18
Apr 17 14:39:48 node4 kernel: [ 4074.194375]  [<ffffffff81570259>] system_call_fastpath+0x16/0x1b

How can we solve this problem? Do you have a solution?

Best regards,

Stephane

EDIT WITH MORE INFO:

At the same time, the following errors appear in our Apache log:

[Sun Apr 21 16:54:33 2013] [error] [client 90.48.134.110] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/jeux-plate-forme/index2.html
[Sun Apr 21 16:54:37 2013] [error] [client 90.48.134.110] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/jeux-plate-forme/index2.html
[Sun Apr 21 16:54:40 2013] [error] [client 84.98.103.74] Script timed out before returning headers: examplecom.wsgi
[Sun Apr 21 16:54:43 2013] [error] [client 41.140.52.31] Script timed out before returning headers: examplecom.wsgi, referer: http://www.google.com/url?sa=t&rct=j&q=jeux&source=web&cd=9&ved=0CHwQFjAI&url=http%3A%2F%2Fwww.example.com%2F&ei=nP1zUeGvOujZ0QXs5IGoCA&usg=AFQjCNHUy5HF9h1McY5VwLTLf-8mES4BtQ&bvm=bv.45512109,d.d2k
[Sun Apr 21 16:54:47 2013] [error] [client 109.210.26.71] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/jeu/liquid-measure.html
[Sun Apr 21 16:54:52 2013] [error] [client 86.202.162.244] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/jeux-strategie/index2.html
[Sun Apr 21 16:54:55 2013] [error] [client 77.194.131.243] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/jeux-reflexion/index31.html
[Sun Apr 21 16:55:03 2013] [error] [client 196.217.219.247] Script timed out before returning headers: examplecom.wsgi
[Sun Apr 21 16:55:04 2013] [error] [client 77.194.131.243] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/jeux-reflexion/index31.html
[Sun Apr 21 16:55:07 2013] [error] [client 89.82.161.71] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/
[Sun Apr 21 16:55:09 2013] [error] [client 90.48.134.110] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/jeux-plate-forme/index5.html
[Sun Apr 21 16:55:10 2013] [error] [client 80.9.135.15] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/jeux-simulation/index22.html
[Sun Apr 21 16:55:11 2013] [error] [client 5.49.161.243] Script timed out before returning headers: examplecom.wsgi, referer: https://www.google.fr/
[Sun Apr 21 16:55:12 2013] [error] [client 188.44.65.194] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/
[Sun Apr 21 16:55:16 2013] [error] [client 77.194.131.243] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/jeux-reflexion/index33.html
[Sun Apr 21 16:55:18 2013] [error] [client 109.128.202.223] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/jeu/sports-heads-basketball.html
[Sun Apr 21 16:55:26 2013] [error] [client 66.249.73.147] Script timed out before returning headers: examplecom.wsgi
[Sun Apr 21 16:55:27 2013] [error] [client 88.123.238.16] Script timed out before returning headers: examplecom.wsgi
[Sun Apr 21 16:55:30 2013] [error] [client 89.82.161.71] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/
[Sun Apr 21 16:55:31 2013] [error] [client 89.82.161.71] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/jeux-action/index1.html
[Sun Apr 21 16:55:35 2013] [error] [client 84.98.103.74] Script timed out before returning headers: examplecom.wsgi
[Sun Apr 21 16:55:38 2013] [error] [client 173.199.120.51] Script timed out before returning headers: examplecom.wsgi
[Sun Apr 21 16:55:40 2013] [error] [client 31.38.95.42] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/jeux-aventure/index12.html
[Sun Apr 21 16:55:41 2013] [error] [client 78.248.174.131] Script timed out before returning headers: examplecom.wsgi
[Sun Apr 21 16:55:46 2013] [error] [client 78.223.64.116] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/
[Sun Apr 21 16:55:49 2013] [error] [client 173.176.47.26] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/jeux-cine-tv/index1.html
[Sun Apr 21 16:55:50 2013] [error] [client 82.249.237.18] Script timed out before returning headers: examplecom.wsgi
[Sun Apr 21 16:55:51 2013] [error] [client 23.20.240.42] Script timed out before returning headers: examplecom.wsgi
[Sun Apr 21 16:55:52 2013] [error] [client 173.176.47.26] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/jeux-cine-tv/index1.html
[Sun Apr 21 16:55:53 2013] [error] [client 88.123.238.16] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/
[Sun Apr 21 16:55:53 2013] [error] [client 86.202.162.244] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/jeux-strategie/index6.html
[Sun Apr 21 16:55:59 2013] [error] [client 41.200.118.149] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/jeux-aventure/index1.html
[Sun Apr 21 16:56:01 2013] [error] [client 78.248.174.131] Script timed out before returning headers: examplecom.wsgi
[Sun Apr 21 16:56:05 2013] [error] [client 90.19.120.106] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/
[Sun Apr 21 16:56:06 2013] [error] [client 91.178.96.209] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/
[Sun Apr 21 16:56:08 2013] [error] [client 41.140.52.31] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/
[Sun Apr 21 16:56:10 2013] [error] [client 90.48.134.110] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/jeux-plate-forme/index14.html
[Sun Apr 21 16:56:11 2013] [error] [client 78.248.174.131] Script timed out before returning headers: examplecom.wsgi
[Sun Apr 21 16:56:19 2013] [error] [client 173.199.120.51] Script timed out before returning headers: examplecom.wsgi
[Sun Apr 21 16:56:19 2013] [error] [client 2.12.210.112] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/jeu/billiard-blitz-hustle.html
[Sun Apr 21 16:56:21 2013] [error] [client 157.55.33.24] Script timed out before returning headers: examplecom.wsgi
[Sun Apr 21 16:56:23 2013] [error] [client 91.178.96.209] Script timed out before returning headers: examplecom.wsgi, referer: http://www.google.be/search?hl=fr-BE&source=hp&q=jeux&gbv=2&rlz=1W1SKPB_fr&oq=j&gs_l=heirloom-hp.1.0.0l10.2704.2704.0.4782.1.1.0.0.0.0.125.125.0j1.1.0...0.0...1ac.1.IfSicVdYv6I
[Sun Apr 21 16:56:24 2013] [error] [client 90.19.120.106] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/jeu/bubble-shooter.html
[Sun Apr 21 16:56:26 2013] [error] [client 86.70.179.202] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/jeux-aventure/index3.html
[Sun Apr 21 16:56:27 2013] [error] [client 173.176.47.26] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/jeux-puzzles/index1.html
[Sun Apr 21 16:56:28 2013] [error] [client 41.140.52.31] Script timed out before returning headers: examplecom.wsgi
[Sun Apr 21 16:56:29 2013] [error] [client 78.223.64.116] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/historique/
[Sun Apr 21 16:56:34 2013] [error] [client 91.178.96.209] Script timed out before returning headers: examplecom.wsgi, referer: http://www.google.be/search?hl=fr-BE&source=hp&q=jeux&gbv=2&rlz=1W1SKPB_fr&oq=j&gs_l=heirloom-hp.1.0.0l10.2704.2704.0.4782.1.1.0.0.0.0.125.125.0j1.1.0...0.0...1ac.1.IfSicVdYv6I
[Sun Apr 21 16:56:36 2013] [error] [client 105.137.9.237] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/jeu/papas-hot-doggeria.html
[Sun Apr 21 16:56:41 2013] [error] [client 77.194.131.243] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/jeux-reflexion/index40.html
[Sun Apr 21 16:56:42 2013] [error] [client 41.224.155.145] Script timed out before returning headers: examplecom.wsgi, referer: https://www.google.tn/
[Sun Apr 21 16:56:45 2013] [error] [client 41.140.52.31] Script timed out before returning headers: examplecom.wsgi
[Sun Apr 21 16:56:45 2013] [error] [client 184.73.108.145] Script timed out before returning headers: examplecom.wsgi
[Sun Apr 21 16:56:51 2013] [error] [client 171.16.210.1] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/jeux/
[Sun Apr 21 16:56:51 2013] [error] [client 77.206.105.160] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/jeu/tu-95.html
[Sun Apr 21 16:56:54 2013] [error] [client 157.55.32.184] Script timed out before returning headers: examplecom.wsgi
[Sun Apr 21 16:56:55 2013] [error] [client 171.16.210.1] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/jeux/
[Sun Apr 21 16:56:57 2013] [error] [client 41.224.155.145] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/
[Sun Apr 21 16:57:00 2013] [error] [client 86.71.134.240] Script timed out before returning headers: examplecom.wsgi, referer: http://www1.delta-search.com/?q=tankiste+jeux&s=web&as=3&rlz=0&babsrc=HP_ss
[Sun Apr 21 16:57:01 2013] [error] [client 41.143.152.123] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/web-action/
[Sun Apr 21 16:57:04 2013] [error] [client 41.140.52.31] Script timed out before returning headers: examplecom.wsgi
[Sun Apr 21 16:57:07 2013] [error] [client 96.31.66.245] Script timed out before returning headers: examplecom.wsgi
[Sun Apr 21 16:57:10 2013] [error] [client 77.194.131.243] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/jeux-reflexion/index43.html
[Sun Apr 21 16:57:11 2013] [error] [client 41.226.161.210] Script timed out before returning headers: examplecom.wsgi
[Sun Apr 21 16:57:15 2013] [error] [client 66.249.73.147] Script timed out before returning headers: examplecom.wsgi
[Sun Apr 21 16:57:23 2013] [error] [client 92.143.167.234] Script timed out before returning headers: examplecom.wsgi
[Sun Apr 21 16:57:26 2013] [error] [client 77.194.131.243] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/jeux-reflexion/index47.html
[Sun Apr 21 16:57:32 2013] [error] [client 41.226.161.210] Script timed out before returning headers: examplecom.wsgi
[Sun Apr 21 16:57:33 2013] [error] [client 82.230.45.233] Script timed out before returning headers: examplecom.wsgi
[Sun Apr 21 16:57:37 2013] [error] [client 84.100.172.245] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/jeux-divers/index14.html
[Sun Apr 21 16:57:37 2013] [error] [client 93.6.87.57] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/
[Sun Apr 21 16:57:38 2013] [error] [client 178.237.80.156] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/jeux-simulation/index6.html
[Sun Apr 21 16:57:38 2013] [error] [client 46.126.133.25] Script timed out before returning headers: examplecom.wsgi, referer: http://www.google.ch/url?sa=t&rct=j&q=&esrc=s&source=web&cd=8&ved=0CG4QFjAH&url=http%3A%2F%2Fwww.example.com%2F&ei=DP5zUcC8IIG44AT21oHoAQ&usg=AFQjCNHUy5HF9h1McY5VwLTLf-8mES4BtQ&bvm=bv.45512109,d.bGE
[Sun Apr 21 16:57:46 2013] [error] [client 79.80.168.112] Script timed out before returning headers: examplecom.wsgi, referer: http://files.example.com/278374/1769/1769.swf?201301111413
[Sun Apr 21 16:57:49 2013] [error] [client 89.158.158.137] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/
[Sun Apr 21 16:57:49 2013] [error] [client 41.226.161.210] Script timed out before returning headers: examplecom.wsgi
[Sun Apr 21 16:57:50 2013] [error] [client 41.140.115.198] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/
[Sun Apr 21 16:57:53 2013] [error] [client 197.6.124.156] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/tous-les-jeux/index9.html
[Sun Apr 21 16:57:56 2013] [error] [client 41.226.161.210] Script timed out before returning headers: examplecom.wsgi
[Sun Apr 21 16:57:58 2013] [error] [client 66.249.73.147] Script timed out before returning headers: examplecom.wsgi
[Sun Apr 21 16:58:01 2013] [error] [client 208.78.85.9] Script timed out before returning headers: examplecom.wsgi
[Sun Apr 21 16:58:04 2013] [error] [client 79.80.168.112] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/
[Sun Apr 21 16:58:08 2013] [error] [client 178.154.243.93] Script timed out before returning headers: examplecom.wsgi
[Sun Apr 21 16:58:10 2013] [error] [client 65.55.213.67] Script timed out before returning headers: examplecom.wsgi
[Sun Apr 21 16:58:13 2013] [error] [client 2.0.117.243] Script timed out before returning headers: examplecom.wsgi
[Sun Apr 21 16:58:15 2013] [error] [client 93.6.87.57] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/jeu/sports-heads-football-championship.html
[Sun Apr 21 16:58:21 2013] [error] [client 84.100.172.245] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/jeux-divers/index9.html
[Sun Apr 21 16:58:21 2013] [error] [client 41.141.31.247] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/
[Sun Apr 21 16:58:22 2013] [error] [client 90.46.243.124] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/jeu/da-vincis-flying-robots.html
[Sun Apr 21 16:58:22 2013] [error] [client 157.55.33.24] Script timed out before returning headers: examplecom.wsgi
[Sun Apr 21 16:58:23 2013] [error] [client 41.141.192.75] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/jeux-sport/index1.html
[Sun Apr 21 16:58:25 2013] [error] [client 78.242.116.153] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/
[Sun Apr 21 16:58:28 2013] [error] [client 50.16.125.173] Script timed out before returning headers: examplecom.wsgi
[Sun Apr 21 16:58:30 2013] [error] [client 109.128.202.223] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/jeu/sports-heads-football-championship.html
[Sun Apr 21 16:58:32 2013] [error] [client 96.23.78.172] Script timed out before returning headers: examplecom.wsgi
[Sun Apr 21 16:58:34 2013] [error] [client 85.169.207.226] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/
[Sun Apr 21 16:58:37 2013] [error] [client 82.123.227.217] Script timed out before returning headers: examplecom.wsgi, referer: http://www.example.com/jeu/liquid-measure-3.html
[Sun Apr 21 16:58:37 2013] [error] [client 197.2.87.84] Script timed out before returning headers: examplecom.wsgi, referer: https://www.google.tn/
[Sun Apr 21 16:58:41 2013] [error] [client 65.55.213.67] Script timed out before returning headers: examplecom.wsgi
[Sun Apr 21 16:58:42 2013] [error] [client 197.27.82.96] Script timed out before returning headers: examplecom.wsgi, referer: https://www.google.tn/
[Sun Apr 21 16:58:47 2013] [error] [client 197.2.105.198] Script timed out before returning headers: examplecom.wsgi, referer: http://www.google.tn/url?sa=t&rct=j&q=jeux&source=web&cd=8&cad=rja&ved=0CGgQFjAH&url=http%3A%2F%2Fwww.example.com%2F&ei=kf5zUYOMMITNswbe24DwAg&usg=AFQjCNHUy5HF9h1McY5VwLTLf-8mES4BtQ&bvm=bv.45512109,d.ZWU
[Sun Apr 21 17:02:00 2013] [error] [client 79.80.168.112] (4)Interrupted system call: mod_wsgi (pid=24177): Unable to connect to WSGI daemon process 'www.example.com' on '/var/run/apache2/wsgi.27156.0.16.sock' after multiple attempts., referer: http://www.example.com/
[Sun Apr 21 17:02:00 2013] [error] [client 66.249.73.147] (4)Interrupted system call: mod_wsgi (pid=23061): Unable to connect to WSGI daemon process 'www.example.com' on '/var/run/apache2/wsgi.27156.0.16.sock' after multiple attempts.
[Sun Apr 21 17:02:00 2013] [error] [client 81.52.143.33] (4)Interrupted system call: mod_wsgi (pid=26380): Unable to connect to WSGI daemon process 'www.example.com' on '/var/run/apache2/wsgi.27156.0.16.sock' after multiple attempts.
[Sun Apr 21 17:02:00 2013] [error] [client 88.186.156.156] (4)Interrupted system call: mod_wsgi (pid=26739): Unable to connect to WSGI daemon process 'www.example.com' on '/var/run/apache2/wsgi.27156.0.16.sock' after multiple attempts., referer: http://search.free.fr/google.pl
[Sun Apr 21 17:02:00 2013] [error] [client 41.97.87.84] (4)Interrupted system call: mod_wsgi (pid=26380): Unable to connect to WSGI daemon process 'www.example.com' on '/var/run/apache2/wsgi.27156.0.16.sock' after multiple attempts., referer: http://www.govome.com/web?hl=dz&q=jeux
[Sun Apr 21 17:02:00 2013] [error] [client 109.29.30.203] (4)Interrupted system call: mod_wsgi (pid=26739): Unable to connect to WSGI daemon process 'www.example.com' on '/var/run/apache2/wsgi.27156.0.16.sock' after multiple attempts.
[Sun Apr 21 17:02:00 2013] [error] [client 173.199.120.51] (4)Interrupted system call: mod_wsgi (pid=26380): Unable to connect to WSGI daemon process 'www.example.com' on '/var/run/apache2/wsgi.27156.0.16.sock' after multiple attempts.
[Sun Apr 21 17:02:00 2013] [error] [client 41.140.234.23] (4)Interrupted system call: mod_wsgi (pid=26739): Unable to connect to WSGI daemon process 'www.example.com' on '/var/run/apache2/wsgi.27156.0.16.sock' after multiple attempts., referer: http://www.example.com/
[Sun Apr 21 17:02:00 2013] [error] [client 86.72.195.6] (4)Interrupted system call: mod_wsgi (pid=26380): Unable to connect to WSGI daemon process 'www.example.com' on '/var/run/apache2/wsgi.27156.0.16.sock' after multiple attempts., referer: http://www.example.com/jeux-action/index72.html
[Sun Apr 21 17:02:00 2013] [error] [client 41.251.99.18] (4)Interrupted system call: mod_wsgi (pid=26380): Unable to connect to WSGI daemon process 'www.example.com' on '/var/run/apache2/wsgi.27156.0.16.sock' after multiple attempts., referer: http://www.example.com/jeux-combat/index1.html
[Sun Apr 21 17:02:00 2013] [error] [client 109.12.49.236] (4)Interrupted system call: mod_wsgi (pid=26739): Unable to connect to WSGI daemon process 'www.example.com' on '/var/run/apache2/wsgi.27156.0.16.sock' after multiple attempts., referer: http://www.example.com/
[Sun Apr 21 17:02:00 2013] [error] [client 41.251.99.18] (4)Interrupted system call: mod_wsgi (pid=26380): Unable to connect to WSGI daemon process 'www.example.com' on '/var/run/apache2/wsgi.27156.0.16.sock' after multiple attempts., referer: http://www.example.com/jeux-action/index23.html
[Sun Apr 21 17:02:00 2013] [error] [client 81.52.143.30] (4)Interrupted system call: mod_wsgi (pid=26739): Unable to connect to WSGI daemon process 'www.example.com' on '/var/run/apache2/wsgi.27156.0.16.sock' after multiple attempts.
[Sun Apr 21 17:02:00 2013] [error] [client 41.250.167.182] (4)Interrupted system call: mod_wsgi (pid=26380): Unable to connect to WSGI daemon process 'www.example.com' on '/var/run/apache2/wsgi.27156.0.16.sock' after multiple attempts., referer: http://www.example.com/
[Sun Apr 21 17:02:00 2013] [error] [client 83.141.225.0] (4)Interrupted system call: mod_wsgi (pid=26380): Unable to connect to WSGI daemon process 'www.example.com' on '/var/run/apache2/wsgi.27156.0.16.sock' after multiple attempts., referer: http://www.example.com/jeux-course/index10.html
[Sun Apr 21 17:02:00 2013] [error] [client 88.179.62.177] (4)Interrupted system call: mod_wsgi (pid=26380): Unable to connect to WSGI daemon process 'www.example.com' on '/var/run/apache2/wsgi.27156.0.16.sock' after multiple attempts., referer: http://www.example.com/
[Sun Apr 21 17:02:00 2013] [error] [client 65.55.213.77] (4)Interrupted system call: mod_wsgi (pid=26380): Unable to connect to WSGI daemon process 'www.example.com' on '/var/run/apache2/wsgi.27156.0.16.sock' after multiple attempts.
[Sun Apr 21 17:02:00 2013] [error] [client 78.239.240.30] (4)Interrupted system call: mod_wsgi (pid=26380): Unable to connect to WSGI daemon process 'www.example.com' on '/var/run/apache2/wsgi.27156.0.16.sock' after multiple attempts., referer: http://www.example.com/jeux-action/index2.html
[Sun Apr 21 17:02:00 2013] [error] [client 109.12.49.236] (4)Interrupted system call: mod_wsgi (pid=26380): Unable to connect to WSGI daemon process 'www.example.com' on '/var/run/apache2/wsgi.27156.0.16.sock' after multiple attempts., referer: http://www.example.com/
[Sun Apr 21 17:02:00 2013] [error] [client 81.52.143.31] (4)Interrupted system call: mod_wsgi (pid=26380): Unable to connect to WSGI daemon process 'www.example.com' on '/var/run/apache2/wsgi.27156.0.16.sock' after multiple attempts.
[Sun Apr 21 17:02:00 2013] [error] [client 82.145.216.39] (4)Interrupted system call: mod_wsgi (pid=26380): Unable to connect to WSGI daemon process 'www.example.com' on '/var/run/apache2/wsgi.27156.0.16.sock' after multiple attempts., referer: http://www.example.com/
[Sun Apr 21 17:02:00 2013] [error] [client 86.215.16.69] (4)Interrupted system call: mod_wsgi (pid=26380): Unable to connect to WSGI daemon process 'www.example.com' on '/var/run/apache2/wsgi.27156.0.16.sock' after multiple attempts., referer: http://www.example.com/
[Sun Apr 21 17:02:00 2013] [error] [client 50.16.125.173] (4)Interrupted system call: mod_wsgi (pid=27142): Unable to connect to WSGI daemon process 'www.example.com' on '/var/run/apache2/wsgi.27156.0.16.sock' after multiple attempts.
[Sun Apr 21 17:02:00 2013] [error] [client 82.231.128.20] (4)Interrupted system call: mod_wsgi (pid=27142): Unable to connect to WSGI daemon process 'www.example.com' on '/var/run/apache2/wsgi.27156.0.16.sock' after multiple attempts., referer: http://www.example.com/
[Sun Apr 21 17:02:00 2013] [error] [client 90.48.134.110] (4)Interrupted system call: mod_wsgi (pid=27142): Unable to connect to WSGI daemon process 'www.example.com' on '/var/run/apache2/wsgi.27156.0.16.sock' after multiple attempts., referer: http://www.example.com/jeux-plate-forme/index37.html
[Sun Apr 21 17:02:00 2013] [error] [client 65.55.213.67] (4)Interrupted system call: mod_wsgi (pid=27142): Unable to connect to WSGI daemon process 'www.example.com' on '/var/run/apache2/wsgi.27156.0.16.sock' after multiple attempts.
[Sun Apr 21 17:02:00 2013] [error] [client 83.141.225.0] (4)Interrupted system call: mod_wsgi (pid=27142): Unable to connect to WSGI daemon process 'www.example.com' on '/var/run/apache2/wsgi.27156.0.16.sock' after multiple attempts., referer: http://www.example.com/jeux-course/index10.html
[Sun Apr 21 17:02:05 2013] [error] [client 84.102.194.98] (4)Interrupted system call: mod_wsgi (pid=24922): Unable to connect to WSGI daemon process 'www.example.com' on '/var/run/apache2/wsgi.27156.0.16.sock' after multiple attempts., referer: http://www.google.fr/url?sa=t&rct=j&q=jeux&source=web&cd=10&ved=0CHcQFjAJ&url=http%3A%2F%2Fwww.example.com%2F&ei=kv5zUZruFYyrhAf7goHgBQ&usg=AFQjCNHUy5HF9h1McY5VwLTLf-8mES4BtQ

It seems that a WSGI process (not always the same one) blocks Apache2. All files are on NFS, now backed by Ext4 (previously ReiserFS, which made no difference). With the old kernel (2.6.32) we did not have this problem; it only appears with the new one (3.8.8). If we kill the blocked process, the load average returns to normal. We updated WSGI to the latest version, but the result is the same.
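The log above suggests that the script timeouts build up for several minutes before the WSGI daemon becomes unreachable. To check that pattern on other incidents, here is a minimal sketch that counts the two mod_wsgi errors per minute. The error-type labels are hypothetical names chosen for illustration, and the timestamp regex assumes the Apache 2.2 error log format shown above.

```python
import re
from collections import Counter

# Captures "Sun Apr 21 16:54" from "[Sun Apr 21 16:54:33 2013]" (minute resolution).
TIMESTAMP = re.compile(r"^\[(\w{3} \w{3} +\d{1,2} \d{2}:\d{2}):\d{2} \d{4}\]")

# Hypothetical labels for the two error messages seen in the log.
PATTERNS = {
    "script_timeout": "Script timed out before returning headers",
    "daemon_unreachable": "Unable to connect to WSGI daemon process",
}

def tally_wsgi_errors(lines):
    """Count mod_wsgi errors per (minute, error type); other lines are ignored."""
    counts = Counter()
    for line in lines:
        m = TIMESTAMP.match(line)
        if not m:
            continue  # not an Apache error-log entry
        for kind, needle in PATTERNS.items():
            if needle in line:
                counts[(m.group(1), kind)] += 1
    return counts
```

It accepts any iterable of lines, so an open file works directly, e.g. tally_wsgi_errors(open("/var/log/apache2/error.log")); that path is the usual Debian default but is an assumption here.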

Is it the NFS server that locks the files? Or do you have another idea?

Acti67
  • I can't get around this. Instead of NFS I have autofs with a mounted Windows share. Did you find a fix? – oxygen Dec 08 '14 at 15:08

2 Answers


We can see from the call trace that Apache tried to delete a file, and got hung up waiting for a lock to be released. Since the files are on an NFS server, you should be looking at the NFS server and the network connection between the web server and the NAS.
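The D after apache2 in the kernel log means the task was in uninterruptible sleep, which is the state the hung-task detector reports on. To spot such tasks live on a web node while the load is climbing, instead of waiting for the 120-second warning, here is a minimal Linux-only sketch that scans /proc/<pid>/stat. It is an illustration, not a complete monitoring tool.

```python
import os

def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the LAST ')' rather than on whitespace.
    head, _, tail = text.rpartition(")")
    pid_str, _, comm = head.partition(" (")
    state = tail.split()[0]
    return int(pid_str), comm, state

def uninterruptible_tasks(proc="/proc"):
    """List (pid, comm) for every task currently in state 'D'."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state == "D":
            found.append((pid, comm))
    return found
```

Running uninterruptible_tasks() repeatedly during an incident shows which apache2 or WSGI processes are stuck; as root, /proc/<pid>/stack for those PIDs usually shows the same kind of call trace as the kernel log.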

One thing you may want to make sure of is that you have explicitly specified NFSv4 on both ends. It is much more reliable than NFSv3 and solves many of the problems that plagued earlier NFS versions. The server should already be doing this if it runs, e.g., RHEL 6 or later. That leaves making sure your clients' mount options request NFSv4 (for instance, the nfs4 filesystem type, or the nfs filesystem type with the nfsvers=4 mount option).
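To confirm which version the clients actually ended up using, rather than what /etc/fstab asks for, you can read /proc/mounts on a web node. A minimal sketch, assuming the kernel reports the version as a vers= option (as recent kernels do) and falling back on the filesystem type otherwise:

```python
def nfs_versions(mounts_text):
    """Map each NFS client mount point to the version reported in /proc/mounts."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Only client mounts; this also skips the server-side 'nfsd' filesystem.
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        _device, mountpoint, fstype, options = fields[:4]
        opts = dict(
            opt.split("=", 1) if "=" in opt else (opt, "")
            for opt in options.split(",")
        )
        # Prefer an explicit vers=/nfsvers= option; fall back on the fs type.
        version = opts.get("vers") or opts.get("nfsvers") or (
            "4" if fstype == "nfs4" else "unknown"
        )
        result[mountpoint] = version
    return result
```

For example, nfs_versions(open("/proc/mounts").read()) would return something like {'/var/www': '3'} if the share were still mounted as NFSv3; that mount point is hypothetical.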

Michael Hampton

What is the output of

# cat /proc/sys/vm/dirty_ratio

and

# free -mt

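The default on Debian is 10: up to 10% of memory may hold dirty pages (data cached in memory but not yet written to disk) before writing processes are forced to flush them.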

This is a known problem with caching writes in memory: when the dirty ratio is too high, a process doing I/O can take a very long time (up to 120 seconds) to flush the cached data to disk. If you have a lot of memory, you can try decreasing dirty_ratio.
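To check whether dirty pages actually pile up when the load spikes, you can watch the Dirty and Writeback counters in /proc/meminfo. A minimal sketch that turns those counters into a short report:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo into {field: value}; values are in kB (or page counts)."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            info[name.strip()] = int(parts[0])
    return info

def dirty_report(text):
    """Summarise how much data is waiting to be written back."""
    info = parse_meminfo(text)
    dirty_kb = info.get("Dirty", 0)
    return {
        "dirty_mb": dirty_kb // 1024,
        "writeback_mb": info.get("Writeback", 0) // 1024,
        # Share of total RAM; the kernel's real threshold uses a smaller base.
        "dirty_pct_of_total": round(100 * dirty_kb / info["MemTotal"], 1),
    }
```

Calling dirty_report(open("/proc/meminfo").read()) every few seconds while the load average climbs shows whether the Dirty figure approaches the dirty_ratio threshold.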

To change this value for testing:

echo 10 > /proc/sys/vm/dirty_ratio

To make the change permanent, add this to /etc/sysctl.conf:

vm.dirty_ratio=10
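The echo command and the sysctl.conf line set the same knob: a sysctl key maps to a file under /proc/sys by turning the dots into slashes. A sketch of that mapping and of how sysctl.conf entries are read; it is a simplification that ignores keys legitimately containing dots (such as some network interface names).

```python
def sysctl_path(key, root="/proc/sys"):
    """Translate a sysctl key such as 'vm.dirty_ratio' into its /proc/sys file."""
    return root + "/" + key.replace(".", "/")

def parse_sysctl_conf(text):
    """Return {key: value} for the settings in a sysctl.conf-style file."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank lines and comments
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings
```

After a reboot, comparing each value from parse_sysctl_conf(open("/etc/sysctl.conf").read()) with open(sysctl_path(key)).read().strip() confirms that the setting was actually applied.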

EDIT: from kernel doc

Note: dirty_bytes is the counterpart of dirty_ratio. Only one of them may be specified at a time. When one sysctl is written it is immediately taken into account to evaluate the dirty memory limits and the other appears as 0 when read.
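The rule quoted above can be written as a small function, which also makes it easy to estimate the threshold for a given machine. A sketch under one stated simplification: the kernel computes the ratio against "dirtyable" memory (roughly free memory plus reclaimable page cache), not total RAM, so passing total RAM gives an upper bound.

```python
def dirty_limit_bytes(dirty_ratio, dirty_bytes, available_bytes):
    """Approximate how much dirty data is allowed before writers are throttled.

    Only one of vm.dirty_bytes / vm.dirty_ratio is in effect: a non-zero
    dirty_bytes wins, otherwise the ratio is applied to available memory.
    """
    if dirty_bytes > 0:
        return dirty_bytes
    return available_bytes * dirty_ratio // 100
```

With the 7964 MB reported by free -mt in the comments below and dirty_ratio at 20, dirty_limit_bytes(20, 0, 7964 * 1024 * 1024) // (1024 * 1024) gives 1592, i.e. at most about 1.6 GB of dirty data; at a ratio of 5 it drops to 398 MB.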

You need to find the best settings for your disk performance and amount of memory.

EDIT - more info

OK, so try increasing Apache's timeout, which defaults to 300 seconds (see the TimeOut directive).

maxxvw
  • Here is the output of both commands: node1:~# cat /proc/sys/vm/dirty_ratio returns 20, and node1:~# free -mt returns Mem: 7964 total, 7782 used, 182 free, 0 shared, 31 buffers, 6039 cached; -/+ buffers/cache: 1711 used, 6252 free; Swap: 11600 total, 0 used, 11600 free; Total: 19565 total, 7782 used, 11782 free. What do you think? – Acti67 Apr 19 '13 at 20:14
  • I think you should try decreasing vm.dirty_ratio to 10 and see whether that helps. Updated answer. – maxxvw Apr 19 '13 at 20:24
  • This doesn't resolve the problem... I have also tried a value of 5. Any other ideas? – Acti67 Apr 19 '13 at 22:00
  • Have you tried decreasing dirty_bytes...? If the ratio is higher than that amount, it has no effect. – maxxvw Apr 20 '13 at 07:05
  • dirty_bytes is set to 0. Does it need changing? – Acti67 Apr 21 '13 at 16:47
  • That is because you set the ratio; see my edit. The default for Debian is 1024000. You could also try changing this timeout, e.g.: echo 240 > /proc/sys/kernel/hung_task_timeout_secs – maxxvw Apr 21 '13 at 18:30
  • I have edited my question with more information. dirty_ratio does not seem to be the solution. – Acti67 Apr 22 '13 at 16:23
  • We rolled the kernel back to 2.6.32 on the filer that serves the files to the nodes. The filer is an NFS server, and the nodes serve the web pages and static files. With the filer on 2.6.32 and the nodes on 3.8.8, everything works perfectly, but with 3.8.8 on the filer, we get errors on the nodes... There must be a problem in the kernel NFS server. – Acti67 Apr 24 '13 at 12:28