I have a Vagrant guest I'm using to run a Symfony 2 application locally for development. In general this is working fine, however, I am regularly finding the processes lock in the 'D+' state (waiting for I/O).
eg. I try to run my unit tests:
./bin/phpunit -c app
The task launches, but then never exits. In the process list I see:
vagrant 3279 0.5 4.9 378440 101132 pts/0 D+ 02:43 0:03 php ./bin/phpunit -c app
The task is unkillable. I need to power cycle the Vagrant guest to get it back again. This seems to happen mostly with PHP command line apps (but it's also the main command line tasks I do, so it might not be relevant).
The syslog reports a hung task:
Aug 20 03:04:40 precise64 kernel: [ 6240.210396] INFO: task php:3279 blocked for more than 120 seconds.
Aug 20 03:04:40 precise64 kernel: [ 6240.211920] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 20 03:04:40 precise64 kernel: [ 6240.212843] php D 0000000000000000 0 3279 3091 0x00000004
Aug 20 03:04:40 precise64 kernel: [ 6240.212846] ffff88007aa13c98 0000000000000082 ffff88007aa13c38 ffffffff810830df
Aug 20 03:04:40 precise64 kernel: [ 6240.212849] ffff88007aa13fd8 ffff88007aa13fd8 ffff88007aa13fd8 0000000000013780
Aug 20 03:04:40 precise64 kernel: [ 6240.212851] ffff88007aa9c4d0 ffff880079e596f0 ffff88007aa13c78 ffff88007fc14040
Aug 20 03:04:40 precise64 kernel: [ 6240.212853] Call Trace:
Aug 20 03:04:40 precise64 kernel: [ 6240.212859] [<ffffffff810830df>] ? queue_work+0x1f/0x30
Aug 20 03:04:40 precise64 kernel: [ 6240.212863] [<ffffffff811170e0>] ? __lock_page+0x70/0x70
Aug 20 03:04:40 precise64 kernel: [ 6240.212866] [<ffffffff8165a55f>] schedule+0x3f/0x60
Aug 20 03:04:40 precise64 kernel: [ 6240.212867] [<ffffffff8165a60f>] io_schedule+0x8f/0xd0
Aug 20 03:04:40 precise64 kernel: [ 6240.212869] [<ffffffff811170ee>] sleep_on_page+0xe/0x20
Aug 20 03:04:40 precise64 kernel: [ 6240.212871] [<ffffffff8165ae2f>] __wait_on_bit+0x5f/0x90
Aug 20 03:04:40 precise64 kernel: [ 6240.212873] [<ffffffff81117258>] wait_on_page_bit+0x78/0x80
Aug 20 03:04:40 precise64 kernel: [ 6240.212875] [<ffffffff8108af00>] ? autoremove_wake_function+0x40/0x40
Aug 20 03:04:40 precise64 kernel: [ 6240.212877] [<ffffffff8111736c>] filemap_fdatawait_range+0x10c/0x1a0
Aug 20 03:04:40 precise64 kernel: [ 6240.212882] [<ffffffff81122a01>] ? do_writepages+0x21/0x40
Aug 20 03:04:40 precise64 kernel: [ 6240.212884] [<ffffffff81118da8>] filemap_write_and_wait_range+0x68/0x80
Aug 20 03:04:40 precise64 kernel: [ 6240.212892] [<ffffffffa01269fe>] nfs_file_fsync+0x5e/0x130 [nfs]
Aug 20 03:04:40 precise64 kernel: [ 6240.212896] [<ffffffff811a632b>] vfs_fsync+0x2b/0x40
Aug 20 03:04:40 precise64 kernel: [ 6240.212900] [<ffffffffa01272c3>] nfs_file_flush+0x53/0x80 [nfs]
Aug 20 03:04:40 precise64 kernel: [ 6240.212903] [<ffffffff81175d6f>] filp_close+0x3f/0x90
Aug 20 03:04:40 precise64 kernel: [ 6240.212905] [<ffffffff81175e72>] sys_close+0xb2/0x120
Aug 20 03:04:40 precise64 kernel: [ 6240.212907] [<ffffffff81664a82>] system_call_fastpath+0x16/0x1b`
To provision the box, I'm sharing a local folder using:
config.vm.synced_folder "/my/local/path.dev", "/var/www", :nfs => true
Vagrant creates the following /etc/exports file on the OSX host:
# VAGRANT-BEGIN: c7d0c56a-a126-46f5-a293-605bf554bc9a
"/Users/djdrey-local/Sites/oddswop.dev" 192.168.33.101 -mapall=501:20
# VAGRANT-END: c7d0c56a-a126-46f5-a293-605bf554bc9a
Output of nfsstat on the vagrant guest
Server rpc stats:
calls badcalls badclnt badauth xdrcall
0 0 0 0 0
Client rpc stats:
calls retrans authrefrsh
87751 0 87751
Client nfs v3:
null getattr setattr lookup access readlink
0 0% 35018 39% 1110 1% 8756 9% 19086 21% 0 0%
read write create mkdir symlink mknod
5100 5% 7059 8% 4603 5% 192 0% 0 0% 0 0%
remove rmdir rename link readdir readdirplus
4962 5% 262 0% 313 0% 0 0% 0 0% 1056 1%
fsstat fsinfo pathconf commit
1 0% 2 0% 1 0% 229 0%
I've ensured the Guest Additions are up to date on the guest using the plugin: vagrant-vbguest
I'm not sure how to go about debugging this. It's pretty clear to me this is a NFS issue between the guest and the Mac OSX host. If I try and up the debug logging for NFS on OSX using NFS Manager, I get a kernel panic in OSX.
Has anyone else had a similar issue? Any suggestions on a way forward would be appreciated - as power cycling the guest several times per day is unworkable.
Environment
- OSX 10.8.4
- Vagrant 1.2.7
- Virtualbox 4.2.16
- Vagrant guest O/S: Ubuntu 12.04.2 LTS (GNU/Linux 3.2.0-23-generic x86_64) [precise64.box]