I have a Vagrant guest that I use to run a Symfony 2 application locally for development. In general this works fine; however, processes regularly lock up in the 'D+' state (uninterruptible sleep, waiting for I/O).

For example, when I run my unit tests:

./bin/phpunit -c app

The task launches, but then never exits. In the process list I see:

vagrant 3279 0.5 4.9 378440 101132 pts/0 D+ 02:43 0:03 php ./bin/phpunit -c app

The task is unkillable; I need to power cycle the Vagrant guest to get it back. This seems to happen mostly with PHP command-line apps (but those are also the main command-line tasks I run, so that might not be relevant).
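For anyone wanting to spot stuck processes without scanning the list by eye, here is a minimal Ruby sketch that filters `ps aux`-style lines whose STAT column starts with `D`. The helper name `uninterruptible` and the second sample line are made up for illustration, and the column layout is assumed to match the output shown above.

```ruby
# Return the lines whose STAT column starts with "D" (uninterruptible sleep).
# Assumed column order, as in `ps aux`:
#   USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
def uninterruptible(ps_lines)
  ps_lines.select do |line|
    fields = line.split(" ", 11)   # keep the full command in the last field
    fields.length >= 8 && fields[7].start_with?("D")
  end
end

sample = [
  "vagrant 3279 0.5 4.9 378440 101132 pts/0 D+ 02:43 0:03 php ./bin/phpunit -c app",
  "vagrant 3091 0.0 0.2 24688 5320 pts/0 Ss 02:10 0:00 -bash" # hypothetical line
]

puts uninterruptible(sample)
```

On the guest, the same helper could be fed live data with ``uninterruptible(`ps aux`.lines)``.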

The syslog reports a hung task:

Aug 20 03:04:40 precise64 kernel: [ 6240.210396] INFO: task php:3279 blocked for more than 120 seconds.
Aug 20 03:04:40 precise64 kernel: [ 6240.211920] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 20 03:04:40 precise64 kernel: [ 6240.212843] php             D 0000000000000000     0  3279   3091 0x00000004
Aug 20 03:04:40 precise64 kernel: [ 6240.212846]  ffff88007aa13c98 0000000000000082 ffff88007aa13c38 ffffffff810830df
Aug 20 03:04:40 precise64 kernel: [ 6240.212849]  ffff88007aa13fd8 ffff88007aa13fd8 ffff88007aa13fd8 0000000000013780
Aug 20 03:04:40 precise64 kernel: [ 6240.212851]  ffff88007aa9c4d0 ffff880079e596f0 ffff88007aa13c78 ffff88007fc14040
Aug 20 03:04:40 precise64 kernel: [ 6240.212853] Call Trace:
Aug 20 03:04:40 precise64 kernel: [ 6240.212859]  [<ffffffff810830df>] ? queue_work+0x1f/0x30
Aug 20 03:04:40 precise64 kernel: [ 6240.212863]  [<ffffffff811170e0>] ? __lock_page+0x70/0x70
Aug 20 03:04:40 precise64 kernel: [ 6240.212866]  [<ffffffff8165a55f>] schedule+0x3f/0x60
Aug 20 03:04:40 precise64 kernel: [ 6240.212867]  [<ffffffff8165a60f>] io_schedule+0x8f/0xd0
Aug 20 03:04:40 precise64 kernel: [ 6240.212869]  [<ffffffff811170ee>] sleep_on_page+0xe/0x20
Aug 20 03:04:40 precise64 kernel: [ 6240.212871]  [<ffffffff8165ae2f>] __wait_on_bit+0x5f/0x90
Aug 20 03:04:40 precise64 kernel: [ 6240.212873]  [<ffffffff81117258>] wait_on_page_bit+0x78/0x80
Aug 20 03:04:40 precise64 kernel: [ 6240.212875]  [<ffffffff8108af00>] ? autoremove_wake_function+0x40/0x40
Aug 20 03:04:40 precise64 kernel: [ 6240.212877]  [<ffffffff8111736c>] filemap_fdatawait_range+0x10c/0x1a0
Aug 20 03:04:40 precise64 kernel: [ 6240.212882]  [<ffffffff81122a01>] ? do_writepages+0x21/0x40
Aug 20 03:04:40 precise64 kernel: [ 6240.212884]  [<ffffffff81118da8>] filemap_write_and_wait_range+0x68/0x80
Aug 20 03:04:40 precise64 kernel: [ 6240.212892]  [<ffffffffa01269fe>] nfs_file_fsync+0x5e/0x130 [nfs]
Aug 20 03:04:40 precise64 kernel: [ 6240.212896]  [<ffffffff811a632b>] vfs_fsync+0x2b/0x40
Aug 20 03:04:40 precise64 kernel: [ 6240.212900]  [<ffffffffa01272c3>] nfs_file_flush+0x53/0x80 [nfs]
Aug 20 03:04:40 precise64 kernel: [ 6240.212903]  [<ffffffff81175d6f>] filp_close+0x3f/0x90
Aug 20 03:04:40 precise64 kernel: [ 6240.212905]  [<ffffffff81175e72>] sys_close+0xb2/0x120
Aug 20 03:04:40 precise64 kernel: [ 6240.212907]  [<ffffffff81664a82>] system_call_fastpath+0x16/0x1b

To provision the box, I'm sharing a local folder using:

config.vm.synced_folder "/my/local/path.dev", "/var/www", :nfs => true

Vagrant creates the following /etc/exports file on the OSX host:

# VAGRANT-BEGIN: c7d0c56a-a126-46f5-a293-605bf554bc9a
"/Users/djdrey-local/Sites/oddswop.dev" 192.168.33.101 -mapall=501:20
# VAGRANT-END: c7d0c56a-a126-46f5-a293-605bf554bc9a

Output of nfsstat on the Vagrant guest:

Server rpc stats:
calls      badcalls   badclnt    badauth    xdrcall
0          0          0          0          0

Client rpc stats:
calls      retrans    authrefrsh
87751      0          87751

Client nfs v3:
null         getattr      setattr      lookup       access       readlink
0         0% 35018    39% 1110      1% 8756      9% 19086    21% 0         0%
read         write        create       mkdir        symlink      mknod
5100      5% 7059      8% 4603      5% 192       0% 0         0% 0         0%
remove       rmdir        rename       link         readdir      readdirplus
4962      5% 262       0% 313       0% 0         0% 0         0% 1056      1%
fsstat       fsinfo       pathconf     commit
1         0% 2         0% 1         0% 229       0%

I've ensured the Guest Additions are up to date on the guest using the vagrant-vbguest plugin.

I'm not sure how to go about debugging this. It seems pretty clear to me that this is an NFS issue between the guest and the Mac OSX host. If I try to turn up the debug logging for NFS on OSX using NFS Manager, I get a kernel panic in OSX.

Has anyone else had a similar issue? Any suggestions on a way forward would be appreciated - as power cycling the guest several times per day is unworkable.

Environment

  • OSX 10.8.4
  • Vagrant 1.2.7
  • Virtualbox 4.2.16
  • Vagrant guest O/S: Ubuntu 12.04.2 LTS (GNU/Linux 3.2.0-23-generic x86_64) [precise64.box]
Andre Lackmann
  • Could you please run the php process with strace? Maybe it helps to see what is happening. – Lajos Veres Oct 09 '13 at 11:41
  • Maybe a duplicate of http://stackoverflow.com/questions/18085868/attempt-to-access-remote-folder-mounted-with-cifs-hangs-when-disconnected; you should read the answer to that question. It was about CIFS, but NFS is the best-known case (more so than SMB/CIFS). There is no such problem with other versions, but I don't think you can choose the NFS server version (4) on XNU. Again, it is the same answer as the one I wrote on the CIFS question: with many network filesystems implemented on the Linux kernel side, the process becomes a hung task if the server becomes unreachable on the network. – user2284570 Oct 09 '13 at 20:13
  • @user2284570 With Vagrant - it's all on the same machine. So the NFS connection is over a virtual NIC to VirtualBox. Unlikely to be a connection problem is my assumption. – Andre Lackmann Oct 10 '13 at 02:10
  • As the name suggests, a virtual machine aims to behave as if there were several physical machines with separate hardware. All network traffic passes through a virtual card (on the server), which emits virtual Ethernet frames like any other Ethernet card. The only exception is provided for VM/server through the **VirtualBox additions** in this case, and the only exception for networking is **VirtualBox Shared Folders**. Also, you have some NFS functions in your backtrace, which suggests your program (D state) is waiting for network I/O. Remember that network failures can be caused by software ones. – user2284570 Oct 10 '13 at 18:44

2 Answers

I had a similar problem when running npm install within a shared NFS folder, and subsequently found that disabling nfs_udp (so the mount uses TCP instead of UDP) fixed the hanging issues:

 config.vm.synced_folder ".", "/vagrant", type: "nfs", nfs_udp: false
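
Applied to the synced folder from the question, this might look like the sketch below. The box name, IP address and paths are taken from the question; `nfs_version: 3` is my own addition, only there to pin the version that nfsstat already reports. I'm assuming a Vagrant release that understands `type: "nfs"`, `nfs_udp` and `nfs_version`; these options may not exist in 1.2.7, where the share is declared with `:nfs => true`, so an upgrade may be needed first.

```ruby
# Vagrantfile sketch, not a tested configuration.
Vagrant.configure("2") do |config|
  config.vm.box = "precise64"

  # NFS synced folders need a private network between host and guest.
  config.vm.network :private_network, ip: "192.168.33.101"

  # nfs_udp: false asks Vagrant to mount over TCP instead of UDP.
  # nfs_version: 3 is an assumption that keeps the version nfsstat reports.
  config.vm.synced_folder "/my/local/path.dev", "/var/www",
    type: "nfs", nfs_udp: false, nfs_version: 3
end
```

A `vagrant reload` should be needed for the new mount options to take effect.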
Ben Dyer
  • Fixed my problem with PHP scripts writing to the shared folder, or trying to move a recently written file from /tmp to /vagrant – Daniel Doezema Dec 22 '16 at 05:40

You don't give enough detail on the specific configuration (e.g., the exports file, the fstab file, firewall config, etc.) for a specific answer. Here are some ideas though:

In the fstab try adding the "hard,intr" flags to the mount options -- this makes it possible to kill processes waiting for I/O on a dead mount.

Also make sure your firewall is open for RPC calls and the rpc-statd service is running.

Also figure out which version of NFS you're running and check that you have the correct TCP/UDP ports open. If NFS v4 isn't working out, maybe try NFS v3.
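
One way to check the version and transport from inside the guest is to read the mount options in /proc/mounts. Below is a minimal Ruby sketch; the helper name `nfs_mount_info` and both sample lines are hypothetical, and it assumes the options appear as `vers=` (or `nfsvers=`) and `proto=`, which is how Linux NFS clients usually report them.

```ruby
# Pull the NFS version and transport out of /proc/mounts-style lines.
# Each line has the form: device mount_point fstype options dump pass
def nfs_mount_info(mount_lines)
  nfs_entries = mount_lines.map(&:split).select do |fields|
    fields.length >= 4 && %w[nfs nfs4].include?(fields[2])
  end

  nfs_entries.map do |fields|
    opts = {}
    fields[3].split(",").each do |opt|
      key, value = opt.split("=", 2)   # flags such as "rw" get a nil value
      opts[key] = value
    end
    { mount_point: fields[1],
      version: opts["vers"] || opts["nfsvers"],
      proto: opts["proto"] }
  end
end

# Hypothetical sample lines, not taken from the question.
sample = [
  "192.168.33.1:/Users/example/Sites/app.dev /var/www nfs rw,relatime,vers=3,hard,proto=udp,addr=192.168.33.1 0 0",
  "/dev/sda1 / ext4 rw,errors=remount-ro 0 0"
]

nfs_mount_info(sample).each do |m|
  puts "#{m[:mount_point]}: NFSv#{m[:version]} over #{m[:proto]}"
end
```

On the guest itself, `nfs_mount_info(File.readlines("/proc/mounts"))` would read the live mount table.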

Finally, are you connecting via IP address or hostname? Hostname is great, but make sure it always resolves correctly -- maybe in your /etc/hosts file. Alternatively, hard-code the IP addresses so there is no chance of name resolution failing...

Steve Moon
  • Thanks for the notes, Steve. As the mount is created dynamically by Vagrant, I don't believe there is an option to set anything in the fstab (besides, OSX doesn't have one, and the guest is the NFS client in this instance). I've updated my question with the /etc/exports file contents. I don't think this is a firewall or ports issue, as iptables has an empty ruleset on the Ubuntu guest and I'm not running any firewall on OSX (and it normally works; this issue is intermittent). I've added the output of nfsstat too, which suggests it's running NFSv3. – Andre Lackmann Aug 20 '13 at 04:12