
I have a small local network which has a Gentoo box and a Windows box. I mount a share originating on the Windows box onto the Gentoo box with a command like:

mount -t cifs -o username=WindowsUsername,password=thepassword,uid=pistos //192.168.0.103/Users /mnt/windowsbox

Most of the time, everything Just Works, and I can read and write without problems. However, every few weeks or so, the connection or the mount point seems to go dead or hang, so that any process that tries to access the mount point gets stuck in D state (uninterruptible sleep, usually waiting on disk or network I/O). These processes become impervious to TERM and KILL signals. Disconnecting the Windows box from the network and reconnecting it does not help. The frozen state lasts for 5+ minutes. It's really frustrating and gets in the way of normal work, because it freezes Save As dialogs, ls commands, etc. If I issue a umount on the mount point, it either hangs too or reports that the mount point is in use. Eventually the dead state resolves itself, and the mount point either gets unmounted or can be unmounted without delay.
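For diagnosing this: a process in D state is sleeping inside the kernel and usually cannot act on any signal, KILL included, until the blocking I/O call returns or fails. A minimal sketch for listing which processes are stuck, by reading the `State:` line of each `/proc/PID/status` file. The function name `list_dstate` and its optional directory argument (there only so it can be pointed at a copy of /proc) are hypothetical.

```shell
#!/bin/bash
# list_dstate (hypothetical helper): print "PID NAME" for every process whose
# /proc/PID/status reports "State: D" (uninterruptible sleep).
# Optional argument: the proc directory to scan (default: /proc).
list_dstate() {
  local root="${1:-/proc}"
  awk '
    FNR == 1 { name = "" }                        # new file: reset the name
    /^Name:/ { name = $0; sub(/^Name:[ \t]*/, "", name) }
    /^State:/ && $2 == "D" {
      pid = FILENAME
      sub(/\/status$/, "", pid)                   # drop the trailing /status
      sub(/.*\//, "", pid)                        # keep the last path component
      print pid, name
    }
  ' "$root"/[0-9]*/status 2>/dev/null             # ignore processes that vanished
}

# Usage: list_dstate
# On most kernels, root can also see where a stuck PID is blocked:
#   cat /proc/PID/stack
```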

My guess is that this happens when the connection/mount has gone idle, or when the Windows machine has been idle. I am not really sure.

Why is this happening, and what can I do to prevent it? Or how can I successfully kill these D-state processes at will?

Possibly related: CIFS mounts hang on read

Pistos
  • Is any type of firewall in use between the two machines? – Schrute Aug 20 '14 at 23:05
  • @Schrute: I assume whatever the defaults are on Linux (iptables?) and Windows are running. Do you think the firewalls are timing out connections? I'd never heard of such a thing. – Pistos Aug 21 '14 at 03:16
  • I think this might be an issue on the Linux box. I saw a similar issue, not with CIFS and Windows but with a mounted NFS share. Saving was not possible, I guess because some process hung while accessing the unreachable NFS server. This usually happened when the server crashed. – cornelinux Aug 21 '14 at 12:44
  • @cornelinux: I don't think NFS and CIFS can be compared. I'd expect their behaviour to be different. – Pistos Aug 21 '14 at 14:58
  • @Pistos: Right, NFS and CIFS cannot be compared. But the question is how the kernel handles missing network mounts. NFS and CIFS are both mounted over the network, and the question is how the kernel reacts to those mounts when there are network issues. – cornelinux Aug 21 '14 at 15:54
  • @cornelinux: In this case, neither server is crashing. – Pistos Aug 22 '14 at 14:46
  • @Pistos: Sometimes a firewall will time out idle connections; it really depends on the variables. Reviewing any related logs may help. When dealing with unexplained I/O issues, check the firewall and/or content filtering. I've seen content filtering enabled in error cause issues like this. Good luck. – Schrute Aug 23 '14 at 15:10
  • My advice is to set up a ring-buffer network capture on the Linux machine (e.g. `tcpdump -i eth0 -C 5 -W 10 -s 0 -v -w /tmp/cifs.pcap host 192.168.0.103`; I'd also run it under screen so the process doesn't terminate when you disconnect). When the problem occurs, stop the trace after a few seconds. Reviewing the packet trace should at least tell you which side is causing the problem (e.g. the server stops responding, the session gets disconnected, etc.). – GeekyDeaks Aug 27 '14 at 07:02
  • @GeekyDeaks: Sounds like a good idea, though I don't know how to make use of the resultant cifs.pcap file to make those determinations. – Pistos Aug 27 '14 at 16:41
  • @Pistos: Wireshark is your friend! The traces can look confusing, but Wireshark will decode the frames to help. First eliminate the basics, like the server or client dropping the session (FIN packets), then move on to others, such as the server no longer responding. If you have time, there was a SharkFest video on CIFS in 2013 (https://www.youtube.com/watch?v=XbvFXSPig-w), but it's rather long :) – GeekyDeaks Aug 27 '14 at 17:23
  • I found this today, which seems very relevant and helpful: http://stackoverflow.com/questions/74626/how-do-you-force-a-cifs-connection-to-unmount – Pistos Jun 14 '15 at 04:54
  • FWIW, I solved my problem another way: I put Linux on that box. :) – Pistos Feb 01 '16 at 16:19
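The ring-buffer capture suggested in the comments above can be started detached and triaged without opening the Wireshark GUI. A rough sketch, assuming interface `eth0` and the server address from the question, that `screen` and `tshark` (Wireshark's command-line tool) are installed, and that tcpdump names its ring-buffer files `/tmp/cifs.pcap` followed by a number. The function names are hypothetical. The display filter only surfaces FIN and RST segments, i.e. the "someone dropped the session" case; a server that simply stops answering still needs a look at the full trace.

```shell
#!/bin/bash
# Hypothetical helpers around the tcpdump ring buffer from the comments.
# Assumptions: interface eth0, server 192.168.0.103, capture files under /tmp.

start_cifs_capture() {
  # -C 5: rotate after about 5 MB; -W 10: keep at most 10 files;
  # -s 0: capture full packets. screen -dmS runs it in a detached session.
  screen -dmS cifscap tcpdump -i eth0 -C 5 -W 10 -s 0 \
    -w /tmp/cifs.pcap host 192.168.0.103
}

summarize_cifs_capture() {
  local dir="${1:-/tmp}" f
  for f in "$dir"/cifs.pcap*; do
    [ -e "$f" ] || continue            # the glob matched nothing
    echo "== $f"
    # Show only segments that close (FIN) or abort (RST) a TCP connection.
    tshark -r "$f" -Y 'tcp.flags.fin == 1 || tcp.flags.reset == 1'
  done
}

# Usage (as root): start_cifs_capture, wait for a hang, then
#   screen -S cifscap -X quit
#   summarize_cifs_capture
```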

3 Answers


Not sure why the problem is happening, but as a workaround, have you tried running something like `touch /mnt/windowsbox/keepalive.txt` or `echo "I am still alive." > /mnt/windowsbox/keepalive.txt` via cron every minute? That way the connection should stay active.
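A minimal sketch of that workaround, assuming the `/mnt/windowsbox` mount point from the question; the script path `/usr/local/bin/cifs-keepalive.sh` and the function name are hypothetical. It checks `mountpoint -q` first so that, if the share is not mounted, it doesn't quietly create the file on the local disk underneath the mount point.

```shell
#!/bin/bash
# cifs-keepalive.sh (hypothetical): write a timestamp to a file on the CIFS
# mount so the session never stays idle long enough to be dropped.

keepalive() {
  local mnt="$1"
  # Refuse to write if nothing is mounted there; otherwise the file would
  # land in the empty local directory under the mount point.
  mountpoint -q "$mnt" || return 1
  date > "$mnt/keepalive.txt"
}

# Only act when called with a mount point, e.g. from cron.
if [ "$#" -gt 0 ]; then
  keepalive "$1"
fi

# Example crontab entry (crontab -e), running every minute:
#   * * * * * /usr/local/bin/cifs-keepalive.sh /mnt/windowsbox
```

One caveat: if the share does hang, each cron run will itself block in D state. Prefixing the crontab command with `flock -n /tmp/cifs-keepalive.lock` makes later runs exit immediately instead of piling up behind the stuck one.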

Janne Pikkarainen
  • Good idea. I've put this in place, and will see what happens. – Pistos Aug 26 '14 at 04:05
  • This seems to have solved the problem, I should mention. – Pistos Dec 01 '14 at 20:54
  • Great to hear that! – Janne Pikkarainen Dec 02 '14 at 05:55
  • As per @Pat's answer, one could trim this from a heartbeat every minute to one every 5 minutes (300 seconds), which would be `*/5 * * * *` in the crontab schedule. – woodvi Feb 23 '16 at 00:09
  • I'm using this for now. Within 3 days, I had three separate Ubuntu Server 16 LTS machines (two physical, one VM) drop their SMB connections a few hours after being rebooted. On startup, the SMB share is mounted without issue, but it eventually becomes unresponsive. – user38537 May 11 '17 at 20:53

I run into this every few months too. My workaround is `sudo umount -l` (a lazy unmount): https://stackoverflow.com/a/96288/2097284
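A small sketch of that workaround, assuming the mount point from the question; the function name and the `DRY_RUN` switch are hypothetical. `umount -l` detaches the mount from the filesystem tree immediately and finishes the cleanup once it is no longer busy, so new accesses stop blocking on it. Processes already stuck in D state generally stay stuck until their pending request completes or times out.

```shell
#!/bin/bash
# unstick_cifs (hypothetical): lazily unmount a hung CIFS share.
# Set DRY_RUN=1 to print the command instead of running it.
unstick_cifs() {
  local mnt="${1:-/mnt/windowsbox}"
  # -l (lazy): detach now, clean up when the mount is no longer busy.
  ${DRY_RUN:+echo} umount -l "$mnt"
}

# Usage (as root): unstick_cifs /mnt/windowsbox
# then remount with the original mount -t cifs command.
```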


Another answer suggested writing to a file on the mount at a regular interval via cron. I would suggest instead using the smbclient program to connect to the share and then disconnect.

I wrote a bash script like this to accomplish that:

#!/bin/bash

su usernamehere -c "smbclient \\\\\\\\servernamehere\\\\sharenamehere passwordhere -c exit" >/dev/null 2>&1

This command makes a new connection to the share and then runs the exit command, immediately shutting down the connection it just established. There should be 8 backslashes before the server name and 4 before the share name: the command line is parsed twice (first inside this script's double quotes, then again by the shell that su -c starts), and each pass halves the backslashes, leaving smbclient with `\\servernamehere\sharenamehere`. Perhaps there's a smarter way to do this, but this does seem to work.

Perhaps there is a way to make this even more reliable by making it hold the connection open for several minutes at a time, but that's a bit out of my league.
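A sketch of the same probe without the escaping, relying on smbclient accepting forward-slash share names and reading credentials from a file with `-A` (which also keeps the password out of `ps` output). The server and share names are the placeholders from the script above; `~/.smbcredentials`, the `SMB_CREDENTIALS` variable and the function name are hypothetical. The credentials file holds `username = ...` and `password = ...` lines. Like the original, this opens a separate connection and does not touch the kernel's existing CIFS session; running it from the target user's own crontab removes the need for `su`.

```shell
#!/bin/bash
# smb_probe (hypothetical): open and immediately close an SMB connection.
# Forward slashes plus normal quoting avoid the multi-level backslash escaping.
smb_probe() {
  local server="${1:-servernamehere}" share="${2:-sharenamehere}"
  local creds="${SMB_CREDENTIALS:-$HOME/.smbcredentials}"
  # -A reads username/password from a file; -c exit disconnects right away.
  smbclient "//$server/$share" -A "$creds" -c exit >/dev/null 2>&1
}

# Usage: smb_probe servernamehere sharenamehere
```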

Sobrique
RedScourge
  • Interesting proposal. I'd have given it a go if I didn't already have success with the other solution. – Pistos Jun 05 '15 at 22:59
  • I don't see how this would help. Janne's solution keeps the connection made by the CIFS client alive, whereas this creates a new, unrelated connection with smbclient. – flungo Feb 05 '16 at 22:46
  • FYI, smbclient supports forward slashes if you want to use them instead of backslashes, so `//servername/sharename` is much easier in places where you need lots of escaping. – Steve Friedl Oct 22 '19 at 01:22
  • You can also use single quotes to avoid having to escape backslashes. – Daniel Jan 13 '20 at 06:10