Linux: CIFS/Samba mount hangs for several minutes

Question

I have a small local network which has a Gentoo box and a Windows box. I mount a share originating on the Windows box onto the Gentoo box with a command like:

mount -t cifs -o username=WindowsUsername,password=thepassword,uid=pistos //192.168.0.103/Users /mnt/windowsbox

Most of the time, everything Just Works, and I can read and write without problems. However, every few weeks or so, the connection or the mount point seems to go dead or hang, such that any process that tries to access the mount point gets stuck in D state (disk, or I/O wait). These processes become impervious to TERM and KILL signals. Disconnecting and reconnecting the Windows box from the network does not help. The frozen state lasts for 5+ minutes. It's really frustrating and gets in the way of normal work, because it freezes Save As dialogues, ls commands, etc. If I issue a umount on the mount point, it either hangs also, or reports that the mount point is in use. Eventually, the dead state resolves itself, and the mount point gets unmounted, or it becomes possible to umount with no delay.

My guess is that this happens when the connection/mount has gone idle, or when the Windows machine has been idle. I am not really sure.

Why is this happening, and what can I do to prevent it? Or how can I successfully kill these D-state processes at will?

Possibly related: CIFS mounts hang on read

@Schrute: I assume whatever firewall defaults ship with Linux (iptables?) and Windows are running. You think the firewalls are timing out connections? I'd never heard of such a thing. — Pistos, Aug 21 '14 at 03:16
I think this might be an issue with the Linux box. I saw a similar issue, not with CIFS and Windows, but with a mounted NFS share. Saving was not possible, I guess because some process hung when accessing the non-existent NFS server. This usually happened when the server crashed. — cornelinux, Aug 21 '14 at 12:44
@cornelinux: I don't think NFS and CIFS can be compared. I'd expect their behaviour to be different. — Pistos, Aug 21 '14 at 14:58
@Pistos: NFS and CIFS cannot be compared, right. But both are mount points served over the network, and the question is how the kernel handles such mounts when they go missing or when there are network issues. — cornelinux, Aug 21 '14 at 15:54
@Pistos: Firewalls will sometimes time out idle connections; it really depends on the variables. See whether any related logs can be reviewed, as that may help. When dealing with unexplained I/O issues, check the firewall and any content filtering. I've seen content filtering that was enabled in error cause issues like this. Good luck. — Schrute, Aug 23 '14 at 15:10
My advice is to set up a ring-buffer network capture on the Linux machine (e.g. tcpdump -i eth0 -C 5 -W 10 -s 0 -v -w /tmp/cifs.pcap host 192.168.0.103; I'd also run it under screen to prevent the process terminating when you disconnect). When the problem occurs, stop the trace after a few seconds. Reviewing the packet trace should at least let you determine which side is causing the problem (e.g. the server stops responding, the session gets disconnected, etc.). — GeekyDeaks, Aug 27 '14 at 07:02
@GeekyDeaks: Sounds like a good idea, though I don't know how to make use of the resultant cifs.pcap file to make those determinations. — Pistos, Aug 27 '14 at 16:41
@Pistos - Wireshark is your friend! The traces can look confusing, but Wireshark will decode the frames to help. You want to first eliminate the basics, like the server or client dropping the session (FIN packets), then move on to other causes, like the server no longer responding. If you have time, there was a SharkFest video on CIFS in 2013 (https://www.youtube.com/watch?v=XbvFXSPig-w), but it's rather long :) — GeekyDeaks, Aug 27 '14 at 17:23
I found this today, which seems very relevant and helpful: http://stackoverflow.com/questions/74626/how-do-you-force-a-cifs-connection-to-unmount — Pistos, Jun 14 '15 at 04:54
FWIW, I solved my problem another way: I put Linux on that box. :) — Pistos, Feb 01 '16 at 16:19

score 11 · Accepted Answer · answered Aug 25 '14 at 05:32

Not sure why the problem is happening, but as a workaround, have you tried to put something like touch /mnt/windowsbox/keepalive.txt or echo "I am still alive." >/mnt/windowsbox/keepalive.txt to be run via cron every minute? That way the connection should stay active.
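A minimal sketch of that cron workaround, assuming the share is mounted at /mnt/windowsbox as in the question. The function name cifs_keepalive and the script path in the crontab line below are hypothetical. Whether a heartbeat prevents the hang depends on the underlying cause, which this answer does not establish.

```shell
#!/bin/sh
# Keepalive sketch: touch a marker file on the share so the CIFS
# connection sees regular traffic. cifs_keepalive is a hypothetical name.
cifs_keepalive() {
    mount_point=$1
    # Updating the timestamp is enough; no content needs to be written.
    touch "$mount_point/keepalive.txt"
}

# Run only when a mount point is passed on the command line, e.g. from cron.
if [ "$#" -gt 0 ]; then
    cifs_keepalive "$1"
fi
```

Saved as, say, /usr/local/bin/cifs-keepalive.sh (an assumed path) and made executable, it could be scheduled every minute with a crontab line such as:

* * * * * /usr/local/bin/cifs-keepalive.sh /mnt/windowsbox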

answered Aug 25 '14 at 05:32

Janne Pikkarainen

Good idea. I've put this in place, and will see what happens. – Pistos Aug 26 '14 at 04:05
This seems to have solved the problem, I should mention. – Pistos Dec 01 '14 at 20:54
Great to hear that! – Janne Pikkarainen Dec 02 '14 at 05:55
As per @Pat's answer, one could trim this from a heartbeat every minute to a heartbeat every 5 minutes (300 seconds), which would be `*/5 * * * *` in the crontab schedule – woodvi Feb 23 '16 at 00:09
I'm using this for now. Within 3 days, I had three separate Ubuntu Server 16 LTS machines (two physical, one VM) drop their SMB connections a few hours after being rebooted. On startup, the SMB connection is mounted without issue, but it eventually becomes unresponsive. – user38537 May 11 '17 at 20:53

score 6 · Answer 2 · edited May 23 '17 at 12:41

I too encounter this every few months. sudo umount -l is my workaround. https://stackoverflow.com/a/96288/2097284
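A rough sketch of that workaround as a small script. The helper name lazy_remount and the DRY_RUN switch are hypothetical, and the remount step assumes the share has an /etc/fstab entry, which the question does not state. It needs to run as root (or via sudo, as above).

```shell
#!/bin/sh
# Lazy-unmount sketch. lazy_remount is a hypothetical helper name.
# With DRY_RUN=1 it only prints the commands instead of running them.
lazy_remount() {
    mount_point=$1
    run=""
    if [ "${DRY_RUN:-0}" = "1" ]; then
        run="echo"
    fi
    # -l detaches the mount point now; cleanup happens once it is no longer busy.
    $run umount -l "$mount_point" || return 1
    # Remount by path (assumption: an /etc/fstab entry exists for it).
    $run mount "$mount_point"
}

# Run only when a mount point is passed on the command line.
if [ "$#" -gt 0 ]; then
    lazy_remount "$1"
fi
```

Running it once with DRY_RUN=1 shows the two commands it would issue before anything is actually unmounted.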

edited May 23 '17 at 12:41

Community

answered Aug 21 '15 at 20:38

Camille Goudeseune

score 0 · Answer 3 · edited Sep 30 '15 at 12:46

Another answer suggested writing to a file on the mount at a regular interval via cron. I would suggest instead using the smbclient program to connect to the share and then disconnect.

I wrote a bash script like this to accomplish that:

#!/bin/bash

su usernamehere -c "smbclient \\\\\\\\servernamehere\\\\sharenamehere passwordhere -c exit" >/dev/null 2>&1

This command opens a new connection to the share and then runs smbclient's exit command, which immediately closes the connection it just established. There should be 8 backslashes before the server name and 4 before the share name, as backslashes need to be escaped, and the escapes need to be escaped when inside a double-quoted string. Perhaps there's a smarter way to do this, but this does seem to work.

Perhaps there is a way to make this even more reliable by making it hold the connection open for several minutes at a time, but that's a bit out of my league.

edited Sep 30 '15 at 12:46

Sobrique

answered Jun 05 '15 at 18:00

RedScourge

Interesting proposal. I'd have given it a go if I didn't already have success with the other solution. – Pistos Jun 05 '15 at 22:59
I don't see how this would be helpful? Janne's solution would be keeping the connection made by the cifs client alive whereas this would be creating a new, unrelated connection with the smbclient - so how would it help? – flungo Feb 05 '16 at 22:46
FYI, smbclient supports forward slashes if you want to use them instead of backslashes, so `//servername/sharename` is way easier in places where you need lots of escaping. – Steve Friedl Oct 22 '19 at 01:22
Also, you can use single quotes to avoid the need for escaping backslashes. – Daniel Jan 13 '20 at 06:10
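
Following the forward-slash suggestion in the comments, the same probe might be sketched without any backslash escaping. The names smb_probe and SMBCLIENT are hypothetical (SMBCLIENT exists only so the binary can be swapped out for a dry run), and the placeholders are the ones from the answer. As noted in the comments, this opens a separate connection, so it may not keep the kernel's CIFS mount alive.

```shell
#!/bin/sh
# Sketch of the smbclient probe using forward slashes, so no backslash
# escaping is needed. smb_probe and SMBCLIENT are hypothetical names.
smb_probe() {
    server=$1
    share=$2
    password=$3
    # Connect, run smbclient's "exit" command, and disconnect again.
    "${SMBCLIENT:-smbclient}" "//$server/$share" "$password" -c exit
}

# Run only when all three arguments are passed on the command line.
if [ "$#" -eq 3 ]; then
    smb_probe "$1" "$2" "$3" >/dev/null 2>&1
fi
```

Note that a password passed as a command-line argument is visible to other local users in the process list while the command runs.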
