My mail server setup worked for years. Recently I've started experiencing the following problem:
Mail setup: sendmail+dovecot+procmail
Host file server: CentOS 6.8, NFS exports mail directories to...
Mail server: CentOS 7.3, running as guest VM on host via libvirtd/qemu, NFS mounts /var/spool/mail from host.
Symptoms: Both dovecot and procmail have issued errors (details below) that seem to indicate they don't have permission to write to /var/spool/mail. However, /var/spool/mail has the most general permissions I know how to give, on both the NFS file server and the mail NFS client.
On the mail server (NFS client):
$ ls -lhd /var/spool/mail
drwxrwxrwt 5 root mail 6.8M Mar 29 12:37 /var/spool/mail
In mailserver:/etc/fstab:
filehost:/mail/inbox /var/spool/mail nfs defaults 0 0
On the NFS host:
$ ls -lhd /mail/inbox
drwxrwxrwt. 5 root mail 6.8M Mar 29 12:41 /mail/inbox
In filehost:/etc/exports:
/mail/inbox mailserver(rw,no_root_squash,async,nohide)
Neither system is running SELinux or iptables (I rely on our site's firewall).
The kinds of things I see:
Files with names like BOGUS.normaluser.hex-string. The corresponding log message is
Mar 29 12:14:34 mailserver procmail[20922]: Renamed bogus "/var/spool/mail/normaluser.lock" into "/var/spool/mail/BOGUS.normaluser.xGAs"
This can be exceptionally annoying, since there have been times when it's not just the lockfile that's declared bogus, but normaluser's inbox. From normaluser's perspective, their inbox vanishes as they're reading their mail.
Files with names beginning with underscores, e.g., _2-E,eu92YB.mailserver.domain.
There are no corresponding log messages. The contents of these files (which are always 1 byte or 31-33 bytes) suggest that these are lockfiles. A web page I saw yesterday described someone using strace to identify that procmail is writing these files, but I don't know how to use strace to confirm this for myself (and I can't find the page today).
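For reference, a minimal sketch of how strace can show which process creates and removes a file. It traces a harmless stand-in command in a scratch directory, not procmail itself; the scratch path and the x.lock name are made up for illustration. Applying it to procmail would mean wrapping the delivery command or attaching to a running process with strace -p, which depends on the local setup.

```shell
#!/bin/sh
# Scratch directory and file name are hypothetical, not the real mail spool.
scratch=$(mktemp -d)
trace="$scratch/trace.txt"
status="unavailable"

if command -v strace >/dev/null 2>&1; then
    # -f follows child processes, -e trace=file limits output to system calls
    # that take a file name, -o writes the trace to a file instead of stderr.
    if strace -f -e trace=file -o "$trace" \
        sh -c 'touch "$1" && rm "$1"' sh "$scratch/x.lock" 2>/dev/null; then
        # Each trace line starts with the PID, so the unlink can be tied to a process.
        if grep -q 'unlink' "$trace"; then
            status="traced"
            grep 'x.lock' "$trace"
        else
            status="no-unlink-seen"
        fi
    fi
fi

echo "strace check: $status"
rm -rf "$scratch"
```

If strace is missing or tracing is not permitted, the sketch only reports "unavailable".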
When I list the files, I see that they're chmod 400, which may be why they're not being deleted:
-r-------- 1 normaluser mail 1 Mar 29 12:30 _uZF%kE-2YB.mailserver.domain
-r-------- 1 normaluser mail 1 Mar 29 12:30 _uZF+kE-2YB.mailserver.domain
-r-------- 1 normaluser mail 1 Mar 29 12:31 _uZF,kF-2YB.mailserver.domain
-r-------- 1 normaluser mail 1 Mar 29 12:31 _uZF.kF-2YB.mailserver.domain
-r-------- 1 normaluser mail 1 Mar 29 12:31 _uZF+kF-2YB.mailserver.domain
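A caveat on the chmod 400 guess: on a local Linux filesystem, removing a file is governed by the permissions on the containing directory (plus file ownership when the directory is sticky), not by the file's own mode. So mode 400 alone would not normally stop the owner from deleting these files. A minimal sketch in a scratch directory, with a made-up file name:

```shell
#!/bin/sh
# Scratch directory and file name are hypothetical, not the real mail spool.
scratch=$(mktemp -d)
chmod 1777 "$scratch"            # same mode as /var/spool/mail (drwxrwxrwt)

touch "$scratch/_lockfile"
chmod 400 "$scratch/_lockfile"   # read-only for the owner, like the "_" files

# The owner can still unlink the file: unlink needs write access to the
# directory, and the sticky bit only requires owning the file.
rm -f "$scratch/_lockfile"

if [ -e "$scratch/_lockfile" ]; then
    result="still there"
else
    result="removed"
fi
echo "$result"
rmdir "$scratch"
```

If the same commands fail on the NFS mount when run as the file's owner, the file mode is probably not the cause.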
Lockfiles that don't go away. Typical mail log entry:
Mar 29 12:31:01 mailserver dovecot: imap(normaluser): Error: unlink(/var/spool/mail/normaluser.lock) failed: Operation not permitted
Mar 29 12:31:01 mailserver dovecot: imap(normaluser): Error: file_dotlock_create() failed with mbox file /var/spool/mail/normaluser: Operation not permitted
For the users, a lockfile that doesn't go away means that all their mail processing halts until I manually delete the lockfile. The permissions seem normal:
-rw------- 1 normaluser theirgroup 33 Mar 29 12:30 normaluser.lock
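To illustrate why a leftover lockfile stalls everything, here is a rough sketch of the dotlock idea: the lock is a file created exclusively next to the mailbox and removed when the work is done. This is only an illustration using the shell's noclobber option, not how dovecot or procmail actually implement it, and the paths live in a scratch directory.

```shell
#!/bin/sh
# Hypothetical stand-ins for the mailbox and its lockfile, in a scratch directory.
spool=$(mktemp -d)
mbox="$spool/normaluser"
lock="$mbox.lock"
: > "$mbox"

# With noclobber (set -C) the redirection fails if the file already exists,
# so only one caller at a time can create the lockfile.
try_lock() {
    ( set -C; echo "$$" > "$lock" ) 2>/dev/null
}

if try_lock; then first="acquired"; else first="busy"; fi

# A second attempt while the lockfile exists must fail.
if try_lock; then second="acquired"; else second="busy"; fi

# Releasing the lock is just an unlink. If this step fails, as in the
# "Operation not permitted" errors, every later attempt stays busy.
rm -f "$lock"
if try_lock; then third="acquired"; else third="busy"; fi

echo "$first $second $third"
rm -f "$lock" "$mbox"
rmdir "$spool"
```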
I've played a bit with the dovecot options, based on the dovecot wiki, hoping that I've made a mistake somewhere. The current relevant values are:
mmap_disable = yes
dotlock_use_excl = yes
mail_fsync = optimized
mail_nfs_storage = no
mail_nfs_index = no
mail_privileged_group=mail
Setting mail_nfs_storage=yes doesn't seem to change anything, since that parameter (according to the dovecot wiki) has to do with multiple mail servers accessing the same directory via NFS, which is not the case here.
I've googled and fiddled, and I can't track down the issue. I'm asking for anything I've overlooked, or for suggestions for additional diagnostics I could run.
Later:
I'm getting closer to a solution. On the client mailserver:
$ cd /var/spool/mail
$ sudo -u normaluser touch test
$ sudo -u normaluser rm test
No problem.
$ sudo -u dovenull touch test
$ sudo -u dovenull rm test
rm: cannot remove ‘test’: Operation not permitted
$ ls -lh test
-rw-r--r-- 1 nobody nobody 0 Mar 31 12:03 test
Aha! The dovenull account is not allowed to do anything in the NFS-imported directory. I tried adding a dovenull account to the NFS server (with the same uid/gid), but that hasn't solved the problem:
$ sudo -u dovenull rm test
rm: cannot remove ‘test’: Operation not permitted
$ ls -lh test
-rw-r--r-- 1 dovenull dovenull 0 Mar 31 12:03 test
This feels like an idmap issue. Here are the only uncommented lines in /etc/idmapd.conf on both the client and the server:
[General]
Domain = mydomain.com
[Mapping]
Nobody-User = nobody
Nobody-Group = nobody
[Translation]
Method = nsswitch
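One diagnostic that might narrow this down is to compare numeric ids instead of names, since names pass through a lookup (nsswitch, NIS, idmapd) that can differ between client and server. A minimal sketch, assuming GNU stat and using a scratch file rather than the real spool:

```shell
#!/bin/sh
# Scratch path is hypothetical; on the real systems the file of interest
# would be /var/spool/mail/test, checked on both client and server.
scratch=$(mktemp -d)
touch "$scratch/test"

# Numeric owner and group of the file, with no name lookup involved.
file_ids=$(stat -c '%u:%g' "$scratch/test")
# Effective uid and gid of the account that created it.
my_ids="$(id -u):$(id -g)"

echo "file owner $file_ids, current user $my_ids"
ls -ln "$scratch/test"    # -n prints numeric ids instead of names

rm -f "$scratch/test"
rmdir "$scratch"
```

If the numbers agree on both machines but the names do not, the name lookup is the likely place to dig. If the numbers themselves differ, the mismatch is in the accounts.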
I'm close... I can feel it...
Yet later:
I can feel all I want, but that doesn't mean I have the answer. I got the dovenull account to be able to both create and delete in /var/spool/mail (it had to do with looking carefully at /etc/nsswitch.conf and realizing I had to restart NIS), but that did not solve my problem. The dovenull account doesn't write to /var/spool/mail.
I used auditctl:
auditctl -w /var/spool/mail -p war -k mail-inbox
ausearch -k mail-inbox > mail-inbox.txt
and verified that the extra .lock files and BOGUS files were being created by dovecot, and the "_" underscore files were being created by procmail. I won't bother posting the audit logs unless someone wants to see them; what they show is that the files are being created with the correct permissions (uid, gid, euid, etc.) and the deletes are unsuccessful even though the delete call is being made with those same permissions.
So what could allow a file to be created but then prevent the same user from deleting it?