
I administer a mail server cluster: two different hosts use a common NFS share to store maildirs. Dovecot is the LDA.

CPU load on the NFS server is very high, although actual I/O on the disk layer is very low.

nfsstat reports that more than 50% of the calls are getattr, and I suspect these calls are what is overloading my server.
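To put the nfsstat figures in proportion, here is a minimal Python sketch that turns per-operation call counts into percentages. The helper name op_shares is hypothetical, and the counts are illustrative only, chosen to match the rough proportions mentioned later in this thread (about 10K RPC calls, of which only 100 are reads/writes); substitute the real per-operation numbers from nfsstat.

```python
def op_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each operation's share of all RPC calls, as a percentage."""
    total = sum(counts.values())
    if total == 0:
        return {op: 0.0 for op in counts}
    return {op: 100.0 * n / total for op, n in counts.items()}


# Hypothetical counts: getattr/access dominate, read/write are rare.
sample = {"getattr": 5500, "access": 4400, "read": 60, "write": 40}
shares = op_shares(sample)
for op, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{op:8s} {pct:5.1f}%")
```

If getattr plus access account for nearly all calls while read/write stay in the low single digits, the server is busy answering metadata questions rather than moving mail data.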

At the moment, the mount options are as follows:

nfs4(rw,noatime,sync,vers=4,rsize=1048576,wsize=1048576,namlen=255,acregmin=0,acregmax=0,acdirmin=0,acdirmax=0,hard,noac,proto=tcp,port=0,timeo=10,retrans=10,sec=sys,clientaddr=10.10.10.35,minorversion=0,local_lock=none,addr=10.10.10.28)
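A minimal Python sketch (the helpers parse_mount_opts and attr_cache_disabled are hypothetical) that splits the option list above and checks whether attribute caching is effectively off. The nfs(5) man page describes noac as both preventing the client from caching file attributes and forcing application writes to become synchronous, so with noac present the explicit acregmin/acregmax/acdirmin/acdirmax=0 values do not change anything further.

```python
from __future__ import annotations

# The option list from the question, with the "nfs4(...)" wrapper removed.
OPTS = ("rw,noatime,sync,vers=4,rsize=1048576,wsize=1048576,namlen=255,"
        "acregmin=0,acregmax=0,acdirmin=0,acdirmax=0,hard,noac,proto=tcp,"
        "port=0,timeo=10,retrans=10,sec=sys,clientaddr=10.10.10.35,"
        "minorversion=0,local_lock=none,addr=10.10.10.28")

AC_TIMERS = ("acregmin", "acregmax", "acdirmin", "acdirmax")


def parse_mount_opts(opts: str) -> dict[str, str | None]:
    """Split 'a,b=c,...' into {'a': None, 'b': 'c', ...}."""
    parsed: dict[str, str | None] = {}
    for item in opts.split(","):
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else None
    return parsed


def attr_cache_disabled(parsed: dict[str, str | None]) -> bool:
    """True if noac, actimeo=0, or all four ac* timers at 0 are set."""
    if "noac" in parsed or parsed.get("actimeo") == "0":
        return True
    return all(parsed.get(t) == "0" for t in AC_TIMERS)


parsed = parse_mount_opts(OPTS)
print("noac set:", "noac" in parsed)
print("all ac* timers zero:", all(parsed.get(t) == "0" for t in AC_TIMERS))
print("attribute cache disabled:", attr_cache_disabled(parsed))
```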

The Dovecot storage options are as follows:

mmap_disable = yes
dotlock_use_excl = no
mail_fsync = always
mail_nfs_storage = yes
mail_nfs_index = yes

From what I've read, the noac option disables attribute caching, which leads to a flood of requests to the NFS server. I was thinking about enabling attribute caching, but while looking for information I found this:

getattr > 40%: The client attribute cache can be increased by setting the actimeo mount option. Note that this is not appropriate where the attributes change frequently, such as on a mail spool. In these cases, mount the filesystems with the noac option.

This is where the problem lies: I'm hosting a mail spool on NFS. But since attribute caching only caches attributes, and I'm using a maildir layout (as opposed to mbox, which I'm sure would cause problems with attribute caching), maybe it won't be an issue.
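If you do try attribute caching, the change amounts to rewriting the option list. A minimal Python sketch, using a hypothetical helper with_attr_cache, that drops noac and the four per-type timers and appends a single actimeo=N (per nfs(5), actimeo sets acregmin, acregmax, acdirmin and acdirmax to the same value). The input is an abbreviated subset of the options above, and the value 3 is purely illustrative, not a recommendation for a mail spool.

```python
# Options that control attribute caching and are replaced wholesale.
AC_OPTIONS = {"noac", "ac", "actimeo", "acregmin", "acregmax", "acdirmin", "acdirmax"}


def with_attr_cache(opts: str, actimeo: int) -> str:
    """Drop noac and the per-type ac* timers, then append actimeo=N."""
    kept = [item for item in opts.split(",")
            if item.partition("=")[0] not in AC_OPTIONS]
    kept.append(f"actimeo={actimeo}")
    return ",".join(kept)


current = "rw,noatime,sync,hard,noac,acregmin=0,acregmax=0,acdirmin=0,acdirmax=0,proto=tcp"
proposed = with_attr_cache(current, 3)  # 3 s is an illustrative value only
print(proposed)  # rw,noatime,sync,hard,proto=tcp,actimeo=3
```

sync is deliberately left in place: noac also forced synchronous writes, so keeping sync preserves that part of the old behaviour while only the attribute cache timeouts change.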

My question is: is it safe to enable attribute caching on an NFS share that serves maildir mailboxes to two different Dovecot servers?

Andrea
  • What about your Dovecot options? – NickW Feb 12 '14 at 14:12
  • Added the Dovecot options; thank you in advance for your interest. – Andrea Feb 12 '14 at 14:26
  • Are you using local indexes or ones on the mailstore? – NickW Feb 12 '14 at 14:51
  • Indexes are on the mailstore (NFS). – Andrea Feb 12 '14 at 14:54
  • What sort of load balancing are you doing in front of the cluster? – NickW Feb 12 '14 at 14:54
  • A firewall redirects connections to the two mail servers based on the source host. At the moment I don't have a Dovecot director service running, so users are distributed randomly between the two servers. I've tested with just one mail server to make sure there were no race conditions on locks: getattr still had the same >50% ratio. – Andrea Feb 12 '14 at 14:59
  • If you set up the load balancer to keep sticky sessions (I'll assume you know what this means), you could try moving the indexes to the local servers, as they are by far the most frequently written files. Obviously, if you can't keep a user on a single server for as long as possible, it isn't going to help much. – NickW Feb 12 '14 at 15:02
  • @NickW thank you for your input. I'm using sticky sessions, and we serve people who use static IP addresses, so a user stays on a single server for a long time. I'll try moving the indexes to the local data store, although I'm a bit worried that the getattr calls won't go down. Looking at my nfsstat output: out of 10K RPC calls, only 100 are read/write ops; the majority are getattr/access calls. The fact that so many calls hit the RPC/NFS stack seemed a good hint that the problem arises when the maildir storage is scanned. I'll post further data after doing as you suggested. – Andrea Feb 12 '14 at 15:51
  • Why did you turn off all attribute caching (acregmin=0,acregmax=0,acdirmin=0,acdirmax=0)? You are simply forcing the NFS client to hammer the server with getattr requests. – kofemann Feb 13 '14 at 08:00
  • @tigran You're perfectly right. After some research I read that this was the right (best practice?) way to mount an NFS share for a mail spool. Are you telling me that it is safe to enable caching in this kind of configuration? – Andrea Feb 13 '14 at 09:39
  • @Andrea I have no experience with mail + NFS, but in general, the ac* options control how quickly changes on the server become visible to a client. This matters when changes made by one client must be visible on another client. If the mail server is the only client that makes the updates and needs to see them, then it should be safe to have some caching there. – kofemann Feb 13 '14 at 10:22
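kofemann's point about visibility can be made concrete with a toy model. Everything in this Python sketch is an assumption: a client-side cache with a fixed TTL stands in for actimeo (real Linux clients adapt the timeout between the min and max values), one client polls a single file's attributes every 100 ms, and a second host modifies that file halfway through. The names AttrCache and simulate are hypothetical. The model counts GETATTR round trips and how long the change stays invisible.

```python
from __future__ import annotations


class AttrCache:
    """Toy client-side attribute cache with a fixed TTL, in ticks.

    Assumption: a fixed TTL stands in for actimeo; real Linux NFS clients
    adapt the timeout between acregmin/acregmax (acdirmin/acdirmax for
    directories).
    """

    def __init__(self, ttl: int) -> None:
        self.ttl = ttl
        self.entries: dict[str, tuple[int, int]] = {}  # path -> (fetched_at, mtime)
        self.server_getattrs = 0  # GETATTR round trips sent to the server

    def getattr(self, path: str, now: int, server: dict[str, int]) -> int:
        cached = self.entries.get(path)
        if cached is not None and now - cached[0] < self.ttl:
            return cached[1]  # still fresh: answered locally, no RPC
        self.server_getattrs += 1  # expired or missing: ask the server
        mtime = server[path]
        self.entries[path] = (now, mtime)
        return mtime


def simulate(ttl: int, polls: int = 100, change_at: int = 50) -> tuple[int, int]:
    """Poll one path once per tick (1 tick = 100 ms, hypothetical).

    Returns (GETATTR calls sent, ticks before the change became visible).
    """
    server = {"cur": 0}  # hypothetical path -> mtime on the NFS server
    cache = AttrCache(ttl)
    seen_at = None
    for now in range(polls):
        if now == change_at:
            server["cur"] = 1  # the other host modifies the maildir
        if cache.getattr("cur", now, server) == 1 and seen_at is None:
            seen_at = now
    assert seen_at is not None, "change never observed; increase polls"
    return cache.server_getattrs, seen_at - change_at


for ttl in (0, 30):  # 0 s (like noac) vs 3 s
    calls, delay = simulate(ttl)
    print(f"ttl={ttl / 10:.1f}s  getattr calls={calls}  change seen after {delay / 10:.1f}s")
```

In this model a zero TTL turns every one of the 100 polls into a GETATTR and sees the change immediately, while a 3-second TTL needs only 4 GETATTRs but leaves the change unnoticed for 1 second. The delay never exceeds the TTL, and that window is what matters when two hosts touch the same maildir; if each user is pinned to one host, the host that made a change is normally the one that needs to see it.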
