I'm using Puppet to manage some files that are shared between servers, by way of the GlusterFS file system. (The specifics shouldn't matter, but in this case things like /etc/httpd/conf.d and /var/www/html are mounted over the network, via GlusterFS. This is on RHEL 6 servers, with Puppet 3.8 and Gluster 3.5.)
Puppet has no problems with files that are local to a given server, but when I try to create or update files on this shared filesystem, it almost never works. Puppet sees that a change needs to be made, but then the file fails the subsequent checksum check. Here's an example of Puppet trying (and failing) to create a file:
change from absent to file failed: File written to disk did not match checksum; discarding changes ({md5}990680e579211b74e3a8b58a3f4d9814 vs {md5}d41d8cd98f00b204e9800998ecf8427e)
Here's a similar example of a file edit:
change from {md5}216751de84e40fc247cb02da3944b415 to {md5}261e86c60ce62a99e4b1b91611c1af0e failed: File written to disk did not match checksum; discarding changes ({md5}261e86c60ce62a99e4b1b91611c1af0e vs {md5}d41d8cd98f00b204e9800998ecf8427e)
This doesn't always happen, but on my Gluster filesystems, I'd say it happens at least 90% of the time.
The latter checksum (d41d8...) is the checksum of an empty file. So here's my theory: Puppet sees that the change needs to be made and writes the new content, but then re-checksums the file before the write has actually been committed to the Gluster volume. It reads back an empty file, concludes the write failed, and rolls the change back.
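The empty-file claim is easy to verify locally; hashing zero bytes produces exactly the second checksum from the error messages above:

```shell
# The md5 of zero bytes matches the "d41d8..." value in the Puppet errors.
printf '' | md5sum
# d41d8cd98f00b204e9800998ecf8427e  -
```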
Two questions, then. First: Does this seem plausible, and how do I test/confirm that this is the case? Second: Assuming this is what's happening, how do I prevent it? The first thing that comes to mind would be simply sleeping for a few hundred milliseconds after file change operations, but I don't immediately know if that's even possible, much less wise.
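For what it's worth, this is the kind of check I had in mind for the first question (a rough sketch; the directory would be one of the Gluster-mounted paths like /var/www/html, with /tmp here only as a stand-in so the script runs anywhere):

```shell
#!/bin/sh
# Rough race test: write a file, checksum it immediately, then again after a
# short delay. Run it against a Gluster-mounted directory to see whether the
# immediate read comes back empty.
DIR="${1:-/tmp}"                 # pass a Gluster mount point as the argument
FILE="$DIR/gluster-race-test.$$"

printf 'test content\n' > "$FILE"
FIRST=$(md5sum "$FILE" | awk '{print $1}')    # checksum right after the write
sleep 1
SECOND=$(md5sum "$FILE" | awk '{print $1}')   # checksum after the write settles
rm -f "$FILE"

echo "immediate: $FIRST"
echo "delayed:   $SECOND"
# If "immediate" is d41d8cd98f00b204e9800998ecf8427e (the empty-file md5) while
# "delayed" matches the real content, that's the same race Puppet is hitting.
```

On a local filesystem I'd expect the two checksums to match every time; on the Gluster mount, if my theory is right, the first one should frequently come back as the empty-file hash.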