This question is not new, but I am still unable to find a good solution. I distribute binary files of 100-300 MB via the Puppet file server, but performance is really bad, and I'm fairly sure the MD5 checks are the cause. I now have more than 100 servers, and my Puppet master works very hard to handle all those MD5 computations. In Puppet 3.x the checksum attribute of file {} does not work. I'm unable to upgrade to Puppet 4.x, and I cannot change the workflow: the files must come from the Puppet file server. I can't believe there is no custom file type with a fixed checksum option, but I can't find one :( Or is there some other way to download files from the Puppet file server? Any advice would help! Using rsync or packing the files as a native package is not an option for me.
1 Answer
It is indeed reasonable to suppose that using the default checksum algorithm (MD5) when managing large files will have a substantial performance impact. The File resource has a checksum attribute that is supposed to be usable to specify an alternative checksumming algorithm among those supported by Puppet (some of which are not actually checksums per se), but it was buggy in many versions of Puppet 3. At this time, it does not appear that the fix implemented in Puppet 4 has been backported to the Puppet 3 series.
If you need only to distribute files, and don't care about afterward updating them or maintaining their consistency via Puppet, then you could consider turning off checksumming altogether. That might look something like this:
file { '/path/to/bigfile.bin':
  ensure   => 'file',
  source   => 'puppet:///modules/mymodule/bigfile.bin',
  owner    => 'root',
  group    => 'root',
  mode     => '0644',
  checksum => 'none',
  replace  => false
}
If you do want to manage existing files, however, then Puppet needs a way to determine whether a file already present on the node is up to date. That's one of the two main purposes of checksumming. If you insist on distributing the file via the Puppet file server, and you are stuck on Puppet 3, then I'm afraid you are out of luck as far as lightening the load. Puppet's file server is tightly integrated with the File resource type, and not intended to serve general purposes. To the best of my knowledge, there is no third-party resource type that leverages it. In any case, the file server itself is a major contributor to the problem of File's checksum parameter not working -- buggy versions do not perform any type of checksumming other than MD5.
As an alternative, you might consider packaging your large file in your system's native packaging format, dropping it in your internal package repository, and managing the package (via a Package resource) instead of managing the file directly. That does get away from distributing it via the file server, but that's pretty much the point.
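A minimal sketch of what that might look like, assuming the file has already been built into a package and published to a repository the nodes can reach. The package name and version string here are hypothetical placeholders, not anything taken from your setup:

```puppet
# Hypothetical package wrapping the large binary; the name and version
# are placeholders for whatever your internal repository actually provides.
package { 'mycompany-bigfile':
  # Pinning a version lets the node's package manager decide whether
  # anything needs to change, so the master computes no file checksums.
  ensure => '1.0.0-1',
}
```

With this approach, shipping a new build means publishing a new package version and updating the ensure value, rather than replacing the file under the module's files directory.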

- Thanks for the reply! Yeah, I was following issue PUP-1208 for a while :), but I didn't know that it was backported to Puppet 3.x. I tried 3.8.5 and it didn't work, same as 3.5.x: neither md5lite nor sha, nor even ctime or mtime. In my case I do care about the file being up to date, so mtime should be entirely sufficient for me, but it does not work :( By the way, I'm stuck on Puppet 3.x because of the Foreman project: we use it a lot and it still does not support Puppet 4. Maybe I should try PuppetDB... – Roman Iuvshin Feb 05 '16 at 17:08
- @RomanIuvshin, I wrote that the fix to PUP-1208 has *not* been backported, so I'm not surprised to hear that v3.8.5 exhibited the same misbehavior as v3.5. As for PuppetDB, I'm not quite seeing what you hope that will do for you in this situation. If you need to stay on Puppet 3 but get away from MD5 then you probably need to get away from the Puppet file server. My suggestion of using packages instead is one fairly straightforward alternative, but there are others. – John Bollinger Feb 05 '16 at 17:19
- Oh I see now. Thanks for the great clarification! – Roman Iuvshin Feb 05 '16 at 17:32