I have a working PXE server using Puppet Razor (now end-of-life, but we still need it to work a little longer). It has built hundreds of systems for us.

I can go to most systems here and manually tftp files from that server and get files whose MD5SUMs match perfectly.

We have some systems in a remote location, though, which aren't able to TFTP any files properly. They get their DHCP address, but fail to download the vmlinuz file needed to continue. If I go to a system there that is up and running, and try to manually tftp a file, I get a file whose MD5SUM is incorrect. If I then repeat the task, I always get exactly the same, incorrect, MD5SUM. If I instead rsync the file from the tftp server, I get exactly the file as expected, with the correct MD5SUM.
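Since the bad checksum is the same on every attempt, one way to narrow this down is to compare the TFTP-fetched copy against the rsync-fetched copy and see where they first diverge. The following is only a rough sketch: the function names are made up for illustration, and the 512-byte block size is an assumption (it is the TFTP default when no `blksize` option is negotiated).

```python
import hashlib
from pathlib import Path


def md5sum(path):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def first_difference(good_path, bad_path, block_size=512):
    """Find where two files first differ.

    Returns (byte_offset, tftp_block_number), or None if the files are
    identical. Block numbers start at 1, as TFTP DATA packets do.
    block_size is an assumption: 512 is the default without negotiation.
    """
    good = Path(good_path).read_bytes()
    bad = Path(bad_path).read_bytes()
    limit = min(len(good), len(bad))
    for offset in range(limit):
        if good[offset] != bad[offset]:
            return offset, offset // block_size + 1
    if len(good) != len(bad):
        # Same content up to the shorter length: one copy is truncated.
        return limit, limit // block_size + 1
    return None
```

If the copies differ only in length, or the first difference sits exactly on a block boundary, that would point toward lost or incomplete blocks rather than random corruption, though that is an inference to check against a packet capture.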

The tftp transfers are painfully slow, often taking 30-60 seconds for a file that rsync transfers in under a second. So network bandwidth isn't the issue. Something else is going on.

Where should I start looking to debug this? It's darned weird.

wortmanb
  • When dealing with transfer issues, tcpdump is always a good tool to use. Try to capture the traffic, ideally at both ends of the link, and see what goes wrong. One thing with TFTP is that it doesn't play well with NAT at all, and one should even consider TFTP to be one of the protocols that will not work on anything other than the local network (routing often kills it). – NiKiZe Sep 23 '21 at 15:48
  • Good to know. I'll ask my network guys if there's any NAT involved between here and there. Thanks! – wortmanb Sep 23 '21 at 15:54
  • I bet you're seeing an issue with IP fragmentation. Some (if not most) PXE clients don't bother reassembling fragments they receive. So check your Wireshark or tcpdump trace on the client side for that, and set `blksize` on the server side so that fragmentation won't happen over the WAN tunnel. – Peter Zhabin Sep 24 '21 at 16:32
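
To put numbers on the fragmentation comment: a TFTP DATA packet carries a 4-byte TFTP header (opcode plus block number) inside an 8-byte UDP header and a 20-byte IPv4 header, so the largest block size that avoids fragmentation is roughly the path MTU minus 32. The sketch below assumes IPv4 with no IP options; the function name and the sample MTU values are illustrative, not measurements from this network.

```python
IPV4_HEADER = 20       # bytes, assuming no IP options
UDP_HEADER = 8         # bytes
TFTP_DATA_HEADER = 4   # 2-byte opcode + 2-byte block number

# Valid range for the negotiated blksize option (RFC 2348).
MIN_BLKSIZE = 8
MAX_BLKSIZE = 65464


def max_tftp_blksize(path_mtu):
    """Largest TFTP blksize whose DATA packet fits in one IPv4 packet."""
    blksize = path_mtu - IPV4_HEADER - UDP_HEADER - TFTP_DATA_HEADER
    return max(MIN_BLKSIZE, min(blksize, MAX_BLKSIZE))


if __name__ == "__main__":
    # Hypothetical path MTUs: plain Ethernet, and a tunnel with less room.
    for mtu in (1500, 1400):
        print(f"MTU {mtu}: blksize <= {max_tftp_blksize(mtu)}")
```

For a standard 1500-byte Ethernet MTU this gives 1468; a tunnel with a smaller MTU needs a correspondingly smaller value. How the cap is configured depends on the TFTP server in use.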
