
I'm trying to boot PCs from a Windows 2012R2 WDS server in UEFI mode. If, and only if, the client is in a different subnet from the DHCP/PXE servers, this fails with some of them. (It always works in BIOS mode, but I need UEFI.)

The symptom is that after the initial DHCP request/offer/request/ack sequence, the working clients contact the PXE server to get their boot information, and the failing ones do not.

There are two DHCP servers (also 2012R2) in addition to the PXE. There are no boot-related DHCP options configured on them; DHCP relaying is enabled on the network and relays to all three servers.

This is the packet list from booting a working client:

1  DHCP Discover - Transaction ID 0xe828c4bc
2  DHCP Offer    - Transaction ID 0xe828c4bc (from first DHCP)
3  DHCP Offer    - Transaction ID 0xe828c4bc (from PXE)
4  DHCP Offer    - Transaction ID 0xe828c4bc (from second DHCP)
5  DHCP Request  - Transaction ID 0xe828c4bc (to first DHCP)
6  DHCP ACK      - Transaction ID 0xe828c4bc (from first DHCP)

7  4011 → 4011 Len=347                       (to PXE)
8  4011 → 4011 Len=349                       (from PXE)
9  TFTP Read Request, File: boot\x64\wdsmgfw.efi, (to PXE)
...

With a failing client, it looks exactly the same until line 6, then nothing more happens; it simply does not contact the PXE server.

I have compared the packet contents in Wireshark, and other than the values which are dependent on what network the client is on (giaddr, router, etc.), all the offers are identical between the working and failing cases.
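
For context on why giaddr and the router option differ between the traces: under RFC 2131, a relay agent writes the address of its interface on the client's subnet into giaddr, and a server receiving a relayed request allocates from the subnet containing giaddr. The following is a minimal Python sketch of that scope selection, not a model of Windows Server DHCP (which also applies superscopes, policies, exclusions and so on). The scope list and addresses are hypothetical.

```python
import ipaddress
from typing import Optional


def select_scope(giaddr: str, scopes: list[str]) -> Optional[ipaddress.IPv4Network]:
    """Pick the scope a DHCP server would offer from, based on giaddr.

    Sketch of the RFC 2131 rule: a non-zero giaddr means the request was
    relayed, and the server allocates from the subnet containing giaddr.
    """
    relay = ipaddress.IPv4Address(giaddr)
    if relay == ipaddress.IPv4Address("0.0.0.0"):
        # Not relayed: a real server would use the subnet of the interface
        # the request arrived on, which this sketch does not model.
        return None
    for scope in scopes:
        net = ipaddress.IPv4Network(scope)
        if relay in net:
            return net
    # No matching scope: the server typically does not answer this request.
    return None


# Hypothetical scopes: the servers' own subnet and the client subnet.
SCOPES = ["192.168.1.0/24", "10.20.30.0/24"]
print(select_scope("10.20.30.1", SCOPES))  # 10.20.30.0/24
```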

This seems to affect particular firmware implementations. The working clients include VMware Workstation and ESXi as well as an Intel NUC; it fails with Asus B150M-C mainboards and at least one Dell OptiPlex. The firmware on all devices involved is current, at most a few months old.

It looks to me like the UEFI firmware does not know how to use a router. Is there a way to get this working?

Christian

2 Answers


The problem is the client; I should have looked more closely at my packet traces. I just figured out that right after it gets the DHCP ACK from the regular DHCP server, the failing client starts ARPing for the PXE server, getting nowhere, of course.

So the problem really is that the firmware does not understand routers.

Christian
  • Please explain why you think the UEFI firmware does not understand routers. I am curious as to how you came to that conclusion. – fpmurphy Apr 17 '17 at 00:36
  • To send an IP packet to a destination, you need its MAC address (at least with Ethernet). The way to determine that address is ARP; you send a request "whoever is 192.168.1.1, tell me your MAC address". ARP is not routable, so it works only in the connected network segment. If the destination for your packet is outside that segment, you need a router, and you send the packet with the destination IP address to the router's MAC. My mainboards here send ARP requests for the PXE server itself, even though the DHCP lease tells them it's outside their subnet. – Christian Apr 17 '17 at 03:44
  • Thanks. Issue is probably specific to your mainboard firmware as you say. However, Windows 2012R2 WDS is known to be temperamental w.r.t. EFI/UEFI booting. I suggest you read 2Pint Software's White Paper "Using DHCP to Control UEFI & BIOS PXE". – fpmurphy Apr 18 '17 at 01:59
  • Thanks for the suggestion, I will take a look at that. Also, Asus has acknowledged the bug and is about to release a fix. – Christian Apr 18 '17 at 03:17
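
The on-link/off-link decision described in the comment about ARP and routers can be sketched in Python with the standard ipaddress module. This is a minimal illustration, not how any particular UEFI network stack is implemented. It assumes the client takes its subnet mask from DHCP option 1 and its default gateway from option 3; all addresses are hypothetical.

```python
import ipaddress


def next_hop(dst: str, client_if: str, router: str) -> ipaddress.IPv4Address:
    """Return the address the client must ARP for in order to reach dst.

    client_if: leased address plus prefix, e.g. "10.20.30.50/24"
               (yiaddr combined with DHCP option 1, subnet mask).
    router:    default gateway from DHCP option 3.
    """
    iface = ipaddress.IPv4Interface(client_if)
    target = ipaddress.IPv4Address(dst)
    if target in iface.network:
        # On-link: ARP for the destination itself.
        return target
    # Off-link: ARP for the router; the packet keeps dst as its IP
    # destination but is sent to the router's MAC address.
    return ipaddress.IPv4Address(router)


def arp_is_misdirected(arp_target: str, client_if: str) -> bool:
    """True if an ARP request asks for an address outside the client's subnet.

    ARP does not cross routers, so such a request goes unanswered unless
    something on the segment does proxy ARP; this is the failing-trace symptom.
    """
    iface = ipaddress.IPv4Interface(client_if)
    return ipaddress.IPv4Address(arp_target) not in iface.network


# Hypothetical addresses: client on 10.20.30.0/24, PXE server on 192.168.1.0/24.
LEASE = "10.20.30.50/24"
GATEWAY = "10.20.30.1"
PXE_SERVER = "192.168.1.10"
print(next_hop(PXE_SERVER, LEASE, GATEWAY))   # 10.20.30.1
print(arp_is_misdirected(PXE_SERVER, LEASE))  # True
```

A correct client would therefore ARP for 10.20.30.1 in this example; the failing firmware behaves as if arp_is_misdirected were never checked and ARPs for the PXE server directly.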

Your DHCP strategy is a mess. You should have no more than one DHCP server per network, with the PXE server acting as proxyDHCP.

More than one DHCP server leads to race conditions; you cannot predict which DHCP will end up really providing the IP to the client.

You probably have some other DHCP server in the second network, and the proxyDHCP offer (from the PXE server) probably never reaches the other side.

You should carefully read the Wireshark capture; upload it somewhere and we can help.

Pat
  • I have one subnet with two DHCP servers and the PXE server. The clients are in another subnet. The PXE server, as I was careful to point out above, hoping to avoid this exact response, _is_ acting as proxy DHCP, because that is what port 4011 is. Having two DHCP servers does not lead to "race conditions", a term whose definition I would advise you to look up, but is useful to avoid service interruptions (simply put, with only one you'd better never need to reboot it) as long as they are giving compatible answers, which mine do. The PXE offer would get through, were it ever requested. – Christian Apr 02 '17 at 12:53
  • 1) You are wrong; two DHCP offers lead to race conditions; the DHCP protocol has no provision for unequivocally taking one of the two offers; please read the corresponding RFCs. 2) Your redundant DHCP strategy is the reason you are now asking here why your setup does not work. Upload your Wireshark capture. – Pat Apr 02 '17 at 13:13
  • 1) I do not need clients to take one specific offer. I need them to get at least one valid offer and take any one; I couldn't care less which one it is. 2) If you read my question past the first sentence, you will find that I mentioned that this _works with some clients_. Traces: https://www.dropbox.com/sh/mmfjrbobh45ausm/AAD_8mg92-JR7Ck7nqTo7cnNa?dl=0 (filtered: "(bootp||udp.port==4011||tftp.opcode==1)"). "Working" is cut after the first TFTP request to save space; it booted correctly after that. Traces are from the switch port the client is directly connected to. – Christian Apr 02 '17 at 13:23
  • 1) You said that there are no race conditions and I proved you wrong. So far we do not know whether your race conditions are leading the client to take offers that are not exactly equal. 2) It works with some clients because not all DHCP client implementations are equal. If you want me to read your Wireshark capture, please remove your downvote. – Pat Apr 02 '17 at 13:27
  • Huh? I did downvote your (quite useless) answer, but I was then told that below 120 rep or something my vote wouldn't count. Are you sure that wasn't someone else? Anyway, I figured it out. After the client gets its DHCP ACK, it starts ARPing for the PXE server, so it clearly does not understand routers. I'm working on a workaround now; if that doesn't help, I'll see if I can do something with proxy ARP. No need to read the traces; I filtered out the ARP traffic anyway. Sorry for that. – Christian Apr 02 '17 at 13:34
  • Please... if your DHCP servers are assigning the same IP range to the local "and" the remote network, then of course you are going to have "routing" problems... – Pat Apr 02 '17 at 13:42
  • What on Earth are you talking about? Please go away. – Christian Apr 02 '17 at 13:43