2

I inserted multiple same boards into the system. Device PCIe is implemented with Xilinx IP core. After each FPGA program is programed, manually refresh the device manager to check whether the device and driver are working properly.

My confusion is that it seems that this approach can only work on two boards at the same time. After the third board is programmed , and then refresh the task manager, the system prompts insufficient resources, "This device cannot find enough free resources that it can use (Code 12)"

I tried to disable the other two boards, but the device still prompted conflicts. I don't know how to query conflicting resources.

My boards have 2 BARs(BAR0: 2KB, BAR1:16MB), and 1 IRQ. enter image description here

I did a few experiments and felt that it was caused by the conflict of memory resources. The conflicting party was the AMD integrated graphics card that came with the motherboard. 1st, 2nd, 3rd all indicate my board number. enter image description here

3rd is not recognized because of conflicts.

After I shut it down, I plugged in all three boards and then turned it on again. As a result, during the startup process, the system restarted again after a sudden power failure, and then all three boards were normal. At this time, it is found that the memory address of the graphics card has changed

enter image description here

I want to know how to resolve the conflict? modify my driver code or FPGA configuration?

lizzoe
  • 51
  • 3
  • Is there a question here? I have had a little experience with this kind of thing, but would like to know what you need. – Rich Maes Oct 29 '20 at 19:50
  • I want to know how to resolve the conflict. I think the operating system will resolve addressing conflicts on its own, but it does not look like this now. So I want to know how to solve it. Otherwise, there is no way to use multiple boards in a system at the same time. – lizzoe Oct 30 '20 at 02:13

1 Answers1

2

PCIe enumeration should resolve memory allocations issues, however there are a couple implementation issues to be aware of. Case in point, I have used Xilinx XDMA's with a 64 bit BAR of 2GB in size and I have literally bricked a DELL XPS motherboard. But I have done the same with an IBM system and it just worked. The point here is that enumeration can be done with Firmware, Hardware or OS driven event. If you doing hardware manager, that sounds like OS driven, but when I toasted the XPS board, that was some kind of FW issue that was related to BAR size that resulted in a permanent failure. 16MB isn't big and it should not be a problem, but I would recommend going with the Xilinx defaults first and show reliability there. I think it's one BAR at 1M. I have run 3x 64 bit BARS at 1MB a piece without issue, but keep it simple and show reliability. Then move up. This will help isolate if it is a system flakiness or not.

I have seen some system use FW based enumeration that comes up really fast, before the FPGA has configured, in which case there is no PCIe target to ID. If you frequently find that your FPGA is not detected on power up, but detected on a rescan, this can be a symptom. How to resolve this is a bit of a pain. We ended up using partial reconfiguration. Start with the PCIe interface, then have a reconfig to load the remaining image. Let's hope it isn't this problem

The next thing is to be aware of is your reset mechanism within the FPGA. You probably hooked the PCIe IP reset to the bus reset which is great, however, I have in the past also hooked that reset to internal PLL locked signals that may not be up. For troubleshooting purposes, keep that reset simple and get rid of everything else to show that just the PCIe IP by itself is reliable first.

You also have to be careful here too. If you strip things down, make sure it is clean. If you ignore the PLL lock and try to use a Xilinx driver, such as the XDMA driver, it has a routine where it tries to identify the XDMA with data transactions. It is looking for the DMA BAR one BAR at a time. But when it does this, the transaction it attempts may go out on the AXI bus if the BAR isn't the XDMA control BAR. If the AXI bus isn't out of reset or clocked when this happens you will lock the AXI bus and I have locked up a Linux box this way on several occasions. AXI requires that transactions complete, otherwise it just sits there waiting.

BTW, on a Linux box, you can look at the enumeration output in the kernel log. I'm not sure if Windows shows you the same thing or not. But this can be helpful if you see that the device was initially probed but then something invalid was detected in the config register, versus not being seen at all.

So a couple things to look at.

Dharman
  • 30,962
  • 25
  • 85
  • 135
Rich Maes
  • 1,204
  • 1
  • 12
  • 29