
I'm working on a reconfiguration controller for a reconfigurable CPU. One of the features I am trying to implement is proper handling of CRC errors, along with support for aborting a reconfiguration in progress. I am using a Virtex-7 board. As described in ug702.pdf (page 98), reloading a bitstream after a CRC error isn't a problem, and an ABORT can be performed as shown in ug470_7Series_Config.pdf (page 48).

At first glance it seems to work as described in the documentation: on a CRC error my reconfiguration controller notifies the CPU, and the CPU gives my controller a fresh bitstream. The CPU can also send my controller an abort command, and the controller then aborts the reconfiguration as described in the docs.

However, it works only sporadically: sometimes the whole system freezes, sometimes I get nonsensical exceptions, and sometimes unconditional jumps appear not to be taken.

I am not sure whether I messed up somewhere or whether this is to be expected, since the containers that the partial bitstreams are loaded into are interconnected with the pipeline and the bus. I remember reading in some Xilinx PDF that the bitstream is not finally configured until the DESYNC command at the end of the bitstream is encountered. Does that mean the fabric is not affected until the full partial bitstream has been loaded without any errors, so that it cannot affect the rest of the design? Or is a partially loaded partial bitstream actually configured onto the FPGA, where it can trigger all sorts of weird signals on its outputs?

rtur
  • Xilinx FPGAs have a kind of "double buffering". The currently active configuration is not affected while you shift in the new configuration; otherwise it wouldn't be possible to read a config back or to shift in a new config while the FPGA is still running. So the new configuration is activated by one of the last commands in the bit-stream. The bit-stream itself is a control flow for the internal (re-)configuration FSM. – Paebbels Jul 12 '16 at 11:16
  • Makes sense, I was hoping/suspecting it might work like that, I guess I messed up somewhere. If you make your comment an answer, preferably with a source :), I would gladly accept it. – rtur Jul 12 '16 at 11:44
  • @Paebbels, please cite a source. A 7 Series FPGA has a configuration of between 17 Mb and 450 Mb. [UG909](http://www.xilinx.com/support/documentation/sw_manuals/xilinx2014_4/ug909-vivado-partial-reconfiguration.pdf), Partial BIT File Integrity, pages 85 - 87: "The FPGA is by definition already in user mode when the partial BIT file is loaded. Because the configuration circuitry supports error detection only after a BIT file has been loaded, a corrupt partial BIT file can become active, potentially damaging the FPGA if left operating for an extended period." –  Jul 13 '16 at 17:52
  • "The configuration engines of 7 series ... have the ability to perform a frame-by-frame CRC check and will not load a frame into the configuration memory if that CRC check fails. A failure is reported on the INIT_B pin (it is pulled Low) and gives you the opportunity to take the next steps: retry the partial bit file, ... The partially loaded reconfiguration region will not have valid programming in it, but the CRC check ensures the remainder of the device (static region and any other reconfigurable modules) stays operational while the system recovers from the error." –  Jul 13 '16 at 19:20
  • So the soft IP PRC/EPRC of Virtex 5/6 have been hard coded for partial reconfiguration in the 7 series. The model would be the same, a spooling memory (FIFO) holding a configuration frame (the granularity of partial reconfiguration). It would seem from the symptoms you're relying on something that isn't loaded with the old (or inactive) configuration frame value causing execution errors. Functionality crossing configuration frame boundaries? Without specifics this gets kind of hand wavy. –  Jul 13 '16 at 19:33
  • @user1155120 Thank you for your input and the source. It looks like your comments answer the question more thoroughly than Paebbels' comment; care to make them into an answer I can accept? There is one thing I'm not sure about: does "Because the configuration circuitry supports error detection only after a BIT file has been loaded [...]" mean that the CRC check isn't performed for each configuration frame upon reception? I.e., is the CRC error signal raised only after the complete BIT file (or the corrupted frame) has been configured onto the fabric? – rtur Jul 19 '16 at 11:32
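
To make the frame-by-frame behaviour quoted in the comments concrete, here is a toy Python model of it. It is only a sketch under stated assumptions, not the real configuration logic: `ToyConfigEngine`, `make_partial_bitstream`, `load_with_retry` and the `fetch_bitstream` callback are hypothetical names, `zlib.crc32` merely stands in for whatever CRC the device uses, and clearing the error flag before a retry is an assumption of the model.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class ToyConfigEngine:
    """Toy stand-in for a configuration engine with a per-frame CRC gate."""
    memory: dict = field(default_factory=dict)  # frame address -> frame bytes
    init_b: bool = True  # True = high (no error), False = pulled low

    def load_frame(self, address: int, payload: bytes, crc: int) -> bool:
        # zlib.crc32 is only a stand-in; the device uses its own CRC scheme.
        if zlib.crc32(payload) != crc:
            self.init_b = False  # report the error, do not write the frame
            return False
        self.memory[address] = payload
        return True


def make_partial_bitstream(frames):
    """Attach a CRC to each (address, payload) pair."""
    return [(addr, data, zlib.crc32(data)) for addr, data in frames]


def load_with_retry(engine, fetch_bitstream, max_attempts=3):
    """Load a partial bitstream, refetching it after a CRC failure.

    fetch_bitstream(attempt) is a hypothetical callback that returns a
    fresh list of (address, payload, crc) tuples for the given attempt.
    Returns the number of the attempt that succeeded.
    """
    for attempt in range(1, max_attempts + 1):
        engine.init_b = True  # assumption: error flag is cleared before a retry
        for addr, data, crc in fetch_bitstream(attempt):
            if not engine.load_frame(addr, data, crc):
                break  # stop at the first bad frame and retry from scratch
        else:
            return attempt  # every frame passed its CRC check
    raise RuntimeError("partial bitstream still failing after retries")
```

In this model a corrupted frame never reaches `memory`, so frames outside the reconfigurable region are untouched, but the frames that arrived before the bad one have already been written. After a failed attempt the region therefore holds a mix of old and new frames until the retry completes, which matches the "will not have valid programming in it" wording of the quote if that wording is read literally.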
