
I am implementing a real-time video processing project whose input comes from HDMI. The input video has a green background, which should be replaced by an image stored in the FPGA in order to generate a new video with a different background. I am using a PYNQ-Z2 board.

So far, I have tried the following:

  1. Storing the whole image in BRAM. This is not possible because there is not enough space.

  2. Using a second stream for the image and then mixing the two streams (video + image). I cannot synchronize the two streams.

  3. Storing the image in RAM and using a double-buffering scheme to load part of the image into BRAM. The first buffer is used for processing one row of the image. The second one is used for loading the next row from DDR memory via DMA (the DMA is controlled by the CPU). When a row is done, the FPGA sends an interrupt to the CPU so that the next row can be sent from DDR memory, and I swap the buffers so that new data starts loading. This solution has too much latency in the DMA transfer, and the image in the video output is broken.
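To put a number on the per-row deadline in approach 3, here is a rough calculation. The video mode is not stated above, so this sketch assumes 1280x720 at 60 Hz with standard CEA-861 blanking and 24-bit RGB pixels; all constants and function names are illustrative and would need adjusting for the actual mode.

```python
# Assumed video mode (not stated in the question): 1280x720 at 60 Hz, CEA-861 timing.
H_ACTIVE, V_ACTIVE = 1280, 720   # visible pixels per line, visible lines per frame
H_TOTAL, V_TOTAL = 1650, 750     # totals including blanking
FRAME_RATE_HZ = 60
BYTES_PER_PIXEL = 3              # assumed 24-bit RGB


def pixel_clock_hz():
    """Pixel clock implied by the total frame size and the frame rate."""
    return H_TOTAL * V_TOTAL * FRAME_RATE_HZ


def line_period_us():
    """Time available for one row, including horizontal blanking, in microseconds."""
    return H_TOTAL / pixel_clock_hz() * 1e6


def bytes_per_line():
    """Background data that must be fetched for one visible row."""
    return H_ACTIVE * BYTES_PER_PIXEL


def background_bandwidth_mb_s():
    """Average read bandwidth needed to fetch the background every frame, in MB/s."""
    return H_ACTIVE * V_ACTIVE * FRAME_RATE_HZ * BYTES_PER_PIXEL / 1e6


if __name__ == "__main__":
    print(f"pixel clock : {pixel_clock_hz() / 1e6:.2f} MHz")
    print(f"line period : {line_period_us():.2f} us")
    print(f"bytes/line  : {bytes_per_line()}")
    print(f"bandwidth   : {background_bandwidth_mb_s():.1f} MB/s")
```

Under these assumptions, the interrupt, the CPU handler, the DMA setup, and the transfer of 3840 bytes all have to finish within roughly 22 µs per row, every row. The average bandwidth is modest, so the deadline is more likely to be missed because of per-row software latency than because of DDR throughput.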

Florence
  • The https://electronics.stackexchange.com/ site may be better suited for this question. – Morten Zilmer Jan 23 '22 at 10:34
  • If you have the new background stored entirely in the FPGA, then why can't you just replace the green background of the HDMI image on the fly in the FPGA? For inspiration, note that Intel FPGA has a Video and Image Processing Suite IP collection for operations like this, see https://www.intel.com/content/www/us/en/products/details/fpga/intellectual-property/dsp/video-image-processing-suite.html – Morten Zilmer Jan 23 '22 at 10:36
  • @MortenZilmer this is what I am trying to do, but the FPGA cannot fit the entire image – Florence Jan 23 '22 at 11:46
  • If both images are being streamed in, for example by HDMI, you will need some memory to synchronize the images, and in this case some external DDR memory is usually required, since that is the only memory that will be large enough. – Morten Zilmer Jan 23 '22 at 15:23
  • An external DDR memory should have more than enough bandwidth for you to do it; you don't need a large and expensive FPGA part with tons of block RAM resources. Your mention of an interrupt is very suspicious: what exactly are you doing? Is the processing done in some soft CPU core? The latter can be your bottleneck, not the RAM access. – SK-logic Jan 25 '22 at 09:01
  • @MortenZilmer I am not able to change the board but there is already a DDR memory chip there. This one is not connected to the FPGA, but to the CPU which is also the bottleneck. – Florence Jan 25 '22 at 15:07
  • @SK-logic Indeed I use a CPU which transfers data via DMA from DDR Memory to BRAM on the FPGA. I updated my original post to clarify the use of CPU and DDR. – Florence Jan 25 '22 at 15:13
  • @Florence: If there are 2 independent streaming video channels, you will need to store at least one of those (and maybe both, depending on the output timing) in large memory (typically external DDR) in order to synchronize the frames. If one of the video channels is a fixed image that is stored in internal memory or generated on the fly, and the output can follow the other video channel, then you should be able to do without large memory (DDR). – Morten Zilmer Jan 25 '22 at 17:38
  • @Florence, is it an AXI DMA IP block? It's hardly optimal. You'll have better latency if you implement data fetch from DDR on your own, using a 128 bit AXI4 master and bursts as long as possible. And of course do not access the BRAMs over AXI, it'd be a massive overkill. – SK-logic Jan 25 '22 at 18:12
  • @MortenZilmer ok, so I have a static image and an input stream. How can I get the static image data from DDR to the IP, in sync, so that it can be mixed into the input stream? – Florence Jan 26 '22 at 14:38
  • @SK-logic How can I interface the AXI master port and bypass the CPU? Since the DDR is strictly connected to the CPU. – Florence Jan 26 '22 at 14:39
  • @Florence, DDR in Zynq is connected to the PS interconnect, not CPUs. You can have an AXI master in PL accessing DDR address range directly. – SK-logic Jan 26 '22 at 16:51
  • @Florence: If the access bandwidth to the static image is high enough to keep up with the input stream, then you can simply mix the two on the fly, without having to store the input stream in memory. A small FIFO may be required to hide the access latency of the static image. – Morten Zilmer Jan 26 '22 at 17:58
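
The on-the-fly mixing with a small FIFO suggested in the last comments can be modelled in software before committing to HDL. The following is only a behavioral sketch in Python, not a hardware implementation: the names `is_green` and `mix_line`, the key threshold, and the RGB tuple format are all assumptions made for illustration. In the PL, the FIFO would presumably be filled ahead of time by an AXI master reading the background from DDR in bursts, while the video stream drains it one pixel per clock.

```python
from collections import deque


def is_green(pixel, threshold=100):
    """Hypothetical chroma-key test: green is bright and clearly dominates red and blue."""
    r, g, b = pixel
    return g > threshold and g > 2 * r and g > 2 * b


def mix_line(video_line, background_fifo):
    """Replace keyed pixels of one video line with background pixels from the FIFO.

    One background pixel is consumed for every video pixel, keyed or not,
    so the background stays aligned with the video position.
    """
    out = []
    for pixel in video_line:
        bg = background_fifo.popleft()  # raises IndexError on underflow (prefetch too slow)
        out.append(bg if is_green(pixel) else pixel)
    return out


if __name__ == "__main__":
    video = [(0, 255, 0), (200, 10, 10), (10, 200, 20)]
    fifo = deque([(1, 1, 1), (2, 2, 2), (3, 3, 3)])  # prefetched background pixels
    print(mix_line(video, fifo))
```

An underflow in this model corresponds to the broken output described in the question: the background fetch did not stay ahead of the video. The FIFO depth needed to avoid it depends on the worst-case DDR read latency, which this sketch does not attempt to estimate.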
