I am working on a lossy compressor and am wondering which approach suits the design better. The first is to transfer data to global memory until all the data has been processed; the second is to use either pipes or channels to pass data.
1 Answer
Having worked on lossy compression algorithms, I can say that both can be useful. However, it depends on many factors.
For instance, it depends on the size of your workload, the underlying FPGA, your particular lossy compression algorithm, and so on.
With OpenCL pipes and channels, you can leverage the internal bandwidth of your Intel FPGA and avoid the bottleneck of off-chip memory. You also reduce your storage requirements, because data is consumed as it is produced. But these advantages come at the cost of additional handshaking logic between your OpenCL kernels. A channel pipeline will be more efficient, but it will be less scalable in terms of place-and-route timing closure if its stages are connected by wires or by registers driven by the same clock.
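To make the storage point concrete, here is a small software model, not OpenCL and not a real compressor. It is only a sketch: `quantize`, `run_buffered`, `run_streamed`, and the FIFO `depth` are hypothetical names chosen for illustration, and a plain sum stands in for the second stage. It compares how much intermediate data has to be held when a first stage writes everything out before a second stage starts, versus when the two stages exchange data through a fixed-depth FIFO, which is roughly what a channel gives you.

```python
from collections import deque


def quantize(sample, step=16):
    """Toy lossy stage: round down to a multiple of `step`."""
    return (sample // step) * step


def run_buffered(samples, step=16):
    """Global-memory style: stage 1 finishes before stage 2 starts."""
    intermediate = [quantize(s, step) for s in samples]  # full-size buffer
    peak = len(intermediate)   # every intermediate value is stored at once
    total = sum(intermediate)  # stand-in for the second stage
    return total, peak


def run_streamed(samples, depth=4, step=16):
    """Channel style: stages exchange data through a FIFO of fixed depth."""
    fifo = deque()
    it = iter(samples)
    done = False
    peak = 0
    total = 0
    while not done or fifo:
        # Producer: write until the FIFO is full (a full channel stalls the writer).
        while not done and len(fifo) < depth:
            try:
                fifo.append(quantize(next(it), step))
            except StopIteration:
                done = True
        peak = max(peak, len(fifo))
        # Consumer: read one value per iteration of the outer loop.
        if fifo:
            total += fifo.popleft()
    return total, peak


if __name__ == "__main__":
    data = list(range(1000))
    print(run_buffered(data))           # peak equals len(data)
    print(run_streamed(data, depth=4))  # peak never exceeds depth
```

Both functions produce the same result, but the buffered version holds as many intermediate values as there are inputs, while the streamed version never holds more than `depth`. On real hardware the trade-off also includes the handshaking logic and timing-closure concerns mentioned above, which this model does not capture.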
Updated answer:
Since you mentioned the following in your comment:
The data to be processed would be at least 500 MB so huge space:
Yes, that's a big chunk. The Intel Arria 10 does not have High Bandwidth Memory (HBM), so if possible, try an FPGA that integrates HBM banks. HBM in FPGA devices helps overcome the memory bandwidth bottleneck.
Additionally, since one of the main goals in many applications is not to lose any data, you might consider lossless compression algorithms. Please check this survey paper, which gives examples of compression algorithms on FPGAs and will help you understand the design goals in your case.
Also, please check the Gzip Compression OpenCL Design Example, which will guide you. Gzip is a widely used compression and decompression method. This design example presents a compression implementation using the Intel FPGA SDK for OpenCL.
And if you want to know how OpenCL pipes and channels can help you, Constructing Concurrent Data Structures on FPGA with Channels can give you some direction.

-
Thank you for your response; I'm lucky to hear from experienced engineers. I agree with what you said. The data to be processed would be at least 500 MB so huge space is needed, and the algorithm is sequential in nature. I will be using an Arria 10. – Hazim Hamad Mar 10 '22 at 20:04
-
OK. Since this is an interesting question, I will update my answer with further details so that it's helpful for you and for present and future readers. – BZKN Mar 11 '22 at 12:00
-
OK, waiting for your update. If there are any books or resources available on such designs, they would be helpful. – Hazim Hamad Mar 15 '22 at 11:40
-
Yes, I will update my answer today. – BZKN Mar 15 '22 at 11:48