I want to implement convolution on Arm Mali GPUs, optimised for both speed and memory. What is the best way to do this? GEMM-based MCMK convolutions are unsuitable because they use a lot of memory, and a direct GPU implementation is much slower than the corresponding CPU version. Any time spent on memory reshaping should be included in the timing measurements.
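To make the memory objection against im2col/MCMK concrete, here is a rough back-of-the-envelope sketch. The layer sizes are assumed for illustration (they are not from the question); the point is that lowering to a GEMM materialises each input element once per kernel position:

```python
import numpy as np

# Hypothetical sizes for a typical CV layer (assumed, not from the question).
C, H, W = 64, 56, 56   # input channels, height, width
R, S = 3, 3            # kernel height, width

# Direct convolution only needs the input tensor itself.
input_elems = C * H * W

# im2col builds a (C*R*S) x (H*W) patch matrix for the GEMM
# (assuming stride 1 and 'same' padding, so the output is H x W).
im2col_elems = (C * R * S) * (H * W)

expansion = im2col_elems / input_elems
print(expansion)  # R*S = 9x more memory for a 3x3 kernel
```

So the working set grows by roughly the kernel area (9x for 3x3, 25x for 5x5), which is why im2col-style lowering hurts on memory-constrained mobile GPUs, and why the reshaping time itself belongs in the timing measurements.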
- Did you try Fourier-transform-based convolution? It is many times faster than naive convolution for filter widths of 20-30 or more, and works best when the filter has the same size as the image. – huseyin tugrul buyukisik Sep 13 '19 at 14:09
- Well, my primary concern is computer vision applications, so the filter width will be at most 7, with common kernel widths of 3 or 5. – Sep 14 '19 at 14:41
- See https://github.com/ARM-software/ComputeLibrary for some pre-optimized implementations. – solidpixel Sep 27 '19 at 09:08
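On the FFT suggestion in the first comment: a minimal NumPy sketch (array sizes chosen for illustration only) showing that multiplying zero-padded FFTs reproduces a direct 2D linear convolution. For the 3x3-7x7 kernels mentioned above, the FFT padding and transform overhead usually outweighs the arithmetic savings, which matches the reply in the second comment:

```python
import numpy as np

def fft_conv2d(img, ker):
    # Full linear convolution via zero-padded real FFTs.
    H = img.shape[0] + ker.shape[0] - 1
    W = img.shape[1] + ker.shape[1] - 1
    F = np.fft.rfft2(img, s=(H, W)) * np.fft.rfft2(ker, s=(H, W))
    return np.fft.irfft2(F, s=(H, W))

def direct_conv2d(img, ker):
    # Naive O(H*W*R*S) reference: shift-and-accumulate.
    H = img.shape[0] + ker.shape[0] - 1
    W = img.shape[1] + ker.shape[1] - 1
    out = np.zeros((H, W))
    for i in range(ker.shape[0]):
        for j in range(ker.shape[1]):
            out[i:i + img.shape[0], j:j + img.shape[1]] += ker[i, j] * img
    return out

rng = np.random.default_rng(0)
img = rng.standard_normal((32, 32))
ker = rng.standard_normal((5, 5))
print(np.allclose(fft_conv2d(img, ker), direct_conv2d(img, ker)))  # True
```

The direct version costs O(H·W·R·S) multiplies while the FFT version costs O(H·W·log(H·W)) regardless of kernel size, so the crossover only favours FFTs once the kernel is fairly large, as the commenter notes.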