Imagine there are two vectors in GPU memory, a and b, which are in fact 2D float textures (one float value per pixel). The goal is to compute the dot product a·b.
If I create a third texture, c, which contains the element-wise product of a and b (i.e. c_{ij} = a_{ij} × b_{ij}), then the sum of all pixel values in c is the dot product.
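To make the identity concrete, here is a minimal CPU sketch. It assumes a hypothetical `Texture` struct standing in for a single-channel float texture stored row-major. It builds c as the element-wise product of a and b and then sums its texels, which yields a·b.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU stand-in for a width x height single-channel float texture,
// stored row-major: texel (i, j) lives at index i * width + j.
struct Texture {
    std::size_t width, height;
    std::vector<float> texels;
};

// Builds c with c_{ij} = a_{ij} * b_{ij} (what a GPU pass would write into c).
Texture elementwiseProduct(const Texture& a, const Texture& b) {
    Texture c{a.width, a.height, std::vector<float>(a.texels.size())};
    for (std::size_t k = 0; k < a.texels.size(); ++k)
        c.texels[k] = a.texels[k] * b.texels[k];
    return c;
}

// Summing every texel of c gives the dot product a·b.
double sumTexels(const Texture& c) {
    double sum = 0.0;
    for (float v : c.texels) sum += v;
    return sum;
}
```

For example, a 2×2 texture a = {1, 2, 3, 4} and b = {5, 6, 7, 8} give c = {5, 12, 21, 32}, whose texels sum to 70 = a·b.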
I thought I could let the GPU generate the 1px mipmap level of c; let's call it d. AFAIK d is the average of all pixels in c, which, multiplied by the number of pixels in c (4^n for a 2^n × 2^n texture, where n is the LOD of the 1px level), would result in the dot product.
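The idea can be emulated on the CPU to check the arithmetic. This is a sketch under stated assumptions, not what any driver actually does. It assumes a square 2^n × 2^n texture and a 2×2 box filter for every mip level. As far as I know, the OpenGL specification recommends a box filter for automatic mipmap generation but does not require one, so the real filter is implementation-dependent. The function names `nextMipLevel` and `dotViaMipmap` are hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// ASSUMPTION: mip level k+1 is produced from level k by a 2x2 box filter,
// i.e. each output texel is the mean of the matching 2x2 block of the input.
std::vector<float> nextMipLevel(const std::vector<float>& level, std::size_t size) {
    const std::size_t half = size / 2;
    std::vector<float> out(half * half);
    for (std::size_t i = 0; i < half; ++i)
        for (std::size_t j = 0; j < half; ++j)
            out[i * half + j] = 0.25f * (level[(2 * i) * size + 2 * j] +
                                         level[(2 * i) * size + 2 * j + 1] +
                                         level[(2 * i + 1) * size + 2 * j] +
                                         level[(2 * i + 1) * size + 2 * j + 1]);
    return out;
}

// Reduces c (size x size, row-major, size a power of two) down to the 1x1 level d,
// then scales d by the texel count size * size == 4^n (n = LOD of the 1x1 level).
double dotViaMipmap(std::vector<float> c, std::size_t size) {
    if (size == 0 || (size & (size - 1)) != 0 || c.size() != size * size)
        throw std::invalid_argument("expects a non-empty 2^n x 2^n texture");
    const double texelCount = static_cast<double>(size) * static_cast<double>(size);
    while (size > 1) {
        c = nextMipLevel(c, size);
        size /= 2;
    }
    const float d = c[0];  // the 1px mip level: the mean of all texels of c
    return static_cast<double>(d) * texelCount;
}
```

Feeding in the c = {5, 12, 21, 32} from a 2×2 example gives d = 17.5 and d × 4 = 70. With large textures, the repeated averaging in 32-bit floats and whatever filter the driver really uses determine how close d × 4^n gets to the exact sum.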
Questions
- Is it possible to compute the dot product in the way described above?
- Would such an approach be faster than computing the dot product via a compute kernel?
For a concrete API example, let's consider Apple's Metal API or OpenGL/CUDA.