Currently my application has a major bottleneck when it comes to transferring data between the CPU and the GPU.
Basically I am selecting multiple items; each item becomes a buffer and then a 2D texture (all of the same size), and they all get blended together on the GPU. I then need to know various things about the blend result, which lives on the GPU as a single-channel float texture:
- Maximum & Minimum value in the texture
- Average value
- Sum of all values
Effectively I have ended up with this very slow round trip:
- Put data on the GPU (N times, once per item)
- Read data from GPU
- Cycle data on CPU looking for values
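For reference, the CPU step in the last bullet amounts to something like the following single-pass scan. This is only a sketch in Python rather than my actual SharpGL code; `scan_readback` is a hypothetical name and `pixels` stands in for the flat float array that comes back from the readback.

```python
def scan_readback(pixels):
    """Single pass over a flat list of floats read back from the GPU.

    Returns (minimum, maximum, sum, average). `pixels` must be non-empty.
    """
    lo = hi = pixels[0]
    total = 0.0
    for v in pixels:
        if v < lo:
            lo = v
        elif v > hi:
            hi = v
        total += v
    return lo, hi, total, total / len(pixels)


# Hypothetical 2x2 blend result flattened to a list.
print(scan_readback([0.25, 1.5, -0.75, 3.0]))
```

The scan itself is cheap; the cost I am seeing in the profile is in getting the data to and from the GPU.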
Obviously a CPU profile shows the two major hot spots as the writes and the read. The textures are on the order of 100x100, not 1000x1000, but there are a lot of them.
There are three things I am currently considering:
- Combine all the data and compute the interesting values before putting anything on the GPU (it then seems pointless putting it on the GPU at all, and some of the blends are complex)
- When loading the data put it all onto the GPU (as texture levels, therefore skipping the lag on item selection in favor of a slower load)
- Calculate the "interesting data" on the GPU and just have the CPU read back those values
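To make the third option concrete, here is roughly how I picture the reduction, simulated on the CPU in Python rather than written as real shaders. It is a sketch under assumptions: I assume each pass would be a draw into a half-size float render target where every output texel combines a 2x2 block of the previous level, and that only the final 1x1 result would be read back. The names `reduce_pass` and `reduce_texture` are hypothetical.

```python
# CPU simulation of a GPU-style pyramid reduction (illustrative only).
# Each call to reduce_pass stands in for one draw into a half-size target.

def reduce_pass(level):
    """Halve a grid of (min, max, sum) tuples by combining 2x2 blocks."""
    h, w = len(level), len(level[0])
    out = []
    for y in range(0, h, 2):
        row = []
        for x in range(0, w, 2):
            # For odd sizes, only take the texels that actually exist.
            block = [level[yy][xx]
                     for yy in (y, y + 1) if yy < h
                     for xx in (x, x + 1) if xx < w]
            row.append((min(b[0] for b in block),
                        max(b[1] for b in block),
                        sum(b[2] for b in block)))
        out.append(row)
    return out


def reduce_texture(texels):
    """Return (min, max, sum, mean, passes) for a 2D list of floats."""
    h, w = len(texels), len(texels[0])
    # Level 0: every texel carries its own value as min, max and sum.
    level = [[(v, v, v) for v in row] for row in texels]
    passes = 0
    while len(level) > 1 or len(level[0]) > 1:
        level = reduce_pass(level)
        passes += 1
    lo, hi, total = level[0][0]
    return lo, hi, total, total / (w * h), passes


# Hypothetical 170 x 90 blend result filled with a simple ramp.
demo = [[float(x + y * 170) for x in range(170)] for y in range(90)]
print(reduce_texture(demo))
```

If this is the right idea, the readback would shrink from the whole texture to a single texel holding min, max and sum, with the average derived from the sum on the CPU.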
On my machine, with the data I have worked with, putting all the data on the GPU would barely dent GPU memory. The largest set I have seen so far is 9000 entries of 170 x 90; as it is single-channel float, by my maths that comes to roughly 1/2 GB. That isn't a problem on my machine, but I could see it being a problem on the average laptop. Can I get a GPU to page from HDD? Is this even worth pursuing?
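A quick check of that arithmetic, assuming 4 bytes per texel (a 32-bit float format, which is my assumption) and ignoring any padding or driver overhead:

```python
# Back-of-envelope GPU memory estimate for the largest data set seen so far.
entries = 9000            # number of items
width, height = 170, 90   # texels per item
bytes_per_texel = 4       # single-channel 32-bit float (assumed)

total_bytes = entries * width * height * bytes_per_texel
total_mib = total_bytes / (1024 ** 2)
total_gib = total_bytes / (1024 ** 3)

print(f"{total_bytes} bytes = {total_mib:.1f} MiB = {total_gib:.2f} GiB")
# prints: 550800000 bytes = 525.3 MiB = 0.51 GiB
```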
Sorry for asking such a broad question, but I am looking for the most fruitful avenue to pursue, and each avenue would be new ground to me. Profiling highlights readback as the biggest problem at the moment. Could I improve this by changing FBO/texture settings?
At the moment I am working in SharpGL and would prefer to stick to OpenGL 3.3. However, if a particular technique offers a rapid performance improvement but is out of reach because of video memory or GL version, I might be able to make a case for raising the software's system requirements.