I am using OpenGL (via OpenTK) to perform spatial queries on lots of point cloud data on the GPU. Each frame of data is around 200k points. This works well for low numbers of queries (<10 queries @ 60fps) but does not scale as more are performed per data frame (100 queries @ 6fps).

I would have expected modern GPUs to be able to chew through 20 million points (200k points * 100 queries) from 100 draw calls without breaking a sweat, especially since each glDrawArrays call uses the same VBO.

A 'spatial query' consists of setting some uniforms and a glDrawArrays call. The geometry shader then chooses to emit or not emit a vertex based on the result of the query. I have tried with and without branching and it makes no difference. The vertex data is stored as separate attributes: one is STATIC_DRAW and the other is DYNAMIC_DRAW (updated before each frame's batch of spatial queries). Transform feedback then collects the data.

Profiling shows that glGetQueryObject is by far the slowest call (probably blocking: 5600 inclusive samples compared to 127 for glDrawArrays), but I'm not sure how to improve this. I tried making lots of little result buffers in GPU memory and binding a new transform feedback buffer for each query, but this had no effect, perhaps due to running on a single core? The other option would be to read the video memory from the previous query from another thread, but this throws an Access Violation and I'm unsure whether the gains would be that significant.

Any thoughts on how to improve performance? Am I missing something obvious like a debug mode that needs switching off?

genpfault
John
  • Do you need the results of each query before you can submit the next one? Otherwise, you could delay the call to `glGetQueryObject()`, and possibly also check if the result is available before making the blocking call to get the result. For example, if `k` is an index for your query, you could get the result of query `k - d` after you submit query `k`, where `d` is an offset you can tweak. – Reto Koradi Jun 13 '15 at 16:22
  • Thanks for your reply! Yea, they can be done in parallel. How would I set that up? And would it be effective on a dual core machine? I tried to read out the query result in a separate thread (separate query + transform feedback buffer) but I got an access violation exception - I assumed it was stopping me from creating OpenGL resources in a different thread? – John Jun 13 '15 at 19:08
  • I wasn't really thinking about multi-threading. What I had in mind was to simply defer the call to `glGetQueryObject()` in your program logic, so that retrieving the query results is always a few steps behind submitting the queries. – Reto Koradi Jun 14 '15 at 18:35
  • Ah I see. So will OpenGL prepare queries in the background whilst executing other operations like `glDrawArrays`? Had a quick look but can't spot the relevant documentation. – John Jun 14 '15 at 19:15
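
The deferred readback suggested in the comments (fetch the result of query `k - d` after submitting query `k`) can be sketched as below. This is only a model of the bookkeeping, not OpenGL code: `FakeGpu`, `run_queries`, `latency` and `depth` are hypothetical names invented for illustration, and the GPU is simulated by assuming a result becomes available a fixed number of submissions after its query was issued. The comments name the OpenGL calls each step would presumably map to.

```python
from collections import deque


class FakeGpu:
    """Hypothetical stand-in for the driver: a query's result becomes
    available `latency` submissions after the query was issued."""

    def __init__(self, latency=2):
        self.latency = latency
        self.submitted = 0
        self.results = {}  # query id -> (submission count at which it is ready, value)
        self.stalls = 0    # number of fetches that would have blocked

    def issue(self, query_id, value):
        # Real code: set uniforms, glBeginQuery, glDrawArrays, glEndQuery.
        self.submitted += 1
        self.results[query_id] = (self.submitted + self.latency, value)

    def is_available(self, query_id):
        # Real code: glGetQueryObject with GL_QUERY_RESULT_AVAILABLE.
        return self.submitted >= self.results[query_id][0]

    def fetch(self, query_id):
        # Real code: glGetQueryObject with GL_QUERY_RESULT, which waits
        # for the result if it is not available yet.
        if not self.is_available(query_id):
            self.stalls += 1
        return self.results.pop(query_id)[1]


def run_queries(gpu, inputs, depth):
    """Submit every query, fetching result k - depth after submitting query k."""
    pending = deque()
    out = []
    for query_id, value in enumerate(inputs):
        gpu.issue(query_id, value)
        pending.append(query_id)
        if len(pending) > depth:
            out.append(gpu.fetch(pending.popleft()))
    # Drain the queries still in flight once everything has been submitted.
    while pending:
        out.append(gpu.fetch(pending.popleft()))
    return out


if __name__ == "__main__":
    for depth in (0, 2):
        gpu = FakeGpu(latency=2)
        run_queries(gpu, list(range(100)), depth)
        print(f"depth={depth}: {gpu.stalls} blocking fetches")
```

In this model, fetching immediately (`depth=0`) blocks on every query, while staying `depth` queries behind blocks only on the last few that are drained at the end. A real driver's latency is not a fixed number of submissions, so `d` would still need tuning, as the comment suggests. Each in-flight query would also need its own query object, and its own transform feedback buffer or buffer range if the captured data is read back.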
