I am confused about what's faster versus what's slower when it comes to coding algorithms that execute in the pipeline.
I made a program with a GS that seemingly bottlenecked from fillrate, because timer queries showed it to execute much faster with no rasterisation enabled.
So then I made a different multi-pass algorithm using transform feedback, still using a GS every time but theoretically does much less work overall by executing in stages, and it significantly reduces the fill rate because it renders much less triangles, but in my early tests of it, it appears to run slower.
My original thought was that the bottleneck of fillrate was traded for the bottleneck of calling multiple draw calls. But how expensive is another draw call really? How much overhead is involved in the cpu and gpu?
Then I read the answer of a different stack question regarding the GS:
No one has ever accused Geometry Shaders of being fast. Especially when increasing the size of geometry.
Your GS is taking a line and not only doing a 30x amplification of vertex data, but also doing lighting computations on each of those new vertices. That's not going to be terribly fast, in large part due to a lack of parallelism. Each GS invocation has to do 60 lighting computations, rather than having 60 separate vertex shader invocations doing 60 lighting computations in parallel.
You're basically creating a giant bottleneck in your geometry shader.
It would probably be faster to put the lighting stuff in the fragment shader (yes, really).
and it makes me wonder how it's possible for a geometry shaders to be slower if their use provides an overall less work output. I know things execute in parallel, but my understanding is that there is only a relatively small group of shader cores, so starting an amount of threads much larger than that group will result in the bottleneck being something proportional to program complexity (instruction size) times the number of threads (using thread here to refer to invocation of a shader). If you can have some instruction execute once per vertex on the geometry shader instead of once per fragment, why would it ever be slower?
Help me gain a better understanding so I don't waste time designing algorithms that are inefficient.