I'm trying to write a raycasting shader in GLSL, and it's unbearably slow, so I installed AMD's "GPU Shader Analyzer" to look at the code that is actually generated. I've got it from 2 FPS up to 12, but that's still not fantastic.
I feel like I could improve it, but I'm stuck at three points.
Weird Underscores: I get what
ADD R1.x, R0.x, -C6.x
does: it subtracts C6.x from R0.x and stores the result in R1.x. Similarly with
ADD R4.x, R1.x, R2.w, R4.x
which multiplies R1.x by R2.w, adds R4.x, and stores the result in R4.x. But sometimes I get calls like
MUL __, PV16.x, C1.x
and I can't figure out what the underscores mean.
Trailing "E"s: Usually my multiplications are turned into
MUL a, b, c
But sometimes I see
MUL_e a, b, c
This also happens with SQRT_e, RSQ_e and RCP_e.
Magic: I just plain don't get these instructions.
LOOP_DX10 i0 FAIL_JUMP_ADDR(10) VALID_PIX
Begin loop. But what are the parameters?
ALU_BREAK: ADDR(48) CNT(3)
No idea.
SETGT_INT R0.y, 350, R3.y
My for loop has i < 350, but what're the others?
PREDNE_INT __, R0.y, 0.0f
Maybe set i to 0? But why floating-point 0?
ALU_PUSH_BEFORE: ADDR(51) CNT(34)
Push makes me think of the stack?
PREDGT __, R0.x, R3.x
No clue.
JUMP POP_CNT(1) ADDR(8) VALID_PIX
Unconditional jump, but what's POP_CNT?
ALU: ADDR(85) CNT(1)
Whoosh.
BREAK ADDR(9)
Jump to 9?
POP (1) ADDR(8)
Removes the frame from the stack? Why 8?
ENDLOOP i0 PASS_JUMP_ADDR(2)
Ends the loop starting with LOOP_DX10.
CNDE_INT R0.x, R2.z, 0.0f, 1065353216
x = q ? a : b, but I don't know which variable is which.
Could someone please explain these? I can't find any documentation for the first two, and I don't understand the docs for the last. I've never done any assembly before, unfortunately.