+1 cycle compared to what? Do you have another working option that takes only one instruction and has no additional overhead?
No shader assembly language I've seen can perform per-component negation. What you described is two swizzled MOVs (one of them with a negation prefix), like
MOV result.position.xz, -vertex.position.yyww;
MOV result.position.yw, vertex.position.xxzz;
You can reduce it to one instruction by using something like vec4 n = vec4(dir.yxwz) * vec4(-1.0, 1.0, -1.0, 1.0). That would be like:
PARAM c[1] = { { -1, 1 } };
MUL result.position, vertex.position.yxwz, c[0].xyxy;
(In both cases I've used result.position and vertex.position from an ARB vertex program just as an example.)
However, it uses an extra constant register, so it is not necessarily better.
Of course, both versions could be wrapped in a macro.
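As a rough sketch of what such macros might look like in GLSL (the names PERP_MOV and PERP_MUL are made up for illustration, and how each one is actually compiled depends on the driver and target hardware):

```glsl
// Hypothetical macro names; they are not part of the original question.

// Variant 1: build the result component by component.
// This corresponds to the two swizzled MOVs (one with a negated source).
#define PERP_MOV(v) vec4(-(v).y, (v).x, -(v).w, (v).z)

// Variant 2: a single swizzle followed by a component-wise multiply
// with a constant sign vector. This corresponds to the single MUL,
// at the cost of one constant.
#define PERP_MUL(v) ((v).yxwz * vec4(-1.0, 1.0, -1.0, 1.0))
```

Either one would be used as vec4 n = PERP_MUL(dir); and both yield (-dir.y, dir.x, -dir.w, dir.z).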
Update
I now see what you wanted to do: something that generates code like this (for the latest AMD hardware):
0 x: DOT4 R0.x, -R0.y, R1.x
y: DOT4 ____, R0.x, R1.y
z: DOT4 ____, -R0.w, R1.z
w: DOT4 ____, R0.z, R1.w
Instead you see extra MOVs. However, that doesn't look like valid code (I don't think DOT can take partially negated arguments; I can't say more without carefully reading the instruction manual), so the compiler adds an extra MOV (which does not necessarily result in an extra cycle, by the way; it depends on the other instructions nearby).