Purpose of ARM parallel instructions ASX and SAX?

Question

Can someone explain when it's beneficial to use the parallel add/subtract ARM instructions ASX and/or SAX? In what situation or algorithm would one need to exchange the halfwords, then add AND subtract the upper/lower halfwords? Below is the explanation of each:

ASX

Exchange halfwords of Rm, then Add top halfwords and Subtract bottom halfwords.

SAX

Exchange halfwords of Rm, then Subtract top halfwords and Add bottom halfwords.
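
To make the two descriptions concrete, here is a minimal C sketch of the plain signed variants (SASX and SSAX), assuming results simply wrap to 16 bits and ignoring the GE flags. The function names `sasx_model` and `ssax_model` and the helpers are made up for illustration; they are not ARM intrinsics.

```c
#include <stdint.h>

/* Sign-extend the low 16 bits of h to a 32-bit signed value. */
static int32_t sext16(uint32_t h) {
    return (int32_t)((h & 0xFFFFu) ^ 0x8000u) - 0x8000;
}

/* Pack two results into top:bottom halfwords, keeping 16 bits of each. */
static uint32_t pack_halves(int32_t top, int32_t bottom) {
    return ((uint32_t)top << 16) | ((uint32_t)bottom & 0xFFFFu);
}

/* Hypothetical model of SASX Rd, Rn, Rm:
 *   Rd.top    = Rn.top    + Rm.bottom
 *   Rd.bottom = Rn.bottom - Rm.top      */
uint32_t sasx_model(uint32_t rn, uint32_t rm) {
    int32_t sum  = sext16(rn >> 16) + sext16(rm);
    int32_t diff = sext16(rn) - sext16(rm >> 16);
    return pack_halves(sum, diff);
}

/* Hypothetical model of SSAX Rd, Rn, Rm:
 *   Rd.top    = Rn.top    - Rm.bottom
 *   Rd.bottom = Rn.bottom + Rm.top      */
uint32_t ssax_model(uint32_t rn, uint32_t rm) {
    int32_t diff = sext16(rn >> 16) - sext16(rm);
    int32_t sum  = sext16(rn) + sext16(rm >> 16);
    return pack_halves(diff, sum);
}
```

For example, with Rn = 10:20 and Rm = 3:4 (top:bottom), the SASX model yields 14:17 and the SSAX model yields 6:23. In both cases each output halfword combines one halfword of Rn with the opposite halfword of Rm.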

Note that this is one of the old ARMv6 SIMD instructions, not NEON (officially "Advanced SIMD", precisely to disambiguate it from the old stuff). I'd assume it's related to some 2x2 matrix operation, but I can't think offhand quite what... — Notlikethat, Jul 21 '16 at 17:52
Looks like an FFT butterfly, to me, but I couldn't say for sure. — sh1, Jul 21 '16 at 18:17
Audio processing regularly has L+R and L-R in the two channels, and I could see this instruction being helpful for that. Not exactly sure what the exchange part would be useful for, though a common issue is having the two channels reversed. Maybe there's another instruction that doesn't do the exchange, but still does the add/sub? — Russ Schultz, Jul 21 '16 at 18:32

Ross Ridge · Accepted Answer · 2016-07-21T21:32:16.217

11

Like Russ Schultz said in a comment, it would be useful for audio encoded with L+R and L-R channels. It would be used something like:

ldr r1, [r0]      ; R1 = (L-R):(L+R)
shasx r1, r1, r1  ; R1 = ((L-R)+(L+R))/2:((L+R)-(L-R))/2 
                  ;    = (2L/2):(2R/2)
                  ;    = L:R  
ror r1, r1, #16   ; R1 = R:L
str r1, [r0]

The swapping of the third operand's top and bottom halfwords is necessary so the two different components can be added/subtracted with each other. Without the exchange you'd get ((L-R)+(L-R))/2:((L+R)-(L+R))/2 = (2(L-R)/2):(0/2) = (L-R):0.
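
The arithmetic can be checked off-target with a small C sketch of the halving variant. This assumes the packed word holds (L-R) in the top halfword and (L+R) in the bottom, as in the assembly. `shasx_model` is a made-up name for a software model of SHASX. `no_exchange_model` is a purely hypothetical instruction with no exchange step, included only to show the degenerate result.

```c
#include <stdint.h>

/* Sign-extend the low 16 bits of h to a 32-bit signed value. */
static int32_t sext16(uint32_t h) {
    return (int32_t)((h & 0xFFFFu) ^ 0x8000u) - 0x8000;
}

/* Pack two results into top:bottom halfwords, keeping 16 bits of each. */
static uint32_t pack_halves(int32_t top, int32_t bottom) {
    return ((uint32_t)top << 16) | ((uint32_t)bottom & 0xFFFFu);
}

/* Halve with rounding toward negative infinity (arithmetic shift by 1). */
static int32_t halve(int32_t x) {
    return x >= 0 ? x / 2 : -((-x + 1) / 2);
}

/* Hypothetical model of SHASX Rd, Rn, Rm:
 *   Rd.top    = (Rn.top    + Rm.bottom) / 2
 *   Rd.bottom = (Rn.bottom - Rm.top)    / 2  */
uint32_t shasx_model(uint32_t rn, uint32_t rm) {
    int32_t sum  = sext16(rn >> 16) + sext16(rm);
    int32_t diff = sext16(rn) - sext16(rm >> 16);
    return pack_halves(halve(sum), halve(diff));
}

/* Hypothetical variant WITHOUT the exchange (not a real instruction):
 *   Rd.top    = (Rn.top    + Rm.top)    / 2
 *   Rd.bottom = (Rn.bottom - Rm.bottom) / 2  */
uint32_t no_exchange_model(uint32_t rn, uint32_t rm) {
    int32_t sum  = sext16(rn >> 16) + sext16(rm >> 16);
    int32_t diff = sext16(rn) - sext16(rm);
    return pack_halves(halve(sum), halve(diff));
}

/* Rotate right by 16: swaps the two halfwords, like "ror r1, r1, #16". */
uint32_t ror16(uint32_t w) {
    return (w >> 16) | (w << 16);
}
```

With L = 1000 and R = -200 the input word is 1200:800. Calling `shasx_model(x, x)` gives 1000:-200, that is L:R, and `ror16` then gives R:L. Calling `no_exchange_model(x, x)` gives 1200:0, that is (L-R):0, matching the degenerate case described above.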

edited Jul 21 '16 at 21:32

answered Jul 21 '16 at 20:33


You get an upvote for exemplary assembly formatting! But more on topic, wouldn't it be more useful if the instruction didn't do the exchange, so you didn't have to do the ROR? I mean, if you're going to design an instruction to accelerate this sort of calculation, wouldn't you go all the way? – Russ Schultz Jul 21 '16 at 21:15

@RussSchultz The instruction wouldn't work at all without the swap. You'd be adding/subtracting the same components with each other. So you'd get either `(L-R):0` or `0:(L+R)`. If you don't mind reversing the phase you can use SHSAX and get -R:-L without the need to use ROR. – Ross Ridge Jul 21 '16 at 21:27
@RossRidge, why would one need ROR here? Is something wrong with R1=L:R? – Valera Grishin Feb 22 '22 at 23:02
