
I would like to implement an if condition in ARMv8 NEON inline assembly. In ARMv7 this was possible by checking the saturation (QC) bit, bit 27 of FPSCR, like this:

  VMRS r4, FPSCR            
  BIC r4, r4, #(1<<27)      
  VMSR FPSCR, r4     

  vtst.16  d30, d30, d30    
  vqadd.u16  d30, d30, d30 

  vmrs r4, FPSCR            
  tst  r4, #(1<<27)         
  bne label1

But I am not able to achieve this in the equivalent ARMv8 code. It seems that SQADD doesn't affect the saturation bit in FPSR, or I cannot check it like this. Is it possible, or is there a better approach to skipping a long section of code?

Thank you

RanL
  • What do you mean by "overflow bit in FPCR"? FPCR is the control register; status bits are in FPSR. What's your A64 code doing currently? – Notlikethat Jul 01 '16 at 12:54
  • I meant FPSR. My code does something like this: load pixels from an image into a NEON register, do some computation on this register, then check whether at least one pixel is not 0. If this condition is true, then run a lot of instructions on this NEON register. Otherwise continue with a new load... – RanL Jul 01 '16 at 14:03
  • Well, the pseudocode in the ARM ARM for `sqadd` certainly says it sets FPSR.QC if saturation occurs. Of course, looking again at the A32 code, if you're expecting -1 + -1 to overflow a _signed_ type, that's a different matter... – Notlikethat Jul 01 '16 at 14:13
  • Is this as simple as you looking for `UQADD` rather than `SQADD`? Your AArch32 code uses `vqadd.u16`, i.e. looking for unsigned saturation rather than signed saturation. – James Greenhalgh Jul 03 '16 at 18:28
  • Show us what you tried with ARMv8 code – BitBank Jul 04 '16 at 12:20
  • Did you manage to do this? I would like to do the same (check if at least one pixel is not 0). On ARMv7-a, I was doing vcmp.f64 d30, #0; vmrs APSR_nzcv, fpscr; beq .jump. What would be the equivalent on ARMV8-a? – gregoiregentil Mar 24 '17 at 06:21
  • Yes, I used the same approach as for ARMv7, try it: `MRS x5, FPSR; BIC x5, x5, #(1<<27); MSR FPSR, x5; CMTST v1.16B, v1.16B, v1.16B; UQADD v1.16B, v1.16B, v1.16B; mrs x4, FPSR; tst x4, #(1<<27); beq label2` – RanL Apr 05 '17 at 08:23

1 Answer

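The same information is available in AArch64, where the cumulative saturation flag (QC, bit 27) lives in FPSR. You just need to replace:

  VMRS r4, FPSCR
  VMSR FPSCR, r4

with:

  MRS x4, FPSR
  MSR FPSR, x4

Note that MRS and MSR take a 64-bit X register, so use `x4` rather than `w4`.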
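As for the "better approach" part of the question, one alternative to the QC-flag check is a horizontal reduction: take the maximum across all lanes and compare it with zero in a general-purpose register. The following is only a minimal sketch of that different technique, not the method used in the thread. The function name `any_lane_nonzero` is hypothetical, the eight 16-bit lanes are an assumption based on the `.16` code in the question, and the scalar branch exists only so the snippet also builds on non-AArch64 hosts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: returns 1 if at least one of the eight
 * 16-bit lanes is nonzero, 0 if all lanes are zero. */
static int any_lane_nonzero(const uint16_t px[8])
{
    assert(px != NULL);
#if defined(__aarch64__)
    /* Load eight lanes, then reduce with an across-lanes unsigned
     * maximum (UMAXV). The scalar result is zero only when every
     * lane is zero, so no FPSR read or write is needed. */
    uint16x8_t v = vld1q_u16(px);
    return vmaxvq_u16(v) != 0;
#else
    /* Portable fallback (assumption: only for building elsewhere). */
    uint16_t acc = 0;
    for (size_t i = 0; i < 8; i++) {
        acc |= px[i];
    }
    return acc != 0;
#endif
}
```

A caller would wrap the long computation in `if (any_lane_nonzero(pixels)) { ... }` and otherwise continue with the next load. In hand-written A64 the same idea would be an across-lanes maximum, a move of the result into a general-purpose register, and a branch on zero.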

Dric512