Questions tagged [neon]

NEON is a vector-processing instruction set for ARM processors. Please use this tag together with [arm] if asking about the AArch32 version of NEON (to run on 32-bit ARM processors), or [arm64] for AArch64. See also the [simd] tag.

NEON is also known as Advanced SIMD (Single Instruction, Multiple Data).

NEON can be used on either 32-bit or 64-bit ARM processors, as part of the AArch32 or AArch64 architectures respectively. However, there are significant differences between the AArch32 and AArch64 versions of NEON (register usage, instruction mnemonics, instruction availability), so please use this tag together with either [arm] for AArch32, or [arm64] for AArch64.

The [simd] tag may also be appropriate, especially for questions about SIMD algorithms that may be implemented with NEON.

Don't forget to include a tag for the programming language you are coding in. Where applicable, also consider a tag for how you access the instructions.
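For illustration, one common way to access NEON from C is through the intrinsics declared in `arm_neon.h`. The sketch below is a minimal example, not a tuned implementation: the helper name `and_bytes` and the macro `HAVE_NEON` are hypothetical, and the code assumes the compiler defines `__ARM_NEON` (or the older `__ARM_NEON__`) when NEON is enabled. On builds without NEON it falls back to a plain scalar loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: the compiler defines __ARM_NEON (or __ARM_NEON__) when
   NEON code generation is enabled. HAVE_NEON is a hypothetical macro. */
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper: dst[i] = a[i] & b[i] for n bytes. */
void and_bytes(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    size_t i = 0;
#if HAVE_NEON
    /* Process 16 bytes per iteration using one 128-bit Q register each. */
    for (; i + 16 <= n; i += 16) {
        uint8x16_t va = vld1q_u8(a + i);
        uint8x16_t vb = vld1q_u8(b + i);
        vst1q_u8(dst + i, vandq_u8(va, vb));
    }
#endif
    /* Scalar tail; this is the whole loop on non-NEON builds. */
    for (; i < n; ++i)
        dst[i] = a[i] & b[i];
}
```

The same intrinsics are available when targeting AArch32 and AArch64, which is one reason intrinsics questions often apply to both; hand-written assembly differs between the two.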

More information at

  1. Neon page on the ARM website
  2. Wikipedia article on ARM
885 questions
15 votes, 4 answers

How to check the existence of NEON on arm?

How can I determine whether a NEON engine exists on a given ARM processor? Is there a status/flag register that can be queried for this purpose?
Thomson
  • 20,586
  • 28
  • 90
  • 134
15 votes, 2 answers

Fast sine/cosine for ARMv7+NEON: looking for testers…

Could somebody with access to an iPhone 3GS or a Pandora please test the following assembly routine I just wrote? It is supposed to compute sines and cosines really really fast on the NEON vector FPU. I know it compiles fine, but without adequate…
jcayzac
  • 1,441
  • 1
  • 13
  • 26
14 votes, 3 answers

What is Neon with respect to Android?

I'm a beginner in Android. My friend heard about "Neon", so I searched Google and found this. According to it, Neon is related to multimedia for Android OS or all mobile OSes, is that right? Please tell me more.
soclose
  • 2,773
  • 12
  • 51
  • 60
14 votes, 2 answers

Is there an API to detect CPU features on iOS?

I have some cryptography code that has multiple implementations, selecting which implementation at runtime based on the features of the CPU it is running on. Porting this has been straightforward so far, with Windows, Linux and Android being…
Myria
  • 3,372
  • 1
  • 24
  • 42
13 votes, 6 answers

Android build system, NEON and non-NEON builds

I want to build my library for armv6, and there is some NEON code that I enable at runtime if the device supports it. The NEON code uses NEON intrinsics, and to be able to compile it, I must enable armeabi-v7a, but that affects regular C code (it…
Pavel P
  • 15,789
  • 11
  • 79
  • 128
13 votes, 3 answers

Performance of unaligned SIMD load/store on aarch64

An older answer indicates that aarch64 supports unaligned reads/writes and has a mention about performance cost, but it's unclear if the answer covers only the ALU or SIMD (128-bit register) operations, too. Relative to aligned 128-bit NEON loads…
hsivonen
  • 7,908
  • 1
  • 30
  • 35
12 votes, 4 answers

Arm Neon Intrinsics vs hand assembly

https://web.archive.org/web/20170227190422/http://hilbert-space.de/?p=22 This site, which is quite dated, shows that hand-written asm would give a much greater improvement than the intrinsics. I am wondering if this is still true even now…
George Host
  • 980
  • 1
  • 12
  • 26
12 votes, 1 answer

NEON vs Intel SSE - equivalence of certain operations

I'm having some trouble figuring out the NEON equivalents of a couple of Intel SSE operations. It seems that NEON is not capable of handling an entire Q register at once (128-bit value data type). I haven't found anything in the arm_neon.h header or in…
celavek
  • 5,575
  • 6
  • 41
  • 69
12 votes, 3 answers

Does ARM sit idle while NEON is doing its operations?

This might look similar to "ARM and NEON can work in parallel?", but it's not; I have a different issue (it may be a problem with my understanding). In the protocol stack, we compute the checksum on the GPP, and I'm now handing that task over to…
nguns
  • 440
  • 6
  • 21
11 votes, 3 answers

Fastest way of bitwise AND between two arrays on iPhone?

I have two image blocks stored as 1D arrays and have to do the following bitwise AND operations among their elements. int compare(unsigned char *a, int a_pitch, unsigned char *b, int b_pitch, int a_lenx, int a_leny) { int…
wlee
  • 111
  • 4
11 votes, 4 answers

How do I reorder vector data using ARM Neon intrinsics?

This is specifically related to ARM Neon SIMD coding. I am using ARM Neon intrinsics for a certain module in a video decoder. I have vectorized data as follows: there are four 32-bit elements in a Neon register, say Q0, which is 128 bits in size.…
goldenmean
  • 18,376
  • 54
  • 154
  • 211
11 votes, 1 answer

ARM NEON: How to implement a 256bytes Look Up table

I am porting some code I wrote to NEON using inline assembly. One of the things I need is to convert byte values in the range [0..128] to other byte values in a table, which take the full range [0..255]. The table is short but the math behind this is not…
Jordi C.
  • 339
  • 2
  • 12
10 votes, 1 answer

Why does arm-gcc decrement/increment the stack pointer even when the stack is never accessed?

When compiling this program with arm-elf-gcc-4.5 -O3 -march=armv7-a -mthumb -mfpu=neon -mfloat-abi=softfp: #include <arm_neon.h> extern float32x4_t cross(const float32x4_t& v1, const float32x4_t& v2) { float32x4x2_t xxyyzz1(vzipq_f32(v1,…
jcayzac
  • 1,441
  • 1
  • 13
  • 26
10 votes, 1 answer

armv8 NEON if condition

I would like to implement an if condition in armv8 NEON inline assembly code. In armv7 this was possible by checking the overflow bit like this: VMRS r4, FPSCR BIC r4, r4, #(1<<27) VMSR FPSCR, r4 vtst.16 d30, d30, d30 …
RanL
  • 139
  • 9
10 votes, 2 answers

Determine FLOPS of our ASM program

We had to implement an ASM program for multiplying sparse matrices in the coordinate scheme format (COOS) as well as in the compressed row format (CSR). Now that we have implemented all these algorithms we want to know how much more performant they…
tzwickl
  • 1,341
  • 2
  • 15
  • 31