Questions tagged [neon]

NEON is a vector-processing instruction set for ARM processors. Please use this tag together with [arm] if asking about the AArch32 version of NEON (to run on 32-bit ARM processors), or [arm64] for AArch64. See also the [simd] tag.

NEON is also known as Advanced SIMD (Single Instruction, Multiple Data).

NEON can be used on either 32-bit or 64-bit ARM processors, as part of the AArch32 or AArch64 architectures respectively. However, there are significant differences between the AArch32 and AArch64 versions of NEON (register usage, instruction mnemonics, instruction availability), so please use this tag together with either [arm] for AArch32, or [arm64] for AArch64.

The [simd] tag may also be appropriate, especially for questions about SIMD algorithms that may be implemented with NEON.

Don't forget to include a tag for the programming language you are coding in, perhaps [assembly], [c] or [c++]. In the latter two cases, consider the tags [intrinsics] or [inline-assembly] to indicate how you access the instructions.


885 questions

votes

2 answers

How to enable Neon instruction in Xcode

I want to use NEON SIMD instructions on the iPhone. I heard we have to put the flags "-mfloat-abi=softfp -mfpu=neon" in the "Other C Flags" field of the Target inspector, but when building I get "error: unrecognized command line option "-mfpu=neon""…

iphone xcode neon

asked Mar 04 '10 at 01:31

Krav

votes

2 answers

Fastest Inverse Square Root on iPhone

I'm working on an iPhone app that involves certain physics calculations that are done thousands of times per second. I am working on optimizing the code to improve the framerate. One of the pieces that I am looking at improving is the inverse…

ios objective-c optimization physics neon

asked Jan 10 '14 at 07:41

WolfLink


votes

2 answers

LSB to MSB bit reversal on ARM

I need to reverse a YUV image with each byte in LSB instead of MSB. I have read Best Algorithm for Bit Reversal ( from MSB->LSB to LSB->MSB) in C but I would like to do something that is ARM-optimized. int8 *image; for(i = 0; i < size; i++) { …

arm bit-manipulation neon

asked Dec 07 '13 at 01:32

gregoiregentil


votes

1 answer

Constant out of range with NEON intrinsics

I'm compiling the following ARM NEON intrinsics test code (in Eclipse with Android NDK): void foo(uint64_t* Res) { uint64_t x = 0xff12aa8902acf78dULL; uint64x1_t a,b; a = vld1_u64 (&x); b = vext_u64 (a, a, 3); vst1_u64…

c compiler-errors android-ndk arm neon

asked Mar 09 '13 at 19:11

NumberFour


votes

2 answers

neon float multiplication is slower than expected

I have two arrays of floats. I need to multiply elements from the first array by the corresponding elements from the second array and store the result in a third array. I would like to use NEON to parallelize the float multiplications: four float multiplications…

c++ gcc arm simd neon

asked Sep 14 '12 at 07:35

tomto
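For readers landing on this listing, the element-wise multiply described in the excerpt above might look roughly like the following with intrinsics. This is only a sketch, not the asker's code: the function name `mul_f32` is hypothetical, the arrays are assumed not to overlap, and a scalar loop handles the tail (and the whole array when the compiler does not define `__ARM_NEON`).

```c
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: out[i] = a[i] * b[i] for i in [0, n). */
void mul_f32(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Process four floats per iteration in one 128-bit Q register. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(out + i, vmulq_f32(va, vb));
    }
#endif
    /* Scalar tail; also the full fallback on non-NEON targets. */
    for (; i < n; ++i)
        out[i] = a[i] * b[i];
}
```

Whether this beats what the compiler auto-vectorizes from the plain loop depends on the core, the compiler flags and memory bandwidth, so it is worth measuring both.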

votes

3 answers

Add all elements in a lane

Is there an intrinsic which allows one to add all of the elements in a lane? I am using Neon to multiply 8 numbers together, and I need to sum the result. Here is some paraphrased code to show what I'm currently doing (this could probably be…

c arm simd neon

asked Aug 29 '12 at 04:55

NOP
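As a rough illustration of the horizontal add asked about above (summing the elements of one vector), here is a minimal sketch. The name `hsum4_f32` is hypothetical. It assumes `vaddvq_f32` is used only on AArch64, falls back to pairwise adds on 32-bit NEON, and uses plain C elsewhere.

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: returns v[0] + v[1] + v[2] + v[3]. */
float hsum4_f32(const float v[4])
{
#if defined(__aarch64__) && defined(__ARM_NEON)
    /* AArch64 has an across-vector add. */
    return vaddvq_f32(vld1q_f32(v));
#elif defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* AArch32: add the two halves, then one pairwise add. */
    float32x4_t q = vld1q_f32(v);
    float32x2_t s = vadd_f32(vget_low_f32(q), vget_high_f32(q)); /* {v0+v2, v1+v3} */
    s = vpadd_f32(s, s);                                         /* both lanes hold the total */
    return vget_lane_f32(s, 0);
#else
    /* Scalar fallback for non-NEON targets. */
    return (v[0] + v[2]) + (v[1] + v[3]);
#endif
}
```

The summation order differs between the paths, so results can differ in the last bits for inputs that are not exactly representable.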

votes

4 answers

Efficient floating point comparison (Cortex-A8)

There is a big (~100 000) array of floating point variables, and there is a threshold (also floating point). The problem is that I have to compare each variable in the array with the threshold, but NEON flags transfer takes a really long time…

c++ c neon cortex-a8 arm7

asked Apr 30 '12 at 10:12

Alex


votes

1 answer

ARM NEON: comparing 128 bit values

I'm interested in finding the fastest way (lowest cycle count) of comparing the values stored in NEON registers (say Q0 and Q3) on a Cortex-A9 core (VFP instructions allowed). So far I have the following: (1) Using the VFP floating point…

arm vectorization simd neon

asked Jan 30 '12 at 18:38

Mircea


votes

2 answers

ARM Cortex A8 Benchmarks: can someone help me make sense of these numbers?

I'm working on writing several real-time DSP algorithms on Android, so I decided to program the ARM directly in Assembly to optimize everything as much as possible and make the math maximally lightweight. At first I was getting speed benchmarks that…

assembly arm benchmarking neon cortex-a8

asked Nov 08 '11 at 17:04

Phonon


votes

3 answers

Efficient C vectors for generic SIMD (SSE, AVX, NEON) test for zero matches. (find FP max absolute value and index)

I want to see if it's possible to write some generic SIMD code that can compile efficiently. Mostly for SSE, AVX, and NEON. A simplified version of the problem is: Find the maximum absolute value of an array of floating point numbers and return…

c gcc simd sse neon

asked Jan 07 '22 at 23:59

TrentP


votes

1 answer

uint8 to float using SIMD Neon intrinsics

I'm trying to optimize my code that converts grayscale images to float images and runs on NEON A64/v8. The current implementation is quite fast using OpenCV's convertTo() (compiled for Android), but this is still our bottleneck. So I came up…

c++ c simd intrinsics neon

asked Aug 23 '20 at 13:41

Chen
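The widening conversion this question is about can be sketched with intrinsics as follows. This is an illustrative assumption, not the asker's implementation or OpenCV's: the name `u8_to_f32` is hypothetical, eight pixels are widened per iteration (u8 to u16 to u32 to f32), and a scalar loop covers the tail and non-NEON builds.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: dst[i] = (float)src[i] for i in [0, n). */
void u8_to_f32(const uint8_t *src, float *dst, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    for (; i + 8 <= n; i += 8) {
        uint16x8_t w  = vmovl_u8(vld1_u8(src + i));      /* 8 x u8  -> 8 x u16 */
        uint32x4_t lo = vmovl_u16(vget_low_u16(w));      /* low 4   -> 4 x u32 */
        uint32x4_t hi = vmovl_u16(vget_high_u16(w));     /* high 4  -> 4 x u32 */
        vst1q_f32(dst + i,     vcvtq_f32_u32(lo));       /* u32 -> f32 */
        vst1q_f32(dst + i + 4, vcvtq_f32_u32(hi));
    }
#endif
    /* Scalar tail; also the full fallback on non-NEON targets. */
    for (; i < n; ++i)
        dst[i] = (float)src[i];
}
```

Because the output is four times the size of the input, a loop like this is often limited by memory bandwidth rather than by the conversion instructions.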

votes

3 answers

Neon Optimization using intrinsics

Learning about ARM NEON intrinsics, I was timing a function that I wrote to double the elements in an array. The version that used the intrinsics takes more time than a plain C version of the function. Without NEON : void …

arm neon cortex-a8

asked Apr 19 '11 at 13:21

itisravi


votes

3 answers

Is numpy optimized for raspberry-pi automatically

The Raspberry Pi (armv7l architecture) has neon vfpv4 support which can be used for optimization. Does the standard version of numpy include these optimizations when installed with the command pip3 install numpy or apt-get python3-numpy? I am not…

numpy optimization raspberry-pi arm neon

asked Sep 04 '18 at 07:32

Dan Erez


votes

1 answer

What exact difference is between NEON and SIMD instructions in cortex M7

From the many links on ARM's site that I have read, I understand that Cortex-M7 doesn't support NEON instructions, but the host (Cortex-M7) processor that we are using in our organization specifies "ARM Cortex-M7 with single precision…

arm simd terminology neon cortex-m

asked Jul 03 '17 at 12:02

harishchandra manchikanti

votes

0 answers

Hardware optimizations using Qualcomm Snapdragon 800 and Adreno 330

I am developing a real-time computer vision project that runs on an Ubuntu (Linaro) board with an ARM CPU (Snapdragon 800). Some parts of the software operate on HD images, a huge amount of data. This slows the execution and acts as a…

opencv opencl neon flann linaro

asked Aug 28 '16 at 08:42

avi123
