Questions tagged [neon]

NEON is a vector-processing instruction set for ARM processors. Please use this tag together with [arm] if asking about the AArch32 version of NEON (to run on 32-bit ARM processors), or [arm64] for AArch64. See also the [simd] tag.

NEON is a vector-processing instruction set for ARM processors. It's also known as Advanced SIMD (Single Instruction Multiple Data).

NEON can be used on either 32-bit or 64-bit ARM processors, as part of the AArch32 or AArch64 architectures respectively. However, there are significant differences between the AArch32 and AArch64 versions of NEON (register usage, instruction mnemonics, instruction availability), so please use this tag together with either [arm] for AArch32 or [arm64] for AArch64.

The [simd] tag may also be appropriate, especially for questions about SIMD algorithms that may be implemented with NEON.

Don't forget to include a tag for the programming language you are coding in, such as [assembly], [c] or [c++]. For C and C++, also consider the [intrinsics] or [inline-assembly] tag, depending on how you access the instructions.

885 questions

1 answer

SIMD optimization of cvtColor using ARM NEON intrinsics

I'm working on a SIMD optimization of BGR to grayscale conversion which is equivalent to OpenCV's cvtColor() function. There is an Intel SSE version of this function and I'm referring to it. (What I'm doing is basically converting SSE code to NEON…

c++ opencv arm sse neon

asked Jul 27 '14 at 02:08

S.Sato
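
For reference, here is a minimal sketch of BGR-to-gray conversion in C with an optional NEON path. It assumes packed 8-bit BGR input and 8-bit fixed-point weights (29, 150, 77, summing to 256) that approximate 0.114 B + 0.587 G + 0.299 R; these may not match OpenCV's own constants or rounding bit for bit. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Assumed 8-bit fixed-point weights: 0.114*256 ~ 29 (B), 0.587*256 ~ 150 (G),
 * 0.299*256 ~ 77 (R). They sum to 256, so white stays 255. */
enum { W_B = 29, W_G = 150, W_R = 77 };

/* Scalar reference for one pixel, rounding to nearest. */
static inline uint8_t bgr_to_gray_pixel(uint8_t b, uint8_t g, uint8_t r)
{
    return (uint8_t)((W_B * b + W_G * g + W_R * r + 128) >> 8);
}

/* Converts n packed BGR pixels (3 bytes each) into n gray bytes. */
void bgr_to_gray(const uint8_t *src, uint8_t *dst, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    const uint8x8_t wb = vdup_n_u8(W_B), wg = vdup_n_u8(W_G), wr = vdup_n_u8(W_R);
    for (; i + 8 <= n; i += 8) {
        /* vld3_u8 de-interleaves 8 pixels into separate B, G and R lanes. */
        uint8x8x3_t px = vld3_u8(src + 3 * i);
        uint16x8_t acc = vmull_u8(px.val[0], wb);   /* widening multiply */
        acc = vmlal_u8(acc, px.val[1], wg);         /* widening multiply-add */
        acc = vmlal_u8(acc, px.val[2], wr);
        /* Rounding shift right by 8, narrowed back to 8 bits. */
        vst1_u8(dst + i, vrshrn_n_u16(acc, 8));
    }
#endif
    /* Scalar tail, or the whole image on builds without NEON. */
    for (; i < n; ++i)
        dst[i] = bgr_to_gray_pixel(src[3 * i], src[3 * i + 1], src[3 * i + 2]);
}
```

The NEON loop handles 8 pixels per iteration; vld3_u8 does the de-interleaving that the SSE version has to do with shuffles. The scalar loop doubles as a reference to test the vector path against.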

4 answers

iPhone detecting processor model / NEON support

I'm looking for a way to differentiate at runtime between devices equipped with the new ARM processor (such as iPhone 3GS and some iPods 3G) and devices equipped with the old ARM processors. I know I can use uname() to determine the device model,…

iphone arm ipod-touch neon

asked Oct 21 '09 at 13:42

yonilevy

6 answers

SSE _mm_movemask_epi8 equivalent method for ARM NEON

I decided to continue my Fast corners optimisation and got stuck at the _mm_movemask_epi8 SSE instruction. How can I rewrite it for ARM NEON with a uint8x16_t input?

arm sse neon

asked Aug 08 '12 at 18:33

inspirit
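
One common way to emulate _mm_movemask_epi8 is sketched below, assuming an AArch64 target for the vector path (vaddv_u8 does not exist on AArch32) and a scalar fallback elsewhere. Each byte's top bit is isolated, shifted left by its lane index modulo 8, and the two halves are summed horizontally. The function names are hypothetical.

```c
#include <stdint.h>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Scalar reference: bit i of the result is the top bit of byte i,
 * which is what SSE2's _mm_movemask_epi8 computes for 16 bytes. */
uint16_t movemask_u8x16_scalar(const uint8_t bytes[16])
{
    uint16_t mask = 0;
    for (int i = 0; i < 16; ++i)
        mask |= (uint16_t)((bytes[i] >> 7) << i);
    return mask;
}

#if defined(__aarch64__)
/* AArch64 sketch: lane i becomes (top bit) << (i % 8); adding the
 * lanes of each 8-byte half then yields one byte of the mask. */
static uint16_t movemask_u8x16_neon(uint8x16_t v)
{
    static const int8_t shifts[16] = {0, 1, 2, 3, 4, 5, 6, 7,
                                      0, 1, 2, 3, 4, 5, 6, 7};
    uint8x16_t bits = vshrq_n_u8(v, 7);          /* 0 or 1 per lane */
    bits = vshlq_u8(bits, vld1q_s8(shifts));     /* 1 << (i % 8) or 0 */
    uint16_t lo = vaddv_u8(vget_low_u8(bits));
    uint16_t hi = vaddv_u8(vget_high_u8(bits));
    return (uint16_t)(lo | (hi << 8));
}
#endif

/* Uses the NEON path on AArch64 and the scalar reference elsewhere. */
uint16_t movemask_u8x16(const uint8_t bytes[16])
{
#if defined(__aarch64__)
    return movemask_u8x16_neon(vld1q_u8(bytes));
#else
    return movemask_u8x16_scalar(bytes);
#endif
}
```

Often the full 16-bit mask is not actually needed (for example when only "any" or "all" is tested), and a cheaper reduction such as vmaxvq_u8 is enough.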

1 answer

How to stop GCC from breaking my NEON intrinsics?

I need to write optimized NEON code for a project and I'm perfectly happy to write assembly language, but for portability/maintainability I'm using NEON intrinsics. This code needs to be as fast as possible, so I'm using my experience in ARM…

c gcc arm neon intrinsics

asked Jan 20 '16 at 13:53

BitBank

1 answer

How to initialize const float32x4x4_t (ARM NEON intrinsic, GCC)?

I can initialize float32x4_t like this: const float32x4_t zero = { 0.0f, 0.0f, 0.0f, 0.0f }; But this code gives the error "incompatible types in initializer": const float32x4x4_t one = { 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, …

c gcc struct constants neon

asked May 01 '10 at 12:11

eonil
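
A sketch of the nested-brace form, assuming GCC or Clang. The ACLE defines float32x4x4_t as a struct holding an array val[4] of float32x4_t, so the initializer needs one brace level for the struct, one for the array and one per vector. On non-ARM hosts the block defines a hypothetical stand-in with the same layout using the GCC/Clang vector extension, only so the example compiles anywhere.

```c
#if defined(__ARM_NEON)
#include <arm_neon.h>
#else
/* Hypothetical stand-in for non-ARM hosts (GCC/Clang vector extension),
 * mirroring the ACLE layout: struct { float32x4_t val[4]; }. */
typedef float float32x4_t __attribute__((vector_size(16)));
typedef struct { float32x4_t val[4]; } float32x4x4_t;
#endif

/* Outer braces: the struct. Next level: the val[4] array.
 * Innermost: the four lanes of each float32x4_t. */
static const float32x4x4_t one = {{
    {1.0f, 1.0f, 1.0f, 1.0f},
    {1.0f, 1.0f, 1.0f, 1.0f},
    {1.0f, 1.0f, 1.0f, 1.0f},
    {1.0f, 1.0f, 1.0f, 1.0f},
}};

/* Reads lane j of vector i; GCC and Clang allow [] on vector types. */
float get_one(int i, int j)
{
    return one.val[i][j];
}
```

If the value does not have to be a compile-time constant, filling each one.val[k] with vdupq_n_f32(1.0f) at run time is an alternative.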

1 answer

How do Android programs make use of NEON SIMD?

I've been reading up a little on CPU features and stumbled upon NEON. From what I've read, it looks like NEON requires specific programming to use, but is this completely true, or do the CPUs that have this feature still find ways to…

android neon

asked Jul 17 '12 at 02:05

Tam

5 answers

Optimizing RGBA8888 to RGB565 conversion with NEON

I'm trying to optimize an image format conversion on iOS using the NEON vector instruction set. I assumed this would map well to that because it processes a bunch of similar data. My attempts haven't gone that well, though, achieving only a marginal…

iphone ios assembly arm neon

asked Oct 10 '11 at 00:49

Andrew Pouliot
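
Below is a sketch of one way to do the conversion with intrinsics, assuming the source bytes are stored in R, G, B, A order and alpha is discarded. vld4_u8 de-interleaves the channels, vshll_n_u8 moves each channel into the top byte of a 16-bit lane, and two shift-right-and-insert operations (vsriq_n_u16) pack 5/6/5 bits without separate mask and OR steps. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Scalar reference: keep the top 5/6/5 bits of R/G/B; alpha is dropped. */
static inline uint16_t rgba_to_rgb565_pixel(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Converts n pixels stored as R,G,B,A bytes (assumed order) to RGB565. */
void rgba8888_to_rgb565(const uint8_t *src, uint16_t *dst, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    for (; i + 8 <= n; i += 8) {
        uint8x8x4_t px = vld4_u8(src + 4 * i);    /* de-interleave R,G,B,A */
        uint16x8_t r = vshll_n_u8(px.val[0], 8);  /* channel in bits 15..8 */
        uint16x8_t g = vshll_n_u8(px.val[1], 8);
        uint16x8_t b = vshll_n_u8(px.val[2], 8);
        /* vsriq_n_u16(a, x, n) keeps the top n bits of a and inserts x >> n:
           first keep 5 bits of R, then keep R+G (11 bits) and insert B. */
        uint16x8_t out = vsriq_n_u16(r, g, 5);
        out = vsriq_n_u16(out, b, 11);
        vst1q_u16(dst + i, out);
    }
#endif
    /* Scalar tail, or everything on builds without NEON. */
    for (; i < n; ++i)
        dst[i] = rgba_to_rgb565_pixel(src[4 * i], src[4 * i + 1], src[4 * i + 2]);
}
```

Whether this beats a compiler-generated loop depends on the device and on memory bandwidth, so timing both on the target is worthwhile.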

1 answer

sse/avx equivalent for neon vuzp

Intel's vector extensions SSE, AVX, etc. provide two unpack operations for each element size, e.g. SSE intrinsics are _mm_unpacklo_* and _mm_unpackhi_*. For 4 elements in a vector, it does this: inputs: (A0 A1 A2 A3) (B0 B1 B2 B3) unpacklo/hi:…

sse simd neon avx

asked Jul 28 '17 at 14:36

Ralf
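
For 32-bit float lanes, the NEON unzip has a direct SSE counterpart in _mm_shuffle_ps, which takes two lanes from each operand. A minimal sketch covering only the float case (byte and 16-bit lanes would need other shuffles), with a scalar fallback that spells out the semantics; uzp_f32 is a hypothetical name.

```c
#if defined(__SSE__)
#include <xmmintrin.h>
#elif defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Unzip a = (A0 A1 A2 A3), b = (B0 B1 B2 B3) into
 *   even = (A0 A2 B0 B2) and odd = (A1 A3 B1 B3), as NEON vuzpq_f32 does. */
void uzp_f32(const float a[4], const float b[4], float even[4], float odd[4])
{
#if defined(__SSE__)
    __m128 va = _mm_loadu_ps(a), vb = _mm_loadu_ps(b);
    /* shufps: lanes 0-1 come from va, lanes 2-3 from vb. */
    _mm_storeu_ps(even, _mm_shuffle_ps(va, vb, _MM_SHUFFLE(2, 0, 2, 0)));
    _mm_storeu_ps(odd,  _mm_shuffle_ps(va, vb, _MM_SHUFFLE(3, 1, 3, 1)));
#elif defined(__ARM_NEON)
    float32x4x2_t r = vuzpq_f32(vld1q_f32(a), vld1q_f32(b));
    vst1q_f32(even, r.val[0]);
    vst1q_f32(odd, r.val[1]);
#else
    for (int i = 0; i < 2; ++i) {
        even[i] = a[2 * i];     even[i + 2] = b[2 * i];
        odd[i]  = a[2 * i + 1]; odd[i + 2]  = b[2 * i + 1];
    }
#endif
}
```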

3 answers

128bit hash comparison with SSE

In my current project, I have to compare 128bit values (actually md5 hashes) and I thought it would be possible to accelerate the comparison by using SSE instructions. My problem is that I can't manage to find good documentation on SSE…

c assembly inline-assembly sse neon

asked Dec 26 '10 at 14:48

fokenrute
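
A sketch of a 16-byte equality check, assuming the digests are plain byte arrays. On SSE2 it compares bytes and checks that all 16 movemask bits are set; on AArch64 it compares with vceqq_u8 and takes the lane minimum; elsewhere it falls back to memcmp, which compilers often inline for a fixed size of 16. hash128_equal is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#elif defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Returns true if the two 16-byte digests (e.g. MD5) are identical. */
bool hash128_equal(const uint8_t a[16], const uint8_t b[16])
{
#if defined(__SSE2__)
    __m128i eq = _mm_cmpeq_epi8(_mm_loadu_si128((const __m128i *)a),
                                _mm_loadu_si128((const __m128i *)b));
    /* All 16 byte lanes equal <=> all 16 mask bits set. */
    return _mm_movemask_epi8(eq) == 0xFFFF;
#elif defined(__aarch64__)
    uint8x16_t eq = vceqq_u8(vld1q_u8(a), vld1q_u8(b));
    /* Lanes are 0xFF where equal, so the minimum is 0xFF only if all match. */
    return vminvq_u8(eq) == 0xFF;
#else
    return memcmp(a, b, 16) == 0;
#endif
}
```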

1 answer

ARM Cortex-A8: How to make use of both NEON and vfpv3

I'm using a Cortex-A8 processor and I don't understand how to use the -mfpu flag. The Cortex-A8 has both VFPv3 and NEON co-processors. Previously I didn't know how to use NEON, so I was only using gcc -marm -mfloat-abi=softfp…

arm neon compiler-flags cortex-a8

asked Nov 18 '10 at 10:02

HaggarTheHorrible

2 answers

Optimizing horizontal boolean reduction in ARM NEON

I'm experimenting with a cross-platform SIMD library à la ecmascript_simd aka SIMD.js, and part of this is providing a few "horizontal" SIMD operations. In particular, the API that library offers includes any() -> bool and all()…

arm simd neon

asked Jul 03 '15 at 01:40

huon
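
On AArch64, any() and all() over a comparison mask map onto the across-lanes reductions vmaxvq_u32 and vminvq_u32. A minimal sketch, assuming 4 x 32-bit mask lanes that are either 0 or 0xFFFFFFFF; the AArch32 case (which lacks these reductions and typically uses pairwise vpmax/vpmin) is not covered, and mask_any/mask_all are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Each lane of m is a comparison result: 0 (false) or 0xFFFFFFFF (true). */
bool mask_any(const uint32_t m[4])
{
#if defined(__aarch64__)
    return vmaxvq_u32(vld1q_u32(m)) != 0;   /* some lane is non-zero */
#else
    return (m[0] | m[1] | m[2] | m[3]) != 0;
#endif
}

bool mask_all(const uint32_t m[4])
{
#if defined(__aarch64__)
    return vminvq_u32(vld1q_u32(m)) != 0;   /* no lane is zero */
#else
    return m[0] != 0 && m[1] != 0 && m[2] != 0 && m[3] != 0;
#endif
}
```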

2 answers

(opencv rc1) What causes Mat multiplication to be 20x slower than per-pixel multiplication?

// 700 ms cv::Mat in(height,width,CV_8UC1); in /= 4; Replaced with //40 ms cv::Mat in(height,width,CV_8UC1); for (int y=0; y < in.rows; ++y) { unsigned char* ptr = in.data + y*in.step1(); for (int x=0; x < in.cols; ++x) { ptr[x]…

c++ opencv java-native-interface arm neon

asked May 11 '15 at 11:55

Boyko Perfanov

3 answers

Compacting data in buffer from 16 bit per element to 12 bits

I'm wondering whether the performance of this kind of compacting can be improved. The idea is to saturate values higher than 4095 and pack each value into 12 bits of a new contiguous buffer, like this: Concept: Convert: Input buffer:…

c arm simd neon

asked Jun 17 '14 at 07:54

Piotr Nowak
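
A sketch of the 16-to-12-bit packing, with a NEON path that handles 16 samples per iteration: vld2q_u16 splits even and odd samples, vminq_u16 saturates to 4095, and vst3_u8 interleaves the three output bytes of each pair. The bit layout within each 3-byte group is an assumption, since the question's exact layout is not shown here; pack12 is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Saturates a 16-bit sample to the 12-bit range [0, 4095]. */
static inline uint16_t sat12(uint16_t v) { return v > 4095 ? 4095 : v; }

/* Packs n samples (n even) into 3*n/2 bytes. Assumed layout per pair (a, b):
 *   byte0 = a[7:0], byte1 = b[3:0] << 4 | a[11:8], byte2 = b[11:4]. */
void pack12(const uint16_t *src, uint8_t *dst, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    const uint16x8_t lim = vdupq_n_u16(4095);
    for (; i + 16 <= n; i += 16) {
        uint16x8x2_t ab = vld2q_u16(src + i);      /* even -> a, odd -> b */
        uint16x8_t a = vminq_u16(ab.val[0], lim);
        uint16x8_t b = vminq_u16(ab.val[1], lim);
        uint8x8x3_t out;
        out.val[0] = vmovn_u16(a);                 /* low byte of a */
        out.val[1] = vmovn_u16(vorrq_u16(vshrq_n_u16(a, 8), vshlq_n_u16(b, 4)));
        out.val[2] = vmovn_u16(vshrq_n_u16(b, 4)); /* top 8 bits of b */
        vst3_u8(dst, out);                         /* 24 interleaved bytes */
        dst += 24;
    }
#endif
    /* Scalar tail, or everything on builds without NEON. */
    for (; i + 1 < n; i += 2) {
        uint16_t a = sat12(src[i]), b = sat12(src[i + 1]);
        *dst++ = (uint8_t)(a & 0xFF);
        *dst++ = (uint8_t)((a >> 8) | ((b & 0x0F) << 4));
        *dst++ = (uint8_t)(b >> 4);
    }
}
```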

1 answer

NEON intrinsic types work in C but throw invalid arguments error in C++

I have problems with using NEON intrinsics and inline assembly in Android NDK. NEON types like float32x4_t give an "invalid arguments" error when compiling C++ code with GCC 4.6 and 4.8, however, the code compiles fine if compiled as C. For example,…

android c++ android-ndk neon intrinsics

asked Aug 27 '13 at 18:55

Triang3l

1 answer

Maximum optimization of element wise multiplication via ARM NEON assembly

I'm optimizing an element wise multiplication of two single dimensional arrays for a dual Cortex-A9 processor. Linux is running on the board and I'm using the GCC 4.5.2 compiler. So the following is my C++ inline assembler function. src1, src2 and…

c++ optimization assembly arm neon

asked Oct 08 '12 at 07:54

HyraxK
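
For comparison with the inline-assembly approach, here is an intrinsics version of the element-wise multiply. It is a sketch, not a tuned implementation: it is unrolled to two q-registers per iteration, which may or may not help on a given core, so it is worth timing against the assembly. mul_f32 is a hypothetical name.

```c
#include <stddef.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* dst[i] = src1[i] * src2[i] for i in [0, n). */
void mul_f32(const float *src1, const float *src2, float *dst, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    /* 8 floats per iteration, in two independent 4-lane multiplies. */
    for (; i + 8 <= n; i += 8) {
        float32x4_t a0 = vld1q_f32(src1 + i),     b0 = vld1q_f32(src2 + i);
        float32x4_t a1 = vld1q_f32(src1 + i + 4), b1 = vld1q_f32(src2 + i + 4);
        vst1q_f32(dst + i,     vmulq_f32(a0, b0));
        vst1q_f32(dst + i + 4, vmulq_f32(a1, b1));
    }
#endif
    /* Scalar tail, or everything on builds without NEON. */
    for (; i < n; ++i)
        dst[i] = src1[i] * src2[i];
}
```

Letting the compiler schedule intrinsics avoids hand-managing register constraints in inline assembly, at the cost of less control over the exact instruction order.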
