Questions tagged [neon]

NEON is a vector-processing instruction set for ARM processors. Please use this tag together with [arm] if asking about the AArch32 version of NEON (to run on 32-bit ARM processors), or [arm64] for AArch64. See also the [simd] tag.

NEON is also known as Advanced SIMD (Single Instruction, Multiple Data).

NEON can be used on either 32-bit or 64-bit ARM processors, as part of the AArch32 or AArch64 architectures respectively. However, there are significant differences between the AArch32 and AArch64 versions of NEON (register usage, instruction mnemonics, instruction availability), so please use this tag together with either arm for AArch32, or arm64 for AArch64.

The simd tag may also be appropriate, especially for questions about SIMD algorithms that may be implemented with NEON.

Don't forget to include a tag for the programming language you are coding in, such as assembly, c or c++. For c or c++, also consider the intrinsics or inline-assembly tag to show how you access the instructions.

885 questions

1 answer

How to load uint8_t *src to uint16x8_t

How do I load uint8_t *src into a uint16x8_t? For example, we can only do the following: uint8_t *src; uint8x8_t mysrc = vld1_u8(src); It seems that I cannot use vreinterpret_*() or (uint16x8_t)mysrc to convert mysrc to uint16x8_t. Is that right?

arm neon intrinsics

asked Dec 10 '13 at 10:47

BonderWu
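
For the widening question above, a minimal sketch, assuming the goal is to zero-extend each byte into a 16-bit lane: vreinterpret only relabels the bits of a vector of the same total size, so a 64-bit uint8x8_t cannot be reinterpreted as a 128-bit uint16x8_t, whereas vmovl_u8 performs the widening. The helper name widen_u8_to_u16 and the scalar fallback for non-NEON builds are illustrative assumptions, not part of the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: widen 8 bytes at src into eight 16-bit values at dst. */
void widen_u8_to_u16(const uint8_t *src, uint16_t *dst)
{
    assert(src != NULL && dst != NULL);
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    uint8x8_t narrow = vld1_u8(src);     /* load 8 x u8 into a 64-bit vector */
    uint16x8_t wide = vmovl_u8(narrow);  /* zero-extend each lane to 16 bits */
    vst1q_u16(dst, wide);                /* store 8 x u16 */
#else
    /* Scalar fallback so the sketch also builds without NEON. */
    for (int i = 0; i < 8; ++i) {
        dst[i] = src[i];
    }
#endif
}
```

On a NEON target the load and widen typically compile to a vld1 followed by a vmovl (AArch32) or uxtl (AArch64).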

1 answer

Should we account for overflow when using NEON intrinsics such as vadd_s8?

Suppose we have this C code: spatial_pred = (cur[mrefs] + cur[prefs]) >> 1; When translated to NEON intrinsics: int8x8_t cur_mrefs = vld1_s8(cur+mrefs); int8x8_t cur_prefs = vld1_s8(cur+prefs); int8x8_t spatial_pred = vshr_n_s8(vadd_s8(cur_mrefs, cur_prefs),…

arm neon

asked Dec 06 '13 at 14:45

BonderWu
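
On the overflow question above: vadd_s8 wraps around on overflow, so adding first and shifting afterwards can give wrong averages for large inputs. One possible approach, sketched here under the assumption that the data really is signed 8-bit, is the halving add vhadd_s8, which computes (a + b) >> 1 per lane using a wider intermediate. The helper name halving_add_s8 and the scalar fallback are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: out[i] = floor((a[i] + b[i]) / 2) for 8 lanes,
 * without the intermediate sum overflowing 8 bits. */
void halving_add_s8(const int8_t *a, const int8_t *b, int8_t *out)
{
    assert(a != NULL && b != NULL && out != NULL);
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    int8x8_t va = vld1_s8(a);
    int8x8_t vb = vld1_s8(b);
    vst1_s8(out, vhadd_s8(va, vb));  /* halving add: no 8-bit wraparound */
#else
    /* Scalar fallback: widen to int, then take the floor of sum / 2. */
    for (int i = 0; i < 8; ++i) {
        int sum = (int)a[i] + (int)b[i];
        out[i] = (int8_t)(sum >= 0 ? sum / 2 : -((-sum + 1) / 2));
    }
#endif
}
```

If rounding to nearest is wanted instead of truncation, vrhadd_s8 is the rounding variant.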

2 answers

Does anybody know how to use the NEON intrinsic uint8x8_t vclt_s8 (int8x8_t, int8x8_t)?

I want to compare two int8x8_t vectors. From http://gcc.gnu.org/onlinedocs/gcc/ARM-NEON-Intrinsics.html we can get the description of vclt_s8, but it does not give much detail. `uint8x8_t vclt_s8 (int8x8_t, int8x8_t)` Form of expected instruction(s):…

arm simd neon intrinsics

asked Dec 05 '13 at 01:59

BonderWu
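
For the vclt_s8 question above, a small sketch of what the intrinsic returns: each result lane is 0xFF where the first operand is less than the second, and 0x00 otherwise. The helper name less_than_mask_s8 and the scalar fallback are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: mask[i] = 0xFF if a[i] < b[i], else 0x00, for 8 lanes. */
void less_than_mask_s8(const int8_t *a, const int8_t *b, uint8_t *mask)
{
    assert(a != NULL && b != NULL && mask != NULL);
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    int8x8_t va = vld1_s8(a);
    int8x8_t vb = vld1_s8(b);
    uint8x8_t m = vclt_s8(va, vb);  /* per-lane signed compare: a < b */
    vst1_u8(mask, m);
#else
    /* Scalar fallback with the same all-ones / all-zeros convention. */
    for (int i = 0; i < 8; ++i) {
        mask[i] = (a[i] < b[i]) ? 0xFF : 0x00;
    }
#endif
}
```

The mask is usually fed to a bitwise select such as vbsl_s8 to pick lanes from one of two vectors, rather than being branched on.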

1 answer

Filling the frame buffer in external RAM is very slow on my embedded system

I am updating the frame buffer in external RAM whenever I get character codes from the UART, by referring to a font database. The frame buffer size is around 600kb and it takes around 1.5 seconds to fill it completely without using DMA. The…

embedded arm framebuffer neon omap

asked Oct 28 '13 at 08:37

Akshay

1 answer

SSE (Intel) to NEON (ARM) data type analogs

I have a few intrinsic datatypes __m128, __m128i that have either been on the left side of the assignment OR as parameters. I am in the process of converting the SSE code to NEON (for deployment on iOS) but I am unable to find the analogous data…

ios arm vectorization sse neon

asked Oct 24 '13 at 03:06

p0lAris

1 answer

ARM NEON Optimization for image transformation

I'm applying an NV12 video transformation which shuffles pixels of the video. On an ARM device such as Google Nexus 7 2013, performance is pretty bad at 30fps for a 1024x512 area with the following C code: * Pre-processing done only once at…

image-processing arm neon

asked Oct 18 '13 at 04:56

gregoiregentil

1 answer

How to use NEON intrinsics in Eclipse CDT?

I am using ARM NEON intrinsics in my C project in Eclipse CDT, but it always shows "Type XXX could not be resolved" errors even though I included arm_neon.h as the library. E.g.: Type 'uint8x8_t' could not be resolved. Type 'uint8x16x4_t' could not…

android eclipse eclipse-cdt neon

asked Oct 17 '13 at 08:14

user2002993

1 answer

Calculate atan2 with NEON

I have found a library, but it has no void atan2fv_neon_hfp(float *y, float *x, float *res, int len) to compute len floats at once. How can I write a NEON version of atan2fv_neon_hfp?

c arm simd neon atan2

asked Sep 18 '13 at 09:42

WateLemon

1 answer

ARM NEON assembler + C: how can I pass and use an array of pointers?

I have a C function and I want to load data from an array of pointers passed to the assembler part. How do I do this? float *pointerToBuffer asm volatile ( "vld1.32 {q0},[%[buf]] \n\t" : [buf]"+r"(pointerToBuffer) ); What if the variable was…

ios assembly arm neon

asked Sep 18 '13 at 09:36

user1132968

2 answers

NEON acceleration for 12-bit to 8-bit

I have a buffer of 12-bit data (stored in 16-bit words) and need to convert it to 8-bit (shift by 4). How can NEON accelerate this processing? Thank you for your help, Brahim

image-processing compiler-optimization neon

asked Sep 10 '13 at 14:25

bhamadicharef
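
The 12-bit to 8-bit conversion above maps naturally onto a narrowing shift. A minimal sketch, assuming unsigned samples in 16-bit containers and that truncation (not rounding or saturation) is acceptable: vshrn_n_u16 shifts each 16-bit lane right by 4 and keeps the low 8 bits of the result. The function name narrow_12bit_to_8bit, the len parameter and the scalar tail loop are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: dst[i] = (uint8_t)(src[i] >> 4) for len samples. */
void narrow_12bit_to_8bit(const uint16_t *src, uint8_t *dst, size_t len)
{
    assert(src != NULL && dst != NULL);
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Process 8 samples per iteration. */
    for (; i + 8 <= len; i += 8) {
        uint16x8_t wide = vld1q_u16(src + i);     /* load 8 x u16 */
        uint8x8_t narrow = vshrn_n_u16(wide, 4);  /* shift right by 4, narrow to u8 */
        vst1_u8(dst + i, narrow);                 /* store 8 x u8 */
    }
#endif
    /* Scalar loop handles the tail (or everything when NEON is unavailable). */
    for (; i < len; ++i) {
        dst[i] = (uint8_t)(src[i] >> 4);
    }
}
```

If rounding is preferred, vrshrn_n_u16 is the rounding counterpart; if inputs may exceed 12 bits, the saturating vqshrn_n_u16 avoids wraparound.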

1 answer

About compiling with ARM NEON

Some of my code references a library which uses arm_neon.h; when I tried to compile for the "Simulator", I received a bunch of errors. I am using the LLVM 4.2 compiler. What should I do to get it to compile with ARM NEON?

ios xcode arm neon

asked Aug 12 '13 at 12:41

Adam Lee

2 answers

Optimizing a scanline conversion function for ARM

The code below converts a row from an 8-bit palettized format to 32-bit RGBA. Before I try to implement it, I would like to know if the code below is even suited for being optimized with DirectXMath or alternatively ARM NEON intrinsics or inline…

arm neon directxmath

asked Jul 27 '13 at 05:03

Oliver Weichhold

1 answer

Use C variables in ARM Neon assembly

I have a problem using C/C++ variables inside ARM NEON assembly code written in: __asm__ __volatile() I've read about the following possibilities, which should move values from ARM to NEON registers. Each of the following possibilities causes a Fatal…

c assembly arm neon

asked Jul 18 '13 at 10:52

Alessandro Gaietta

1 answer

Doing "uint8x8x4_t - 128" then dividing this by 2

I'm a bit mixed up about how to achieve a division by a scalar on NEON in a specific case. In a C++ context, I'm achieving a contrast effect with a very rudimentary algorithm: if (currentEffect == "contrast_with_cpp") { r += ((r - 128) / 2); …

neon

asked Jun 25 '13 at 03:31

Léon Pelletier

1 answer

Eigen not vectorizing matrix multiplication in iOS?

I'm using the Eigen library to do some computation on an iPad 2 (i.e. Cortex-A9). It seems that some operations are vectorized using NEON instructions, while others aren't. Operations that I've tried that get vectorized: dot products, vector and…

ios eigen neon

asked Jun 10 '13 at 13:22

user1906
