Questions tagged [neon]

NEON is a vector-processing instruction set for ARM processors. Please use this tag together with [arm] if asking about the AArch32 version of NEON (to run on 32-bit ARM processors), or [arm64] for AArch64. See also the [simd] tag.

NEON is also known as Advanced SIMD (Single Instruction Multiple Data).

NEON can be used on either 32-bit or 64-bit ARM processors, as part of the AArch32 or AArch64 architectures respectively. However, there are significant differences between the AArch32 and AArch64 versions of NEON (register usage, instruction mnemonics, instruction availability), so please use this tag together with either [arm] for AArch32, or [arm64] for AArch64.

The [simd] tag may also be appropriate, especially for questions about SIMD algorithms that may be implemented with NEON.

Don't forget to include a tag for the programming language you are coding in, perhaps [assembly], [c] or [c++]. In the latter cases, consider the tags [inline-assembly] or [intrinsics] for how you access the instructions.

More information:

  1. Neon page on the ARM website
  2. Wikipedia article on ARM
885 questions
0
votes
1 answer

How to load uint8_t *src to uint16x8_t

How can I load uint8_t *src into a uint16x8_t? For example, given uint8_t *src; we can only do the following: uint8x8_t mysrc = vld1_u8(src); It seems that I cannot use vreinterpret_*() or (uint16x8_t)mysrc to convert mysrc to uint16x8_t. Is that right?
BonderWu
  • 133
  • 1
  • 10
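
A minimal sketch of one way to approach the question above, not taken from the original thread. `vreinterpret_*()` and C-style casts only relabel the bits of a register of the same size, while `uint8x8_t` is 64 bits wide and `uint16x8_t` is 128 bits wide, so a widening move such as `vmovl_u8` is needed. The helper name `widen_u8_to_u16` and the scalar fallback (used when `__ARM_NEON` is not defined) are assumptions added for illustration.

```c
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: read 8 bytes from src and store them as 8
 * zero-extended 16-bit values in dst. */
void widen_u8_to_u16(const uint8_t *src, uint16_t *dst)
{
#if defined(__ARM_NEON)
    uint8x8_t  narrow = vld1_u8(src);     /* load 8 x u8 into a 64-bit register */
    uint16x8_t wide   = vmovl_u8(narrow); /* zero-extend each lane to 16 bits   */
    vst1q_u16(dst, wide);                 /* store 8 x u16                      */
#else
    /* Scalar fallback with the same result, for hosts without NEON. */
    for (int i = 0; i < 8; ++i)
        dst[i] = src[i];
#endif
}
```

For signed input, `vmovl_s8` performs the equivalent sign-extending conversion.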
0
votes
1 answer

Should we consider overflow when using Neon intrinsics such as vadd_s8

If we have C code such as spatial_pred= (cur[mrefs] + cur[prefs])>>1; then when transforming it to Neon intrinsics: int8x8_t cur_mrefs = vld1_s8(cur+mrefs); int8x8_t cur_prefs = vld1_s8(cur+prefs); int8x8_t spatial_pred = vshr_n_s8(vadd_s8(cur_mrefs, cur_prefs),…
BonderWu
  • 133
  • 1
  • 10
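
A hedged sketch of the overflow concern raised above, not taken from the original thread. In C the two `int8_t` operands are promoted to `int` before the addition, whereas `vadd_s8` wraps each lane at 8 bits, so 100 + 100 would wrap to -56 before the shift. The halving add `vhadd_s8` computes `(a + b) >> 1` with a wider intermediate and avoids that. The helper name `average_s8` and the scalar fallback are assumptions for illustration; the fallback also assumes `>>` on a negative `int` is an arithmetic shift, which is implementation-defined in C.

```c
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: out[i] = (a[i] + b[i]) >> 1 for 8 signed bytes,
 * computed without 8-bit overflow of the intermediate sum. */
void average_s8(const int8_t *a, const int8_t *b, int8_t *out)
{
#if defined(__ARM_NEON)
    int8x8_t va = vld1_s8(a);
    int8x8_t vb = vld1_s8(b);
    /* Halving add: the sum is formed at higher precision, then shifted. */
    vst1_s8(out, vhadd_s8(va, vb));
#else
    /* Scalar fallback: integer promotion gives the wider intermediate. */
    for (int i = 0; i < 8; ++i)
        out[i] = (int8_t)(((int)a[i] + (int)b[i]) >> 1);
#endif
}
```

An alternative with the same effect is to widen with `vaddl_s8` and narrow back with `vshrn_n_s16(sum, 1)`.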
0
votes
2 answers

Does anybody know how to use the Neon intrinsic uint8x8_t vclt_s8 (int8x8_t, int8x8_t)

I want to compare two int8x8_t values. From http://gcc.gnu.org/onlinedocs/gcc/ARM-NEON-Intrinsics.html we can get the description of vclt_s8, but it does not give much detail. `uint8x8_t vclt_s8 (int8x8_t, int8x8_t)` Form of expected instruction(s):…
BonderWu
  • 133
  • 1
  • 10
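
A small sketch of what `vclt_s8` returns, added for illustration rather than taken from the original thread. The comparison is done lane by lane, and each result lane is all ones (0xFF) where the first operand is less than the second and all zeros otherwise. The helper name `less_than_mask_s8` and the scalar fallback are assumptions.

```c
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: mask[i] = 0xFF if a[i] < b[i], else 0x00,
 * for 8 signed bytes. */
void less_than_mask_s8(const int8_t *a, const int8_t *b, uint8_t *mask)
{
#if defined(__ARM_NEON)
    int8x8_t  va = vld1_s8(a);
    int8x8_t  vb = vld1_s8(b);
    uint8x8_t m  = vclt_s8(va, vb);  /* per-lane signed "less than" */
    vst1_u8(mask, m);
#else
    /* Scalar fallback producing the same per-lane mask. */
    for (int i = 0; i < 8; ++i)
        mask[i] = (a[i] < b[i]) ? 0xFF : 0x00;
#endif
}
```

Such a mask is typically fed to a bitwise select like `vbsl_s8(mask, x, y)`, which takes lanes from `x` where the mask is set and from `y` elsewhere.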
0
votes
1 answer

Filling the frame buffer in external RAM is very slow on my embedded system

I am updating the frame buffer in external RAM as and when I get the character codes from the UART, by referring to a font database. The frame buffer size is around 600 KB and it takes around 1.5 seconds to fill it completely without using DMA. The…
Akshay
  • 11
0
votes
1 answer

SSE (Intel) to NEON (ARM) data type analogs

I have a few intrinsic datatypes __m128, __m128i that have either been on the left side of the assignment OR as parameters. I am in the process of converting the SSE code to NEON (for deployment on iOS) but I am unable to find the analogous data…
p0lAris
  • 4,750
  • 8
  • 45
  • 80
0
votes
1 answer

ARM NEON Optimization for image transformation

I'm applying an NV12 video transformation which shuffles pixels of the video. On an ARM device such as Google Nexus 7 2013, performance is pretty bad at 30fps for a 1024x512 area with the following C code: * Pre-processing done only once at…
gregoiregentil
  • 1,793
  • 1
  • 26
  • 56
0
votes
1 answer

How to use NEON intrinsics in Eclipse CDT?

I am using ARM NEON intrinsics in my C project in Eclipse CDT, but it always shows "Type XXX could not be resolved" errors even though I included arm_neon.h. E.g.: Type 'uint8x8_t' could not be resolved. Type 'uint8x16x4_t' could not…
user2002993
  • 1,481
  • 1
  • 9
  • 9
0
votes
1 answer

Calc atan2 with neon

I have found a library, but it has no function void atan2fv_neon_hfp(float *y, float *x, float *res, int len) to calculate len floats at once. How can I write a NEON version of atan2fv_neon_hfp?
0
votes
1 answer

ARM Neon assembler + C how can I pass and use array of pointers

I have a C function and I want to load data from an array of pointers passed to the assembler part. How can I do this? float *pointerToBuffer asm volatile ( "vld1.32 {q0},[%[buf]] \n\t" : [buf]"+r"(pointerToBuffer) ); What if the variable was…
0
votes
2 answers

NEON acceleration for 12-bit to 8-bit

I have a buffer of 12-bit data (stored as 16-bit values) and need to convert it to 8-bit (shift right by 4). How can NEON accelerate this processing? Thank you for your help, Brahim
bhamadicharef
  • 360
  • 1
  • 11
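
One possible sketch for the 12-bit to 8-bit conversion asked about above, not taken from the original answers. It assumes each 16-bit element holds a value in the range 0 to 4095, so that a right shift by 4 fits in 8 bits; `vshrn_n_u16` shifts and narrows eight lanes in one instruction. The function name `convert_12bit_to_8bit` is hypothetical, and elements that do not fill a whole vector (or all elements on non-NEON hosts) are handled by a scalar loop.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: dst[i] = src[i] >> 4 for n elements, assuming
 * every src[i] is a 12-bit value stored in 16 bits. */
void convert_12bit_to_8bit(const uint16_t *src, uint8_t *dst, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    /* Process 8 elements per iteration. */
    for (; i + 8 <= n; i += 8) {
        uint16x8_t v = vld1q_u16(src + i);  /* load 8 x u16                 */
        uint8x8_t  r = vshrn_n_u16(v, 4);   /* shift right by 4, narrow u8  */
        vst1_u8(dst + i, r);                /* store 8 x u8                 */
    }
#endif
    /* Scalar tail (and the whole buffer when NEON is unavailable). */
    for (; i < n; ++i)
        dst[i] = (uint8_t)(src[i] >> 4);
}
```

If rounding rather than truncation is wanted, `vrshrn_n_u16` is the rounding counterpart, although 12-bit inputs near 4095 would then need a saturating variant such as `vqrshrn_n_u16`.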
0
votes
1 answer

About arm neon compiling

Some of my code references a library which uses arm_neon.h; when I tried to compile for the "Simulator", I received a bunch of errors. I am using the LLVM 4.2 compiler. What should I do to get it to compile with ARM NEON?
Adam Lee
  • 24,710
  • 51
  • 156
  • 236
0
votes
2 answers

Optimizing a scanline conversion function for ARM

The code below converts a row from an 8-bit palettized format to 32-bit RGBA. Before I try to implement it, I would like to know if the code below is even suited for being optimized with Direct-Math or alternatively ARM Neon intrinsics or inline…
Oliver Weichhold
  • 10,259
  • 5
  • 45
  • 87
0
votes
1 answer

Use C variables in ARM Neon assembly

I have a problem using C/C++ variables inside ARM NEON assembly code written in: __asm__ __volatile() I've read about the following possibilities, which should move values from ARM to NEON registers. Each of the following possibilities causes a Fatal…
Alessandro Gaietta
  • 557
  • 2
  • 9
  • 20
0
votes
1 answer

Doing "uint8x8x4_t - 128" then dividing this by 2

I'm a bit mixed up about how to achieve a division by a scalar on Neon in a specific case. In a C++ context, I'm achieving a contrast effect with a very rudimentary algorithm: if (currentEffect == "contrast_with_cpp") { r += ((r - 128) / 2); …
Léon Pelletier
  • 2,701
  • 2
  • 40
  • 67
0
votes
1 answer

Eigen not vectorizing matrix multiplication in iOS?

I'm using the Eigen library to do some computation on an iPad 2 (i.e. Cortex-A9). It seems that some operations are vectorized using NEON instructions, while others aren't. Operations that I've tried that get vectorized: dot products, vector and…
user1906
  • 2,310
  • 2
  • 20
  • 37