Questions tagged [neon]

NEON is a vector-processing instruction set for ARM processors. Please use this tag together with [arm] if asking about the AArch32 version of NEON (to run on 32-bit ARM processors), or [arm64] for AArch64. See also the [simd] tag.

NEON is a vector-processing instruction set for ARM processors. It's also known as Advanced SIMD (Single Instruction Multiple Data).

NEON can be used on either 32-bit or 64-bit ARM processors, as part of the AArch32 or AArch64 architectures respectively. However, there are significant differences between the AArch32 and AArch64 versions of NEON (register usage, instruction mnemonics, instruction availability), so please use this tag together with either [arm] for AArch32, or [arm64] for AArch64.

The [simd] tag may also be appropriate, especially for questions about SIMD algorithms that may be implemented with NEON.

Don't forget to include a tag for the programming language you are coding in, and consider a tag for how you access the instructions.
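As a rough illustration of one way to access NEON from C, here is a minimal sketch using intrinsics from `arm_neon.h`. The helper name `add_f32` is hypothetical, and the `__ARM_NEON` guard with a scalar fallback is just one possible arrangement, chosen so the same source builds on targets without NEON. The intrinsics used here are available for both AArch32 and AArch64.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>   /* NEON intrinsics (AArch32 and AArch64) */
#endif

/* add_f32 is a hypothetical helper: dst[i] = a[i] + b[i] for n floats. */
static void add_f32(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
    assert(n == 0 || (dst && a && b));
#if defined(__ARM_NEON)
    /* Process four 32-bit floats per iteration in one 128-bit Q register. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(dst + i, vaddq_f32(va, vb));
    }
#endif
    /* Scalar loop: handles the tail, or everything on non-NEON targets. */
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

A question built around code like this would typically carry a language tag alongside [neon] and the architecture tag.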

More information at

  1. Neon page on the ARM website
  2. Wikipedia article on ARM
885 questions
0 votes, 1 answer

ARM NEON not giving accurate results

I am trying to optimize raytracer code on a BeagleBoard, and for that I am using the NEON coprocessor. There is a matrix multiplication function that is called multiple times, which I have written in inline assembly. However, for some reason the…
fussy
  • 37
  • 9
0 votes, 1 answer

How to compute a dot product with NEON and double values on an ARM processor

I need to do a lot of vector calculations, so it seems wise to use NEON. The problem is that the function depends on doubles. This gives me two options, re-writing the entire code so that it works with floats, or creating a…
Alex van Rijs
  • 803
  • 5
  • 17
  • 39
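For context on the doubles question above: AArch32 NEON has no double-precision vector arithmetic, while AArch64 Advanced SIMD does. The following is only a sketch under that assumption, not the asker's code. The helper name `dot_f64` is hypothetical, and the vector path is compiled only for AArch64. Note that the vector path uses fused multiply-add and a different summation order, so results can differ from the scalar loop in the last bits.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* dot_f64 is a hypothetical helper returning sum(a[i] * b[i]) over n doubles. */
static double dot_f64(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    size_t i = 0;
    assert(n == 0 || (a && b));
#if defined(__aarch64__) && defined(__ARM_NEON)
    /* Two doubles per 128-bit register; acc += a * b lane by lane. */
    float64x2_t acc = vdupq_n_f64(0.0);
    for (; i + 2 <= n; i += 2)
        acc = vfmaq_f64(acc, vld1q_f64(a + i), vld1q_f64(b + i));
    /* Horizontal add of the two accumulator lanes. */
    sum = vaddvq_f64(acc);
#endif
    /* Scalar loop: handles the tail, or everything on other targets. */
    for (; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}
```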
0 votes, 0 answers

ARM NEON count compare result

I need to do some parallel compares on uint16x8_t vectors and increment a local variable (counter) according to the result, for example incrementing by 8 if all elements of the vector compare as true. I implemented this algorithm: ... register int objects =…
exbluesbreaker
  • 2,160
  • 3
  • 18
  • 30
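One possible way to turn a NEON compare result into a count is sketched below. It is not the asker's code, which is truncated in the excerpt above. The helper name `count_equal_u16` is hypothetical, and the choice of an equality compare is an assumption. The idea is that a compare yields 0xFFFF per matching lane, a right shift by 15 turns that into 1, and widening pairwise adds sum the lanes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* count_equal_u16 is a hypothetical helper: number of i with a[i] == b[i]. */
static unsigned count_equal_u16(const uint16_t *a, const uint16_t *b, size_t n)
{
    unsigned count = 0;
    size_t i = 0;
    assert(n == 0 || (a && b));
#if defined(__ARM_NEON)
    for (; i + 8 <= n; i += 8) {
        /* Each lane becomes 0xFFFF where equal, 0 otherwise. */
        uint16x8_t eq = vceqq_u16(vld1q_u16(a + i), vld1q_u16(b + i));
        /* Shift the mask down so each matching lane contributes exactly 1. */
        uint16x8_t ones = vshrq_n_u16(eq, 15);
        /* Widening pairwise adds: 8x16 -> 4x32 -> 2x64, then add both halves. */
        uint64x2_t s = vpaddlq_u32(vpaddlq_u16(ones));
        count += (unsigned)(vgetq_lane_u64(s, 0) + vgetq_lane_u64(s, 1));
    }
#endif
    /* Scalar loop: handles the tail, or everything on non-NEON targets. */
    for (; i < n; ++i)
        count += (a[i] == b[i]);
    return count;
}
```

With this sketch, a result of 8 for an 8-element block means every lane compared as true.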
0 votes, 1 answer

NEON output generated by the simulator regarding (pipeline information, stalls, execution cycles) not clear

I have some problems understanding the output of the NEON simulator. The output generated is cryptic and there is no proper documentation for understanding the simulator output. For example: in the above figure the 1st column's information is not…
Aurum
  • 77
  • 12
0 votes, 2 answers

Summing 3 vectors and getting the result in NEON

I'm trying to sum d0,d1,d2 + d3,d4,d5 + d6,d7,d8 and then take the average by dividing by 9, but I don't know the best instruction for that. I know how to do the averaging using an approximation, but I can't find an instruction for summing those lanes. I also…
andre_lamothe
  • 2,171
  • 2
  • 41
  • 74
0 votes, 1 answer

Algorithm for downsampling an image by 3 using NEON

I would like to know whether it is possible to downsample an image by 3 with NEON vectors. I'm trying to write an algorithm for that on paper, but it seems it is not possible, because when you get, for example, 8 bytes, you cannot get 3*3 pixels, there…
andre_lamothe
  • 2,171
  • 2
  • 41
  • 74
0 votes, 2 answers

Bilinear Interpolation from C to Neon

I'm trying to downsample an image using NEON. As an exercise, I wrote a function that subtracts two images using NEON, and that worked. Now I have come back to writing the bilinear interpolation using NEON intrinsics. Right now I have…
andre_lamothe
  • 2,171
  • 2
  • 41
  • 74
0 votes, 1 answer

Explaining ARM Neon Image Sampling

I'm trying to write a better version of OpenCV's cv::resize(), and I came across some code here: https://github.com/rmaz/NEON-Image-Downscaling/blob/master/ImageResize/BDPViewController.m The code is for downsampling an image by 2 but I…
andre_lamothe
  • 2,171
  • 2
  • 41
  • 74
0 votes, 1 answer

how to program neon register index

I have 8x8 data. After processing, I want to keep the resulting 8x8 data for the time being for further processing. My question is whether it is possible to program 4 Q-registers to store them in a loop. But the following code doesn't compile, I also like to…
Tom
  • 121
  • 1
  • 1
  • 5
0 votes, 2 answers

Any way to use a variable in a register name in NEON?

NEON extension registers can be viewed as 16 quadwords or 32 doublewords. In most programming, the specific register to be used is fixed. For example, vmov.i8 d0, 0xff vmov.i8 d1, 0xee vmov.i8 d2, 0xdd In my problem, the number of double word…
windchime
  • 1,253
  • 16
  • 37
0 votes, 1 answer

assembly asm code, how to load data from different source points?

I tried to improve some code, but it seems very difficult to me. I develop with the Android NDK. The C++ code I want to improve follows: unsigned int test_add_C(unsigned int *x, unsigned int *y) { unsigned int result = 0; for (int i = 0; i < 8; i++) { …
joyDream
  • 27
  • 5
0 votes, 2 answers

Error compiling Qt embedded pandaboard: [.moc/release-shared-emb-arm/moc_qabstractanimation.cpp] Error 1

I'm trying to compile Qt embedded for pandaboard (OMAP4, 4430). I installed this cross-compiler for armv7: sudo apt-get install g++-4.6-arm-linux-gnueabihf I downloaded the latest qt-embedded sources and uncompressed them in the…
aldo85ita
  • 496
  • 2
  • 8
  • 19
0 votes, 1 answer

Why does Android crash when NEON SIMDization is enabled? signal 11 (SIGSEGV), code 1 (SEGV_MAPERR)

I am in the process of doing some NEON-based SIMDization of my code. It works perfectly fine without SIMDization, but adding the following line in the makefile causes it to crash, ifeq ($(TARGET_ARCH_ABI),armeabi-v7a) LOCAL_ARM_NEON :=…
Subhransu
  • 449
  • 1
  • 4
  • 12
0 votes, 1 answer

Using ARM NEON instructions on a legacy assembler

I have a Visual Studio 2008 C++03 project for Windows Mobile 6 where I would like to implement an ARM-NEON version of memcpy. The ARM Info Center kindly provides an implementation: ; NEON memory copy with preload NEONCopyPLD PLD [r1, #0xC0] …
PaulH
  • 7,759
  • 8
  • 66
  • 143
0 votes, 1 answer

Learning GCC ASM: SSE to NEON: Loads and Stores

I have a section of inline ASM code written using SSE instructions that I need to port to NEON. Rather than just have the whole thing converted in bulk I want to learn the basics myself and see if I can do it step-by-step. So, the first step is…
Paul Braman
  • 173
  • 10