Questions tagged [intrinsics]

Intrinsics are functions used in compiled languages to trigger the execution of specific processor instructions, typically those outside the scope of the language itself.

Intrinsic functions are pseudo-functions that compilers use to expose functionality outside the current scope of the language; such functionality is often incorporated into the language later. SIMD and atomic instructions are common examples. Because the compiler knows what each intrinsic does, it can optimize register use around it.

A compiler library usually provides actual implementations of these functions, which are used when a less capable (or completely different) CPU is detected at run time or targeted at compile time.

Compiler intrinsics are very similar to inline assembly. Inline assembly requires annotations for the permissible input and output registers and for clobbered values, unless the compiler parses the inline assembly itself. With a compiler intrinsic, register use is already built into the compiler, so the developer needs to know fewer low-level details, although some low-level assembly knowledge is often helpful for guiding profiling and optimization.
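
As a rough illustration of the difference, the sketch below uses the GCC/Clang builtin `__builtin_bswap32`, which the compiler understands directly, so no register constraints or clobber lists are written by hand. The helper name `swap_bytes32` and the plain C fallback for other compilers are assumptions made for this example, not part of any particular library.

```c
#include <stdint.h>

/* Hypothetical helper: reverse the byte order of a 32-bit value. */
static uint32_t swap_bytes32(uint32_t x)
{
#if defined(__GNUC__) || defined(__clang__)
    /* Builtin known to GCC and Clang; the compiler chooses the
       instruction and the registers on its own. */
    return __builtin_bswap32(x);
#else
    /* Portable fallback for compilers without the builtin. */
    return (x >> 24)
         | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u)
         | (x << 24);
#endif
}
```

Calling `swap_bytes32(0x11223344u)` yields `0x44332211u` on either path. An inline-assembly version of the same operation would have to spell out operand constraints for each target architecture.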

Related tags: simd atomic inline-assembly

1314 questions

2 answers

Does Clang have something like #pragma GCC target?

I have some code written that uses AVX intrinsics when they are available on the current CPU. In GCC and Clang, unlike Visual C++, in order to use intrinsics, you must enable them on the command line. The problem with GCC and Clang is that when you…

clang intrinsics avx pragma

asked Sep 11 '17 at 23:37

Myria

3,372
1
24
42

1 answer

Compile C++ code with AVX2/AVX512 intrinsics on AVX

I have production code that has kernels implemented for various SIMD instruction sets, including AVX, AVX2, and AVX512. The code can be compiled on the target machine for the target machine with something like ./configure --enable-proc=AVX…

c++ gcc cross-compiling intrinsics

asked Apr 25 '17 at 14:30

Martin Ueding

8,245
6
46
92

1 answer

Is there an inverse instruction to the movemask instruction in Intel AVX2?

The movemask instruction(s) take an __m256i and return an int32 where each bit (either the first 4, 8 or all 32 bits depending on the input vector element type) is the most significant bit of the corresponding vector element. I would like to do the…

x86 intrinsics avx avx2 icc

asked Apr 07 '16 at 23:01

orm

2,835
2
22
35

1 answer

How does _mm_mwait work?

How does _mm_mwait from pmmintrin.h work? (I mean not the asm for it, but the action it performs and how that action is carried out on NUMA systems. Store monitoring is easy to implement only on bus-based SMP systems with bus snooping.) What processors does…

atomic intrinsics numa sse3

asked Apr 02 '10 at 02:23

osgx

90,338
53
357
513

3 answers

Emulating shifts on 32 bytes with AVX

I am migrating vectorized code written using SSE2 intrinsics to AVX2 intrinsics. Much to my disappointment, I discover that the shift instructions _mm256_slli_si256 and _mm256_srli_si256 operate only on the two halves of the AVX registers separately…

c++ simd intrinsics sse2 avx2

asked Aug 11 '14 at 17:14

user1196549

3 answers

Initializing an __m128 type from a 64-bit unsigned int

The _mm_set_epi64 and similar *_epi64 instructions seem to use and depend on __m64 types. I want to initialize a variable of type __m128 such that the upper 64 bits of it are 0, and the lower 64 bits of it are set to x, where x is of type uint64_t…

c++ sse intrinsics

asked May 05 '14 at 19:25

Gideon

3 answers

Using SSE instructions with gcc without inline assembly

I am interested in using the SSE vector instructions of x86-64 with gcc and don't want to use any inline assembly for that. Is there a way I can do that in C? If so, can someone give me an example?

c x86-64 sse simd intrinsics

asked Apr 25 '12 at 06:37

pythonic

20,589
43
136
219
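
For the question above, a minimal sketch of SSE from plain C without inline assembly might look like the following. It assumes GCC or Clang, which define `__SSE__` when SSE is enabled (the default on x86-64); the helper name `add4` and the scalar fallback are hypothetical and only there to keep the example compilable on other targets.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for exactly four floats. */
static void add4(const float *a, const float *b, float *out)
{
#if defined(__SSE__)
    __m128 va = _mm_loadu_ps(a);             /* unaligned load of 4 floats */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));  /* one packed add, then store */
#else
    /* Scalar fallback when SSE is not enabled for this build. */
    for (size_t i = 0; i < 4; ++i)
        out[i] = a[i] + b[i];
#endif
}
```

The unaligned load and store variants are used so that the caller does not need 16-byte aligned arrays; aligned variants exist but fault on misaligned pointers.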

1 answer

Fallback implementation for conflict detection in AVX2

AVX512CD contains the intrinsic _mm512_conflict_epi32(__m512i a), which returns a vector where, for every element in a, a bit is set if it has the same value. Is there a way to do something similar in AVX2? I'm not interested in the exact bits I just…

c++ x86 intrinsics avx2 avx512

asked Jun 30 '17 at 09:47

Christoph Diegelmann

2,004
15
26

4 answers

Most efficient way to store 4 dot products into a contiguous array in C using SSE intrinsics

I am optimizing some code for an Intel x86 Nehalem micro-architecture using SSE intrinsics. A portion of my program computes 4 dot products and adds each result to the previous values in a contiguous chunk of an array. More specifically, tmp0 =…

c sse simd intrinsics dot-product

asked Nov 13 '10 at 06:08

Sam

1 answer

How to merge a scalar into a vector without the compiler wasting an instruction zeroing upper elements? Design limitation in Intel's intrinsics?

I don't have a particular use-case in mind; I'm asking if this is really a design flaw / limitation in Intel's intrinsics or if I'm just missing something. If you want to combine a scalar float with an existing vector, there doesn't seem to be a way…

c gcc x86 sse intrinsics

asked Sep 04 '16 at 15:24

Peter Cordes

328,167
45
605
847

4 answers

How do I reorder vector data using ARM Neon intrinsics?

This is specifically related to ARM Neon SIMD coding. I am using ARM Neon intrinsics for a certain module in a video decoder. I have vectorized data as follows: there are four 32-bit elements in a Neon register, say Q0, which is 128 bits wide.…

arm simd neon intrinsics

asked Apr 11 '10 at 07:02

goldenmean

18,376
54
154
211

2 answers

Fast calculation of Hamming distance in C

I read the Wikipedia article on Hamming Weight and noticed something interesting: It is thus equivalent to the Hamming distance from the all-zero string of the same length. For the most typical case, a string of bits, this is the number of 1's in…

c gcc intrinsics hamming-distance

asked Aug 02 '14 at 20:13

haneefmubarak

1,911
1
21
32
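
One common way to approach the question above is to XOR the two inputs and count the set bits of the result. The sketch below assumes GCC or Clang for `__builtin_popcountll`; the helper names `popcount64` and `hamming_distance` and the bit-clearing fallback are illustrative assumptions, not taken from the question.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of set bits in a 64-bit word. */
static unsigned popcount64(uint64_t x)
{
#if defined(__GNUC__) || defined(__clang__)
    /* GCC/Clang builtin; may compile to a single popcnt instruction
       when the target supports it, or to a library routine otherwise. */
    return (unsigned)__builtin_popcountll(x);
#else
    /* Fallback: clear the lowest set bit until none remain. */
    unsigned n = 0;
    while (x) {
        x &= x - 1;
        ++n;
    }
    return n;
#endif
}

/* Hamming distance between two bit strings stored as 64-bit words. */
static size_t hamming_distance(const uint64_t *a, const uint64_t *b,
                               size_t words)
{
    size_t dist = 0;
    for (size_t i = 0; i < words; ++i)
        dist += popcount64(a[i] ^ b[i]);  /* differing bits in word i */
    return dist;
}
```

Whether the builtin actually becomes a hardware popcount depends on the target flags used for the build, so measured speed may vary between machines.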

1 answer

Vectorizing Modular Arithmetic

I'm trying to write some reasonably fast component-wise vector addition code. I'm working with (signed, I believe) 64-bit integers. The function is void addRq (int64_t* a, const int64_t* b, const int32_t dim, const int64_t q) { for(int i = 0; i…

c assembly x86-64 sse intrinsics

asked Dec 16 '13 at 06:35

crockeea

21,651
10
48
101

1 answer

How to load a pixel struct into an SSE register?

I have a struct of 8-bit pixel data: struct __attribute__((aligned(4))) pixels { char r; char g; char b; char a; } I want to use SSE instructions to calculate certain things on these pixels (namely, a Paeth transformation). How can…

c pixel x86-64 sse intrinsics

asked Aug 25 '12 at 11:44

fuz

88,405
25
200
352

1 answer

What's the difference between __popcnt() and _mm_popcnt_u32()?

MS Visual C++ supports 2 flavors of the popcnt instruction on CPUs with SSE4.2: __popcnt() _mm_popcnt_u32() The only difference I found was that the docs for __popcnt() are marked as "Microsoft Specific", and _mm_popcnt_u32() seems to be an…

x86 sse intrinsics sse4

asked Jun 20 '12 at 06:32

Adi Shavit

16,743
5
67
137
