Questions tagged [intrinsics]

Intrinsics are functions used in compiled languages to trigger the execution of specific processor instructions, typically those outside the scope of the compiled language itself.

Intrinsic functions are pseudo-functions that compilers use to expose functionality outside the current scope of the language; such functionality is often incorporated into the language later. Examples include SIMD and atomic instructions. The compiler knows what each intrinsic does and can optimize register use around it.
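As a rough illustration of the atomic case, the sketch below uses the __atomic builtins offered by GCC and Clang. The counter and the function names are hypothetical, and other compilers spell the equivalent intrinsics differently.

```c
#include <stdint.h>

/* Hypothetical shared counter, for illustration only. */
static uint32_t hits = 0;

/* Atomically adds 1 to the counter and returns the value it held
   before the increment. __atomic_fetch_add is a GCC/Clang builtin;
   the compiler selects the instruction sequence and the registers. */
uint32_t record_hit(void)
{
    return __atomic_fetch_add(&hits, 1u, __ATOMIC_SEQ_CST);
}

/* Atomically reads the current counter value. */
uint32_t current_hits(void)
{
    return __atomic_load_n(&hits, __ATOMIC_SEQ_CST);
}
```

Calling record_hit() from several threads should leave current_hits() equal to the total number of calls, without any hand-written locking or assembly.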

A compiler library usually provides actual implementations of these functions, which are used when a less capable (or completely different) CPU is targeted at compile time or detected at run time.
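The fallback idea can be sketched by hand as follows. This is only an assumed illustration, not how any particular compiler library is organized: popcount32 and popcount_fallback are hypothetical names, and the only real intrinsic used is the GCC/Clang builtin __builtin_popcount.

```c
#include <stdint.h>

/* Portable fallback: clears the lowest set bit on each pass and
   counts how many passes it takes to reach zero. */
unsigned popcount_fallback(uint32_t x)
{
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Hypothetical wrapper: uses the compiler builtin when it is
   available at compile time, otherwise the plain C version. */
unsigned popcount32(uint32_t x)
{
#if defined(__GNUC__) || defined(__clang__)
    return (unsigned)__builtin_popcount(x);
#else
    return popcount_fallback(x);
#endif
}
```

Both paths should return the same count for any input; on hardware with a population-count instruction the builtin may compile to that single instruction.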

Compiler intrinsics are very similar to inline assembly. Unless the compiler parses the inline assembly itself, inline assembler needs notation to denote permissible input and output registers as well as clobbered values. With a compiler intrinsic, the register use is already built into the compiler, so a developer doesn't need to know as many low-level details, although some low-level assembler knowledge is often helpful to guide profiling and optimization.
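To make the contrast concrete, here is a minimal sketch that counts trailing zero bits both ways. It assumes GCC or Clang; the function names are hypothetical, and the inline-assembly variant is compiled only on x86 targets.

```c
#include <stdint.h>

/* Intrinsic version: no registers or clobbers are named.
   __builtin_ctz is undefined for 0, so that case is handled here. */
unsigned trailing_zeros_intrinsic(uint32_t x)
{
    return x ? (unsigned)__builtin_ctz(x) : 32u;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Inline-assembly version (GCC extended asm, AT&T operand order).
   The output ("=r"), the input ("r") and the flags clobber ("cc")
   all have to be spelled out by the developer. */
unsigned trailing_zeros_asm(uint32_t x)
{
    uint32_t r;
    if (x == 0)
        return 32u;   /* BSF leaves its destination undefined for 0 */
    __asm__("bsf %1, %0" : "=r"(r) : "r"(x) : "cc");
    return r;
}
#endif
```

The intrinsic version also stays portable across architectures, whereas the assembly variant has to be rewritten for each instruction set.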

Related tags: simd atomic inline-assembly

1314 questions

2 answers

Why and when to use __noop?

I was reading about __noop and the MSDN example is #if DEBUG #define PRINT printf_s #else #define PRINT __noop #endif int main() { PRINT("\nhello\n"); } and I don't see the gain over just having an empty macro: #define PRINT The…

c++ visual-c++ intrinsics

asked Jan 22 '13 at 12:57

Luchian Grigore

1 answer

Undocumented intrinsic routines

Delphi has this list: Delphi Intrinsic Routines But that list is incomplete. What are the 7 undocumented intrinsic functions, since when and what is their purpose?

delphi documentation intrinsics

asked May 23 '15 at 20:23

Johan

3 answers

What's the difference between logical SSE intrinsics?

Is there any difference between logical SSE intrinsics for different types? For example if we take OR operation, there are three intrinsics: _mm_or_ps, _mm_or_pd and _mm_or_si128 all of which do the same thing: compute bitwise OR of their operands.…

c sse simd intrinsics sse2

asked May 10 '10 at 17:32

user283145

2 answers

How to sum __m256 horizontally?

I would like to horizontally sum the components of a __m256 vector using AVX instructions. In SSE I could use _mm_hadd_ps(xmm,xmm); _mm_hadd_ps(xmm,xmm); to get the result at the first component of the vector, but this does not scale with the 256…

sse vectorization intrinsics avx

asked Nov 04 '12 at 13:55

Yoav

6 answers

How to use MSVC intrinsics to get the equivalent of this GCC code?

The following code calls the builtin functions for clz/ctz in GCC and, on other systems, has C versions. Obviously, the C versions are a bit suboptimal if the system has a builtin clz/ctz instruction, like x86 and ARM. #ifdef __GNUC__ #define…

c visual-c++ intrinsics

asked Dec 10 '08 at 13:00

Dark Shikari

1 answer

How to implement "_mm_storeu_epi64" without aliasing problems?

(Note: Although this question is about "store", the "load" case has the same issues and is perfectly symmetric.) The SSE intrinsics provide an _mm_storeu_pd function with the following signature: void _mm_storeu_pd (double *p, __m128d a); So if I…

c++ sse intrinsics strict-aliasing

asked Jul 16 '14 at 17:39

Nemo

2 answers

How to rotate an SSE/AVX vector

I need to perform a rotate operation with as few clock cycles as possible. In the first case let's assume __m128i as source and dest type: source: || A0 || A1 || A2 || A3 || dest: || A1 || A2 || A3 || A0 || dest =…

c x86 sse intrinsics avx

asked Aug 10 '12 at 17:52

user1584773

2 answers

Reference manual/tutorial for x86 SIMD intrinsics?

I'm looking into using these to improve the performance of some code but good documentation seems hard to find for the functions defined in the *mmintrin.h headers, can anybody provide me with pointers to good info on these? EDIT: particularly…

simd sse intrinsics avx

asked Jul 28 '11 at 11:03

BD at Rivenhill

3 answers

Why does SSE set (_mm_set_ps) reverse the order of arguments

I recently noticed that __m128 m = _mm_set_ps(0,1,2,3); puts the 4 floats into reverse order when cast to a float array: (float*) p = (float*)(&m); // p[0] == 3 // p[1] == 2 // p[2] == 1 // p[3] == 0 The same happens with a union { __m128 m;…

c++ c simd sse intrinsics

asked Mar 08 '11 at 20:30

Inverse

1 answer

Do I get a performance penalty when mixing SSE integer/float SIMD instructions

I've used x86 SIMD instructions (SSE1234) in the form of intrinsics quite a lot lately. What I found frustrating is that the SSE ISA has several simple instructions that are available only for floats or only for integers, but in theory should…

c assembly sse simd intrinsics

asked Feb 14 '11 at 19:28

user283145

5 answers

Intrinsics for CPUID-like information?

Considering that I'm coding in C++, I would like, if possible, to use an intrinsics-like solution to read useful information about the hardware. My concerns/considerations are: I don't know assembly that well, it will be a considerable investment…

c++ intrinsics cpuid

asked Jul 20 '13 at 03:40

user2485710

5 answers

Is it possible to cast floats directly to __m128 if they are 16 byte aligned?

Is it safe/possible/advisable to cast floats directly to __m128 if they are 16 byte aligned? I noticed using _mm_load_ps and _mm_store_ps to "wrap" a raw array adds a significant overhead. What are potential pitfalls I should be aware of? EDIT…

c++ c alignment sse intrinsics

asked Aug 01 '12 at 12:57

dtech

1 answer

Divide by floating-point number using NEON intrinsics

I'm processing an image four pixels at a time on an ARMv7 for an Android application. I want to divide a float32x4_t vector by another vector, but the numbers in it vary from circa 0.7 to 3.85, and it seems to me that the only way to…

android c arm intrinsics neon

asked Jul 20 '11 at 09:41

Darkmax

1 answer

what's the difference between _mm256_lddqu_si256 and _mm256_loadu_si256

I had been using _mm256_lddqu_si256 based on an example I found online. Later I discovered _mm256_loadu_si256. The Intel Intrinsics guide only states that the lddqu version may perform better when crossing a cache line boundary. What might be the…

x86 simd intrinsics avx micro-optimization

asked Nov 22 '17 at 02:26

Jimbo

0 answers

Costs of new AVX512 instruction - Scatter store

I'm playing around with the new AVX512 instruction sets and I try to understand how they work and how one can use them. What I try is to interleave specific data, selected by a mask. My little benchmark loads x*32 byte of aligned data from memory…

performance x86 intrinsics avx512

asked Sep 04 '17 at 18:23

Hymir
