Questions tagged [intrinsics]

Intrinsics are functions used in compiled languages to trigger the execution of specific processor instructions, typically those outside the scope of the compiled language itself.

Intrinsic functions are pseudo-functions used by compilers to represent functionality that is outside the current scope of the language; often, that functionality is later incorporated into the language itself. Some examples are SIMD and atomic instructions. The compiler knows what each intrinsic does and can optimize register use around it.

A compiler library usually provides actual implementations of these functions, which are used when a less capable (or completely different) CPU is detected at compile time or run time.
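As a rough illustration of an intrinsic with a fallback, here is a minimal sketch in C. It assumes a compiler that defines the `__SSE__` macro and ships `<xmmintrin.h>` when SSE is enabled (GCC and Clang do); `add_f32` is a hypothetical helper name, and the scalar loop is a hand-written stand-in for a fallback, not the compiler library's own implementation. The choice here is made at compile time only.

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */
#endif

/* add_f32 (hypothetical helper): out[i] = a[i] + b[i] for i in [0, n). */
void add_f32(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;

#if defined(__SSE__)
    /* SIMD path: each _mm_add_ps adds four floats at once.
       Unaligned loads/stores are used so callers need no special alignment. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif

    /* Scalar path: handles the tail, or the whole array when SSE is not enabled. */
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

Note that the code never names a register: the compiler allocates them for the `__m128` values just as it would for ordinary variables.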

Compiler intrinsics are very similar to inline assembly. With inline assembly, the developer must annotate the permissible input and output registers as well as clobbered values, unless the compiler parses the inline assembly itself. With a compiler intrinsic, register use is already built into the compiler, so the developer does not need to know as many low-level details, although some low-level assembler knowledge is often helpful to guide profiling and optimization.
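The contrast can be sketched with a 32-bit byte swap. This assumes GCC or Clang: `__builtin_bswap32` is their builtin, and the inline-assembly variant uses GNU extended asm syntax and is only compiled on x86 targets. The function names `bswap_intrinsic` and `bswap_inline_asm` are hypothetical.

```c
#include <stdint.h>

/* Intrinsic version: the compiler knows the operation and picks registers itself. */
uint32_t bswap_intrinsic(uint32_t x)
{
    return __builtin_bswap32(x);
}

/* Inline-assembly version: the developer spells out the operand constraints. */
uint32_t bswap_inline_asm(uint32_t x)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* "+r"(x): x lives in any general-purpose register and is both read and written.
       bswap changes no flags and touches no memory, so no clobber list is needed. */
    __asm__("bswap %0" : "+r"(x));
    return x;
#else
    /* Other targets: fall back to the builtin so the sketch still compiles. */
    return __builtin_bswap32(x);
#endif
}
```

Both functions return `0x78563412` for the input `0x12345678`. The difference is that the compiler can constant-fold or schedule the intrinsic freely, while it treats the asm statement as opaque.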

1314 questions
16
votes
1 answer

When the compiler reorders AVX instructions on Sandy, does it affect performance?

Please do not say this is premature microoptimization. I want to understand, as much as it is possible given my limited knowledge, how the described SB feature and assembly works, and make sure that my code makes use of this architectural feature.…
iksemyonov
  • 4,106
  • 1
  • 22
  • 42
16
votes
1 answer

Funnel shift - what is it?

When reading through the CUDA 5.0 Programming Guide, I stumbled on a feature called "Funnel shift", which is present in compute capability 3.5 devices but not in 3.0. It contains an annotation "see reference manual", but when I search for the "funnel shift"…
CygnusX1
  • 20,968
  • 5
  • 65
  • 109
15
votes
3 answers

What is the difference between Java intrinsic and native methods?

Java intrinsic functions are mentioned in various places (e.g. here). My understanding is that these are methods that are handled with special native code. This seems similar to a JNI method, which is also a block of native code. What is the difference?
rghome
  • 8,529
  • 8
  • 43
  • 62
15
votes
3 answers

How to use the multiply and accumulate intrinsics in ARM Cortex-a8?

How do I use the multiply-accumulate intrinsics provided by GCC? float32x4_t vmlaq_f32 (float32x4_t , float32x4_t , float32x4_t); Can anyone explain which three parameters I have to pass to this function? I mean the Source and destination registers…
HaggarTheHorrible
  • 7,083
  • 20
  • 70
  • 81
15
votes
4 answers

What's the proper way to use different versions of SSE intrinsics in GCC?

I will ask my question by giving an example. Now I have a function called do_something(). It has three versions: do_something(), do_something_sse3(), and do_something_sse4(). When my program runs, it will detect the CPU feature (see if it supports…
shengbinmeng
  • 1,517
  • 2
  • 12
  • 22
15
votes
2 answers

Scatter intrinsics in AVX

I can't find them in the Intel Intrinsic Guide v2.7. Do you know if AVX or AVX2 instruction sets support them?
elmattic
  • 12,046
  • 5
  • 43
  • 79
14
votes
2 answers

Constexpr and SSE intrinsics

Most C++ compilers support SIMD (SSE/AVX) instructions with intrinsics like _mm_cmpeq_epi32. My problem with this is that this function is not marked as constexpr, although "semantically" there is no reason for this function to not be constexpr since…
NoSenseEtAl
  • 28,205
  • 28
  • 128
  • 277
14
votes
1 answer

Intel Intrinsics guide - Latency and Throughput

Can somebody explain the Latency and the Throughput values given in the Intel Intrinsic Guide? Have I understood it correctly that the latency is the amount of time units an instruction takes to run, and the throughput is the number of instructions…
Philipp Neufeld
  • 1,053
  • 10
  • 23
13
votes
2 answers

How do you use the pause assembly instruction in 64-bit C++ code?

Since inlined assembly is not supported by VC++ 2010 in 64-bit code, how do I get a pause x86-64 instruction into my code? There does not appear to be an intrinsic for this like there is for many other common assembly instructions (e.g., __rdtsc(),…
Michael Goldshteyn
  • 71,784
  • 24
  • 131
  • 181
13
votes
1 answer

Why are there 128bit load functions for SSE?

I'm poking around in somebody else's code and currently trying to figure out why _mm_load_si128 exists. Essentially, I tried replacing _ra = _mm_load_si128(reinterpret_cast<__m128i*>(&cd->data[idx])); with _ra =…
user81993
  • 6,167
  • 6
  • 32
  • 64
13
votes
3 answers

Produce loops without cmp instruction in GCC

I have a number of tight loops I'm trying to optimize with GCC and intrinsics. Consider for example the following function. void triad(float *x, float *y, float *z, const int n) { float k = 3.14159f; int i; __m256 k4 =…
Z boson
  • 32,619
  • 11
  • 123
  • 226
12
votes
4 answers

Arm Neon Intrinsics vs hand assembly

https://web.archive.org/web/20170227190422/http://hilbert-space.de/?p=22 This site, which is quite dated, shows that hand-written asm would give a much greater improvement than the intrinsics. I am wondering if this is the current truth even now…
George Host
  • 980
  • 1
  • 12
  • 26
12
votes
3 answers

SSE instruction set not enabled

I am having trouble with this error: "SSE instruction set not enabled". How can I figure this out? I have an ACER i7 with Ubuntu 11.10. Can anyone help me? Any help will be appreciated! Also running: sudo cat /proc/cpuinfo | grep…
ksolid
  • 151
  • 1
  • 2
  • 5
12
votes
5 answers

128-bit division intrinsic in Visual C++

I'm wondering if there really is no 128-bit division intrinsic function in Visual C++? There is a 64x64=128 bit multiplication intrinsic function called _umul128(), which nicely matches the MUL x64 assembler instruction. Naturally, I assumed there…
cxxl
  • 4,939
  • 3
  • 31
  • 52
12
votes
1 answer

gcc, simd intrinsics and fast-math concepts

Hi all :) I'm trying to get the hang of a few concepts regarding floating point, SIMD/math intrinsics and the fast-math flag for gcc. More specifically, I'm using MinGW with gcc v4.5.0 on an x86 CPU. I've searched around for a while now, and that's…
rocket441
  • 277
  • 3
  • 7