Questions tagged [intel]

For issues related to Intel semiconductor chips and assemblies, Intel architectural features and ISA extensions, and Intel chip microarchitecture.

Intel Corporation is an American multinational semiconductor manufacturer headquartered in Santa Clara, California, United States. Intel is the inventor of the x86 processor architecture and makes central processing units, motherboard chipsets, graphics processing units, network interface controllers, and many other devices related to communications and computing.

In addition to its hardware offerings, Intel also produces a variety of software, including compilers, libraries for mathematical computation (Intel MKL), threading (OpenMP, Intel Integrated Performance Primitives, Threading Building Blocks), parallel communication (MPI, OFED/True Scale InfiniBand Stack), and several other products included in the Intel Parallel Studio toolkit. Besides these offerings, which are widely used in HPC, Intel also produces software for datacenter management and is one of the most prolific contributors to the Linux kernel.

This tag should be used for questions about Intel hardware and software.

For questions about assembly programming for the architecture in general, the x86 and/or x86-64 tags are better choices; use this tag for topics such as performance tuning specific to Intel's implementation of x86.



3529 questions
78
votes
10 answers

Is using double faster than float?

Double values store higher precision and are double the size of a float, but are Intel CPUs optimized for floats? That is, are double operations just as fast or faster than float operations for +, -, *, and /? Does the answer change for 64-bit…
Brent Faust
  • 9,103
  • 6
  • 53
  • 57
77
votes
3 answers

How to generate assembly code with clang in Intel syntax?

As this question shows, with g++, I can do g++ -S -masm=intel test.cpp. Also, with clang, I can do clang++ -S test.cpp, but -masm=intel is not supported by clang (warning: argument unused during compilation: -masm=intel). How do I get Intel syntax…
Jesse Good
  • 50,901
  • 14
  • 124
  • 166
75
votes
2 answers

What is the purpose of the "PAUSE" instruction in x86?

I am trying to create a dumb version of a spin lock. Browsing the web, I came across an assembly instruction called "PAUSE" in x86, which is used to give a hint to the processor that a spin-lock is currently running on this CPU. The Intel manual and other…
prathmesh.kallurkar
  • 5,468
  • 8
  • 39
  • 50
73
votes
6 answers

How can I distinguish between high- and low-performance cores/threads in C++?

When talking about multi-threading, it often seems like threads are treated as equal - just the same as the main thread, but running next to it. On some new processors, however, such as the Apple "M" series and the upcoming Intel Alder Lake series…
janekb04
  • 4,304
  • 2
  • 20
  • 51
65
votes
3 answers

Why is x86 little endian?

A real question that I've been asking myself lately: what design choices led to x86 being a little-endian architecture instead of a big-endian architecture?
bfrog
  • 911
  • 1
  • 9
  • 6
64
votes
2 answers

How to check if Intel Virtualization is enabled without going to BIOS in Windows 10

I want to check whether Intel virtualization is enabled on my laptop (Lenovo ThinkPad, Windows 10 64-bit). Is there any way to check it without going into the BIOS?
Amol.Shaligram
  • 713
  • 1
  • 5
  • 12
63
votes
2 answers

Why is this SSE code 6 times slower without VZEROUPPER on Skylake?

I've been trying to figure out a performance problem in an application and have finally narrowed it down to a really weird problem. The following piece of code runs 6 times slower on a Skylake CPU (i5-6500) if the VZEROUPPER instruction is commented…
Olivier
  • 1,144
  • 1
  • 8
  • 15
62
votes
9 answers

How to control which core a process runs on?

I can understand how one can write a program that uses multiple processes or threads: fork() a new process and use IPC, or create multiple threads and use those sorts of communication mechanisms. I also understand context switching. That is, with…
poundifdef
  • 18,726
  • 23
  • 95
  • 134
62
votes
4 answers

Micro fusion and addressing modes

I have found something unexpected (to me) using the Intel® Architecture Code Analyzer (IACA). The following instruction using [base+index] addressing addps xmm1, xmmword ptr [rsi+rax*1] does not micro-fuse according to IACA. However, if I use…
Z boson
  • 32,619
  • 11
  • 123
  • 226
60
votes
2 answers

FLOPS per cycle for sandy-bridge and haswell SSE2/AVX/AVX2

I'm confused about how many flops per cycle per core can be done with Sandy-Bridge and Haswell. As I understand it, it should be 4 flops per cycle per core for SSE and 8 flops per cycle per core for AVX/AVX2. This seems to be verified…
user2088790
57
votes
3 answers

How are cache memories shared in multicore Intel CPUs?

I have a few questions regarding cache memories used in multicore CPUs or multiprocessor systems. (Although not directly related to programming, it has many repercussions when one writes software for multicore processors/multiprocessor systems,…
goldenmean
  • 18,376
  • 54
  • 154
  • 211
57
votes
3 answers

Why is Numpy with Ryzen Threadripper so much slower than Xeon?

I know that Numpy can use different backends like OpenBLAS or MKL. I have also read that MKL is heavily optimized for Intel, so usually people suggest to use OpenBLAS on AMD, right? I use the following test code: import numpy as np def…
theV0ID
  • 4,172
  • 9
  • 35
  • 56
57
votes
1 answer

How are denormalized floats handled in C#?

Just read this fascinating article about the 20x-200x slowdowns you can get on Intel CPUs with denormalized floats (floating point numbers very close to 0). There is an option with SSE to round these off to 0, restoring performance when such…
Robin Rodricks
  • 110,798
  • 141
  • 398
  • 607
55
votes
2 answers

SIMD instructions lowering CPU frequency

I read this article, which discusses AVX-512 instructions: Intel’s latest processors have advanced instructions (AVX-512) that may cause the core, or maybe the rest of the CPU to run slower because of how much power they use. I think on…
HCSF
  • 2,387
  • 1
  • 14
  • 40
52
votes
7 answers

Where is the L1 memory cache of Intel x86 processors documented?

I am trying to profile and optimize algorithms and I would like to understand the specific impact of the caches on various processors. For recent Intel x86 processors (e.g. Q9300), it is very hard to find detailed information about cache structure.…
Brent Bradburn
  • 51,587
  • 17
  • 154
  • 173