Questions tagged [intel]

For questions about Intel semiconductor chips and assemblies, Intel architectural features and ISA extensions, and the microarchitecture of Intel chips.

Intel Corporation is an American multinational semiconductor chip maker headquartered in Santa Clara, California, United States. Intel is the inventor of the x86 processor architecture and makes central processing units, motherboard chipsets, graphics processing units, network interface controllers, and many other communications and computing devices.

In addition to its hardware offerings, Intel also produces a variety of software, including compilers, libraries for mathematical computation (Intel MKL), threading (Threading Building Blocks, OpenMP support in the Intel compilers), optimized low-level primitives (Intel Integrated Performance Primitives), parallel communication (Intel MPI, the OFED/True Scale InfiniBand stack), and several other products included in the Intel Parallel Studio toolkit. Besides these offerings, which are widely used in HPC, Intel also produces software for datacenter management and is one of the most prolific contributors to the Linux kernel.

This tag should be used for questions about Intel hardware and software.

For questions about assembly programming for the architecture in general, the x86 and/or x86-64 tags are better choices. Use this tag for topics such as performance tuning specific to Intel's implementation of x86.



3529 questions

36 votes, 8 answers

Why is floor() so slow?

I wrote some code recently (ISO/ANSI C), and was surprised at the poor performance it achieved. Long story short, it turned out that the culprit was the floor() function. Not only was it slow, but it did not vectorize (with Intel compiler, aka…
Roger
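
A common workaround for this kind of problem is to replace floor() with a conversion to int plus a correction for negative non-integers. The sketch below assumes the inputs fit in an int and that an integer result is acceptable; `fast_floor` is a hypothetical helper, not a library function. Unlike floor(), it does not handle NaN, infinities, or out-of-range values (the assert documents that precondition), and whether it actually vectorizes depends on the compiler.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: floor() for doubles that fit in an int.
   Converting to int truncates toward zero, which is one too high for
   negative non-integers, so subtract 1 when truncation rounded up. */
static int fast_floor(double x)
{
    /* Precondition (assumption): x is in int range and not NaN. */
    assert(x >= (double)INT_MIN && x < (double)INT_MAX + 1.0);
    int i = (int)x;              /* truncates toward zero */
    return i - (x < (double)i);  /* comparison yields 1 only for negative non-integers */
}
```

For example, fast_floor(-0.5) returns -1, whereas a plain (int) cast would give 0.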

36 votes, 2 answers

Significant FMA performance anomaly experienced in the Intel Broadwell processor

Code1: vzeroall mov rcx, 1000000 startLabel1: vfmadd231ps ymm0, ymm0, ymm0 vfmadd231ps ymm1, ymm1, ymm1 vfmadd231ps ymm2, ymm2, ymm2 vfmadd231ps ymm3, ymm3, ymm3 vfmadd231ps ymm4, ymm4, ymm4 vfmadd231ps ymm5,…
User9973

35 votes, 6 answers

What is the purpose of CS and IP registers in Intel 8086 assembly?

So, as the question states, what is the purpose of the CS and IP registers in Intel's 8086? I found this explanation: Code segment (CS) is a 16-bit register containing address of 64 KB segment with processor instructions. The processor uses CS segment…
idjuradj
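
The role of CS and IP can be illustrated with the 8086 real-mode address calculation: the CPU fetches the next instruction from the physical address formed from CS (the segment) and IP (the offset within that segment). A minimal sketch; `real_mode_address` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* 8086 real mode: the 16-bit segment is shifted left by 4 bits
   (multiplied by 16) and the 16-bit offset is added, giving a
   20-bit physical address. The 8086 has only 20 address lines,
   so results above 0xFFFFF wrap around to the start of memory. */
static uint32_t real_mode_address(uint16_t segment, uint16_t offset)
{
    uint32_t linear = ((uint32_t)segment << 4) + offset;
    return linear & 0xFFFFFu;  /* 20-bit wrap-around */
}
```

For example, the 8086 starts executing after reset at FFFF:0000, which real_mode_address(0xFFFF, 0x0000) maps to physical address 0xFFFF0.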

35 votes, 3 answers

How to read the Intel Opcode notation

I am reading some material which quotes volume 2 of Intel's x86 SDM on opcodes and the machine-code encoding of assembly instructions, but I cannot understand what things like cw, cd, /2, cp, or /3 mean after the opcode byte. E8 cw CALL rel16…
asher
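
As we read the notation in volume 2 of the SDM, `cw` and `cd` mean a 2-byte or 4-byte value (a code offset) following the opcode, and `/digit` means the reg field of the ModR/M byte holds that digit as an opcode extension while the r/m field selects the operand. The sketch below encodes `E8 cd` (CALL rel32, the 32/64-bit counterpart of `E8 cw`) and builds a ModR/M byte for `FF /2` (CALL r/m); `encode_call_rel32` and `modrm` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* "E8 cd": opcode byte E8 followed by a 4-byte code offset (cd),
   stored little-endian. The offset is relative to the address of the
   next instruction, i.e. call_addr + 5 for this 5-byte instruction. */
static size_t encode_call_rel32(uint8_t out[5], uint64_t call_addr, uint64_t target)
{
    int64_t rel = (int64_t)(target - (call_addr + 5));
    assert(rel >= INT32_MIN && rel <= INT32_MAX);  /* must fit in 32 bits */
    uint32_t r = (uint32_t)(int32_t)rel;
    out[0] = 0xE8;
    for (int i = 0; i < 4; i++)
        out[1 + i] = (uint8_t)(r >> (8 * i));      /* little-endian bytes */
    return 5;
}

/* "/digit": ModR/M = mod (2 bits) | reg (3 bits) | r/m (3 bits), and the
   digit goes in the reg field. In "FF /2" (CALL r/m), mod = 11b selects
   a register operand named by r/m. */
static uint8_t modrm(unsigned mod, unsigned reg, unsigned rm)
{
    assert(mod < 4 && reg < 8 && rm < 8);
    return (uint8_t)((mod << 6) | (reg << 3) | rm);
}
```

modrm(3, 2, 0) yields 0xD0, so the bytes FF D0 encode `call rax` in 64-bit mode (register number 0 is RAX).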

33 votes, 5 answers

Can one construct a "good" hash function using CRC32C as a base?

Given that SSE 4.2 (Intel Core i7 & i5 parts) includes a CRC32 instruction, it seems reasonable to investigate whether one could build a faster general-purpose hash function. According to this only 16 bits of a CRC32 are evenly distributed. So what…
DavidD
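
For experimenting with hash quality, a portable version of what the SSE4.2 CRC32 instruction computes can be useful: the instruction implements CRC-32C (the Castagnoli polynomial), not the zlib CRC-32. A minimal bitwise sketch; `crc32c` is a hypothetical name, and it is far slower than the hardware instruction. The initial value of 0xFFFFFFFF and the final inversion are the usual CRC-32C convention; the instruction itself only updates a running CRC, so code using it applies them separately.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise software CRC-32C (Castagnoli polynomial 0x1EDC6F41,
   bit-reflected form 0x82F63B78), processing one bit per step. */
static uint32_t crc32c(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFFu;                /* conventional initial value */
    for (size_t i = 0; i < len; i++) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; bit++)      /* shift out 8 bits, reducing by the polynomial */
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;                               /* conventional final inversion */
}
```

crc32c("123456789", 9) returns 0xE3069283, the standard CRC-32C check value, which is a convenient way to verify that a hardware-accelerated version matches.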

32 votes, 4 answers

What is the latency and throughput of the RDRAND instruction on Ivy Bridge?

I cannot find any info on agner.org on the latency or throughput of the RDRAND instruction. However, this processor exists, so the information must be out there. Edit: Actually the newest optimization manual mentions this instruction. It is…
user239558

31 votes, 2 answers

Why doesn't 64-bit mode (long mode) use segment registers?

I'm a beginner student :) I'm studying Intel architecture, including memory management such as segmentation and paging. I'm reading Intel's manual and it's pretty nice for understanding Intel's architectures. However I'm still…
Henrik

30 votes, 3 answers

Is double read atomic on an Intel architecture?

My colleague and I are having an argument about the atomicity of reading a double on an Intel architecture using C# .NET 4.0. He is arguing that we should use Interlocked.Exchange method for writing into a double, but just reading the double value (in some…
Alok

29 votes, 7 answers

How much should I worry about the Intel C++ compiler emitting suboptimal code for AMD?

We've always been an Intel shop. All the developers use Intel machines, recommended platform for end users is Intel, and if end users want to run on AMD it's their lookout. Maybe the test department had an AMD machine somewhere to check we didn't…
timday

28 votes, 4 answers

Branch alignment for loops involving micro-coded instructions on Intel SnB-family CPUs

This is related to, but not the same as, this question: Performance optimisations of x86-64 assembly - Alignment and branch prediction and is slightly related to my previous question: Unsigned 64-bit to double conversion: why this algorithm from…
Matthew Daws

27 votes, 2 answers

How do Intel Xeon CPUs write to memory?

I'm trying to decide between two algorithms. One writes 8 bytes (two aligned 4-byte words) to 2 cache lines, the other writes 3 entire cache lines. If the CPU writes only the changed 8 bytes back to memory, then the first algorithm uses much less…
Eloff
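
A rough way to compare the two algorithms is to count cache lines rather than bytes: to our understanding, Intel Xeon caches move data to and from memory in whole 64-byte lines, so a modified line is written back in full even if only 8 bytes changed. A minimal sketch under that assumption; `lines_touched` and `CACHE_LINE` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64u  /* assumption: 64-byte cache lines */

/* Number of distinct cache lines covered by a write of `len` bytes
   starting at `addr`. If write-back happens per line, this (not the
   byte count) approximates the memory traffic caused by the write. */
static size_t lines_touched(uintptr_t addr, size_t len)
{
    assert(len > 0);
    uintptr_t first = addr / CACHE_LINE;
    uintptr_t last  = (addr + len - 1) / CACHE_LINE;
    return (size_t)(last - first + 1);
}
```

Under this model, two aligned 4-byte words straddling a line boundary, such as lines_touched(60, 8), already cost 2 full lines, and a partially written line normally has to be read into the cache first, so the first algorithm may save less traffic than the byte counts suggest.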

27 votes, 1 answer

How many instructions are there on x86 today?

I am trying to learn up-to-date x86 assembly, from the old 386 base instructions through all the SSE additions up to now. I read that SSE5 alone adds 170 new instructions, and I became eager to know how many of them there are presently…
user2214913

26 votes, 2 answers

Reason for collapse of memory bandwidth when 2KB of data is cached in L1-cache

In a self-educational project I measure the bandwidth of the memory with help of the following code (here paraphrased, the whole code follows at the end of the question): unsigned int doit(const std::vector &mem){ const size_t…
ead

26 votes, 2 answers

How are the gather instructions in AVX2 implemented?

Suppose I'm using AVX2's VGATHERDPS - this should load 8 single-precision floats using 8 DWORD indices. What happens when the data to be loaded exists in different cache-lines? Is the instruction implemented as a hardware loop which fetches…
Anuj Kalia
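
The question is about the hardware implementation, which this does not model, but a scalar reference for what VGATHERDPS computes can help when testing code that uses it. A minimal sketch of the 256-bit form with a scale of 4; `gather_ps_ref` is a hypothetical name. Each active lane (a mask element with its sign bit set) loads one float through its own index, so the 8 loads may fall in 8 different cache lines, and, like the real instruction on successful completion, the whole mask is left zeroed.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar reference for the result of VGATHERDPS (ymm form), not for how
   the hardware performs it: lane i loads base[index[i]] (scale 4 = sizeof
   float, index is a signed 32-bit element) if mask[i] has its sign bit set;
   inactive lanes keep their previous dst value. */
static void gather_ps_ref(float dst[8], const float *base,
                          const int32_t index[8], int32_t mask[8])
{
    for (int i = 0; i < 8; i++) {
        if (mask[i] < 0)              /* sign bit set: lane is active */
            dst[i] = base[index[i]];
        mask[i] = 0;                  /* mask ends up all zeros */
    }
}
```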

26 votes, 3 answers

Strange BufferStrategy issue - Game runs fast only on Intel GPUs

I ran into a very strange problem and have been searching for an answer for days. My game just got a new particle system, but it was too slow to be playable. Unfortunately, BufferedImage transformations are very slow. The explosion effect consists…
Simon Tamás