Questions tagged [intel]

For issues related to Intel semiconductor chips and assemblies, Intel architectural features and ISA extensions, and Intel chip microarchitecture.

Intel Corporation is an American multinational semiconductor chip maker headquartered in Santa Clara, California, United States. Intel is the inventor of the x86 processor architecture and makes central processing units, motherboard chipsets, graphics processing units, network interface controllers, and many other devices related to communications and computing.

In addition to its hardware offerings, Intel also produces a variety of software, including compilers, libraries for mathematical computation (Intel MKL), threading (OpenMP, Intel Performance Primitives, Threading Building Blocks), parallel communication (MPI, OFED/True Scale InfiniBand Stack), and several other products included in the Intel Parallel Studio toolkit. In addition to these offerings, which are widely used in HPC, Intel also produces software for datacenter management and is one of the most prolific contributors to the Linux kernel.

This tag should be used for questions about Intel hardware and software.

The x86 and/or x86-64 tags are better choices for questions about assembly programming for the architecture, rather than things like performance tuning specifically for Intel's implementation of x86.


3529 questions
18
votes
2 answers

How does CLFLUSH work for an address that is not in cache yet?

We are trying to use the Intel CLFLUSH instruction to flush the cache content of a process in Linux from userspace. We create a very simple C program that first accesses a large array and then calls CLFLUSH to flush the virtual address space of…
Mike
  • 1,841
  • 2
  • 18
  • 34
18
votes
2 answers

Enabling intel virtualization (VT-X) without option in BIOS

Sorry if the question is already answered, but I haven't found an answer for my particular situation, which is a little different. I'm installing all the tools necessary for Android programming. I have created an Android virtual device, but the problem…
Carlos
  • 889
  • 3
  • 12
  • 34
18
votes
2 answers

How to transpose a 16x16 matrix using SIMD instructions?

I'm currently writing some code targeting Intel's forthcoming AVX-512 SIMD instructions, which support 512-bit operations. Now assuming there's a matrix represented by 16 SIMD registers, each holding 16 32-bit integers (corresponding to a row), how…
lei_z
  • 1,049
  • 2
  • 13
  • 27
18
votes
1 answer

Strange JIT pessimization of a loop idiom

While analyzing the results of a recent question here, I encountered a quite peculiar phenomenon: apparently an extra layer of HotSpot's JIT-optimization actually slows down execution on my machine. Here is the code I have used for the…
Marko Topolnik
  • 195,646
  • 29
  • 319
  • 436
18
votes
2 answers

Is the Intel Xeon Phi usable without a costly Intel Compiler?

Does the Intel Xeon Phi coprocessor, to be usable as parallel platform, require a license of the Intel Composer XE compiler, or are there alternative compilers?
clstaudt
  • 21,436
  • 45
  • 156
  • 239
17
votes
4 answers

Why is it not possible to push a byte onto a stack on Pentium IA-32?

I've come to learn that you cannot push a byte directly onto the Intel Pentium's stack. Can anyone explain this to me, please? The reason that I've been given is because the esp register is word-addressable (or, that is the assumption in our model)…
Tim Green
  • 2,028
  • 1
  • 17
  • 19
17
votes
2 answers

Loop unrolling to achieve maximum throughput with Ivy Bridge and Haswell

I am computing eight dot products at once with AVX. In my current code I do something like this (before unrolling): Ivy-Bridge/Sandy-Bridge __m256 areg0 = _mm256_set1_ps(a[m]); for(int i=0; i…
Z boson
  • 32,619
  • 11
  • 123
  • 226
17
votes
1 answer

Why are Intel x87 registers 80 bits wide?

Why is such a “weird” register size used? Is there any documentation on why it is not preferable to use 64 or 128 bits for those registers?
JohnTortugo
  • 6,356
  • 7
  • 36
  • 69
17
votes
1 answer

C++/compilation : is it possible to set the size of the vptr (global vtable + 2 bytes index)

I recently posted a question about the memory overhead due to virtuality in C++. The answers allowed me to understand how the vtable and vptr work. My problem is the following: I work on supercomputers, I have billions of some objects and consequently I…
Vincent
  • 57,703
  • 61
  • 205
  • 388
17
votes
5 answers

Memory alignment on a 32-bit Intel processor

Intel's 32-bit processors such as Pentium have a 64-bit wide data bus and therefore fetch 8 bytes per access. Based on this, I'm assuming that the physical addresses that these processors emit on the address bus are always multiples of 8. Firstly, is…
G S
  • 35,511
  • 22
  • 84
  • 118
16
votes
2 answers

_mm_load_ps vs. _mm_load_pd vs. etc on Intel x86 ISA

What's the difference between the following two lines? __m128 x = _mm_load_ps((float *) ptr); __m128 y = _mm_load_pd((double *)ptr); In other words, why are there so many different _mm_load_xyz instructions, instead of a generic __m128…
user541686
  • 205,094
  • 128
  • 528
  • 886
16
votes
2 answers

assembly registers esp and ebp

I am currently learning assembly for Intel processors. Since the stack 'grows down', why do we have to add in order to access a specific element? [ebp + 8] ;; This will access the first param. I know we have to skip the old ebp value and the return…
Andrei
  • 159
  • 1
  • 1
  • 3
16
votes
1 answer

Why can't my ultraportable laptop CPU maintain peak performance in HPC

I have developed a high performance Cholesky factorization routine, which should have peak performance at around 10.5 GFLOPs on a single CPU (without hyperthreading). But there is some phenomenon which I don't understand when I test its performance.…
Zheyuan Li
  • 71,365
  • 17
  • 180
  • 248
16
votes
1 answer

Is there a complete x86 assembly language reference that uses AT&T syntax?

Ideally there would be a version of Intel's Software Developer's Manuals written in AT&T syntax, but I would be happy to find anything that is close enough.
sigjuice
  • 28,661
  • 12
  • 68
  • 93
16
votes
5 answers

Will runtimes like CLR and JVM be able to use Haswell TSX instructions?

After reading Anandtech on 'Haswell TSX' (transactional memory barriers) I immediately wondered if CLR/JVM will be able to make use of these in C#/Java/Scala/F# for heavily parallel applications (C# Rx/TPL/TFD).
yzorg
  • 4,224
  • 3
  • 39
  • 57