Questions tagged [xeon-phi]

A coprocessor/accelerator from Intel.

Intel Many Integrated Core Architecture, or Intel MIC (pronounced "Mike"), is a multiprocessor computer architecture developed by Intel. It incorporates earlier work on the Larrabee many-core architecture, the Teraflops Research Chip multicore research project, and the Intel Single-chip Cloud Computer multicore microprocessor.

188 questions
3 votes · 2 answers

Reading files to shared memory

I am reading a binary file that I want to offload directly to the Xeon Phi through Cilk and shared memory. Since we read a fairly large amount of binary data each time, the preferred option is to use fread. So if I make a very simple example…
Asthor
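A minimal sketch of the host-side read step only: one large `fread` into a heap buffer sized with `fseek`/`ftell`. The name `read_binary_file` is hypothetical, and the Cilk shared-memory offload the question is really about is not shown; because `ftell` returns `long`, this sketch also assumes files small enough to be sized that way.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads the whole binary file at `path` into a newly
 * malloc'd buffer. On success stores the byte count in *out_len and returns
 * the buffer (the caller frees it). Returns NULL on any error. */
unsigned char *read_binary_file(const char *path, size_t *out_len)
{
    FILE *f = fopen(path, "rb");           /* "rb": binary mode, no newline translation */
    if (!f) return NULL;

    /* Determine the file size by seeking to the end, then rewind. */
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long size = ftell(f);
    if (size < 0 || fseek(f, 0, SEEK_SET) != 0) { fclose(f); return NULL; }

    /* malloc(0) may return NULL, so always allocate at least one byte. */
    unsigned char *buf = malloc(size > 0 ? (size_t)size : 1);
    if (!buf) { fclose(f); return NULL; }

    /* A single large fread instead of many small reads. */
    size_t got = fread(buf, 1, (size_t)size, f);
    fclose(f);
    if (got != (size_t)size) { free(buf); return NULL; }

    *out_len = got;
    return buf;
}
```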
3 votes · 1 answer

Cannot execute binary error on an Intel Xeon Phi

I have C code that compiles and runs properly on my local machine. But when I compile it with icc and the -mmic flag and test it on the Intel Xeon Phi, I get the following…
mikepapadim
3 votes · 1 answer

Atomic test-and-set in x86: inline asm or compiler-generated lock bts?

The code below, when compiled for a Xeon Phi, throws Error: cmovc is not supported on k1om, but it compiles properly for a regular Xeon processor. #include int main() { int in=5; int bit=1; int x=0, y=1; int& inRef = in; …
arunmoezhi
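One way to avoid target-specific inline assembly (such as `cmovc`) is to let the compiler generate the atomic read-modify-write. A minimal sketch, assuming a compiler with C11 `<stdatomic.h>` support (I have not verified icc's support or its generated code for k1om): `atomic_fetch_or` returns the value before the OR, so checking the bit in that value gives test-and-set semantics. `test_and_set_bit` is a hypothetical name; since the question's code is C++, `std::atomic<unsigned>::fetch_or` would play the same role there.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper: atomically sets bit `bit` of *word and returns true
 * if that bit was already set. The compiler chooses the instruction
 * sequence (which may or may not be `lock bts`); `bit` must be smaller than
 * the width of unsigned int. */
static bool test_and_set_bit(atomic_uint *word, unsigned bit)
{
    unsigned mask = 1u << bit;
    unsigned old = atomic_fetch_or(word, mask);   /* value before the OR */
    return (old & mask) != 0;
}
```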
3 votes · 2 answers

inline assembly of reduce operation for Xeon Phi

I am looking for an inline assembly equivalent of the add-reduce operation on Xeon Phi. I found the _mm512_reduce_add_epi32 intrinsic on the Intel intrinsics website (link). However, the website does not mention the actual assembly instructions for it. Can…
Hamid_UMB
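For reference, the value `_mm512_reduce_add_epi32` returns is the sum of the 16 signed 32-bit lanes of the vector. A minimal scalar sketch (the name `reduce_add_epi32_ref` is hypothetical) that folds the upper half onto the lower half, 16 to 8 to 4 to 2 to 1 lanes, which mirrors the shuffle-and-add structure a hand-written vector reduction typically follows; it does not show the actual KNC instructions a compiler emits.

```c
#include <stdint.h>

/* Hypothetical scalar reference: horizontal add of 16 signed 32-bit lanes.
 * Arithmetic is done in uint32_t so overflow wraps modulo 2^32 (as vector
 * lane adds do) instead of being undefined behaviour for signed ints. */
int32_t reduce_add_epi32_ref(const int32_t lanes[16])
{
    uint32_t v[16];
    for (int i = 0; i < 16; ++i)
        v[i] = (uint32_t)lanes[i];

    /* Fold the upper half onto the lower half: 16 -> 8 -> 4 -> 2 -> 1. */
    for (int width = 8; width >= 1; width /= 2)
        for (int i = 0; i < width; ++i)
            v[i] += v[i + width];

    /* Converting back is implementation-defined for values above INT32_MAX;
     * mainstream compilers use two's complement wraparound. */
    return (int32_t)v[0];
}
```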
3 votes · 2 answers

operand type mismatch for `vpbroadcastd'

I tried to find a KNC broadcast instruction for the Xeon Phi platform, but I could not find one. Instead, I tried to use the AVX _mm512_set1_epi32 intrinsic in assembly. I have two questions: first, is there any KNC broadcast instruction?…
Hamid_UMB
3 votes · 0 answers

Vtune get summary information only

I use Intel VTune to profile code on Xeon Phi. I use the following command: amplxe-cl -collect knc-general-exploration ./a.out The result is a bunch of information along with a new directory containing more information. I'm just interested in a…
arunmoezhi
3 votes · 2 answers

Intel xeon phi programming with gcc

I want to get an Intel Xeon Phi coprocessor, since one model seems to be selling for $230. I have two questions. Can I fully utilize its capabilities using just gcc with OpenMP, or will I need the Intel compiler?…
chasep255
3 votes · 1 answer

Running Erlang on Xeon Phi

How can I compile the VM and run Erlang programs on the Intel Xeon Phi coprocessor?
stpk
3 votes · 1 answer

Intel TBB and Cilk Plus thread affinity on Intel MIC

I would like to write parallel code for the Intel Xeon Phi using Intel TBB and Cilk Plus, but I have a problem with thread affinity. I want to bind one thread to one logical core. Is it possible to set affinity as in OpenMP? I mean…
JudgeDeath
3 votes · 0 answers

Will _mm512_mask_prefetch_i32gather_ps() prefetch an entire cache line for each element?

The gather prefetch intrinsic _mm512_mask_prefetch_i32gather_ps can be used to prefetch 32-bit floats on Knights Corner. Since a corresponding intrinsic for doubles does not exist, how should this intrinsic be used to prefetch 64- or 128-bit…
amckinley
3 votes · 3 answers

OpenCL on Xeon Phi: 2D Convolution Experience - OpenCL vs OpenMP

Benchmarked with a 2D convolution, OpenCL on the Xeon Phi seems to perform much better than an OpenMP implementation, even with compiler-enabled vectorization. The OpenMP version was run in Phi native mode, and the timing measured only the computation part:…
nikk
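For context, a naive OpenMP 2D convolution of the kind being compared might look like the minimal sketch below. The asker's code is not in the excerpt, so `conv2d_valid`, the row-major layout, the "valid"-only output and the unflipped kernel are all assumptions. The `omp parallel for` pragma splits output rows across threads; compiled without OpenMP enabled, the pragma is ignored and the loop runs serially.

```c
#include <stddef.h>

/* Hypothetical baseline: 2D convolution over the "valid" region (no padding).
 * An h x w row-major image `in` and a k x k kernel produce an
 * (h-k+1) x (w-k+1) output. As in most image-processing code the kernel is
 * not flipped (strictly a cross-correlation). Requires h >= k and w >= k. */
void conv2d_valid(const float *in, size_t h, size_t w,
                  const float *kern, size_t k, float *out)
{
    size_t oh = h - k + 1, ow = w - k + 1;

    /* Output rows are independent, so they can be divided among threads. */
    #pragma omp parallel for
    for (size_t y = 0; y < oh; ++y) {
        for (size_t x = 0; x < ow; ++x) {
            float acc = 0.0f;
            for (size_t ky = 0; ky < k; ++ky)
                for (size_t kx = 0; kx < k; ++kx)
                    acc += in[(y + ky) * w + (x + kx)] * kern[ky * k + kx];
            out[y * ow + x] = acc;
        }
    }
}
```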
3 votes · 1 answer

loaddup_pd/unpacklo_pd on Xeon Phi

If I have the following doubles in a 512-bit-wide SIMD vector, as in a Xeon Phi register: m0 = |b4|a4|b3|a3|b2|a2|b1|a1| is it possible to turn it into: m0_d = |a4|a4|a3|a3|a2|a2|a1|a1| using a single instruction? Also since there are no bitwise…
user1715122
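The requested mapping can be pinned down with a scalar reference. Taking lane 0 as the rightmost element in the |b4|a4|…|b1|a1| notation, each output lane i takes input lane i & ~1, so every even lane (a_k) is copied into the odd lane above it. A minimal sketch; `dup_even_lanes_pd` is a hypothetical name. For comparison, AVX-512F (not the KNC instruction set) provides `_mm512_movedup_pd` for this duplication; whether KNC has a single-instruction equivalent is the open question here.

```c
/* Hypothetical scalar reference for the lane mapping
 * |b4|a4|b3|a3|b2|a2|b1|a1|  ->  |a4|a4|a3|a3|a2|a2|a1|a1|
 * with lane 0 = a1: out[i] = in[i & ~1]. */
void dup_even_lanes_pd(const double in[8], double out[8])
{
    for (int i = 0; i < 8; ++i)
        out[i] = in[i & ~1];   /* clear bit 0: odd lanes read the even lane below */
}
```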
2 votes · 1 answer

Possibility to use Python 3.6 with Intel MKL 2017 and a Xeon Phi KNC Card

I am experimenting with an Intel Xeon Phi 3120A card and automatic offloading using Python. I got it running using Intel Python 2017 with the help of this post. In the process, I found out that the card is unfortunately only supported by the 2017 version of…
mapf
2 votes · 1 answer

overriding function calls from SVML

The Xeon-Phi Knights Landing cores have a fast exp2 instruction vexp2pd (intrinsic _mm512_exp2a23_pd). The Intel C++ compiler can vectorize the exp function using the Short Vector Math Library (SVML) which comes with the compiler. Specifically, it…
Z boson
2 votes · 0 answers

Compiler Optimization flags for ffmpeg on Intel Xeon Phi

I am trying to compile ffmpeg on a Xeon Phi processor. Are there any compiler optimization flags specific to Xeon Phi that I can enable while configuring ffmpeg to achieve better performance in terms of encoder frames per…
Ambujam