Questions tagged [xeon-phi]

a co-processor/accelerator from Intel

Intel Many Integrated Core Architecture, or Intel MIC (pronounced "Mike"), is a multiprocessor computer architecture developed by Intel. It incorporates earlier work on the Larrabee many-core architecture, the Teraflops Research Chip multicore chip research project, and the Intel Single-chip Cloud Computer multicore microprocessor.

188 questions
2
votes
0 answers

Why does my Xeon E5-2640 v3 2.60GHz run twice as fast as my Xeon E5-4627 v3 2.6GHz?

I am struggling with the performance of a newly bought machine with the Xeon E5-4627 CPU. My experiment code runs almost twice as slowly on the Xeon E5-4627 v3 CPU as the same code does on the Xeon E5-2640 v3 processor. I did some simple benchmark on…
Fionser
  • 161
  • 2
  • 8
2
votes
1 answer

Xeon Phi: slower performance with padding

I have implemented a simple n x n matrix multiplication to test some performance tunings in C with OpenMP. My initial code is the following: #pragma omp parallel for shared(a,b,c) private(h,i,j,k) for( i = 0; i < n; i++ ) { for( j =…
ImmaCute
  • 57
  • 7
2
votes
1 answer

Vectorization and parallelization on Xeon Phi

I am looking for a simple example where using vectorization and parallelization on Xeon Phi gives better performance than running on Xeon only. Could you help me, please? I am trying the following example. I comment out lines 14, 18 and 19 to run on…
Juan
  • 2,073
  • 3
  • 22
  • 37
2
votes
2 answers

Vector Sum using AVX Inline Assembly on XeonPhi

I am new to using the Intel Xeon Phi co-processor. I want to write code for a simple vector sum using AVX 512-bit instructions. I use k1om-mpss-linux-gcc as the compiler and want to write inline assembly. Here is my code: #include #include…
Hamid_UMB
  • 317
  • 4
  • 16
2
votes
3 answers

Padding array manually

I am trying to understand the 9-point stencil algorithm from this book. The logic is clear to me, but I am unable to understand the calculation of the WIDTHP macro. Here is the brief code (the original code is more than 300 lines long): #define…
puneet336
  • 433
  • 5
  • 20
2
votes
1 answer

Running Python on Xeon Phi

I would like to port a semi-HPC code scriptable with Python to Xeon Phi, to try out the performance increase; it cannot be run in offload mode (data transfers would be prohibitive), the whole code must be run on the co-processor. Can someone…
eudoxos
  • 18,545
  • 10
  • 61
  • 110
2
votes
1 answer

Does the Intel Xeon Phi co-processor support graphics processing at the hardware level?

I am going to do some rendering experiments on a large-scale computer system with a massive number of processors. This system uses some Intel Xeon E5 processors and Intel Xeon Phi co-processors. I've read the documents and developer guide of Xeon Phi…
cxcfan
  • 185
  • 9
2
votes
1 answer

Xeon Phi Knights Corner intrinsics with GCC

I'm thinking of purchasing a Xeon Phi Knights Corner (KNC) coprocessor card. But I don't own an Intel Compiler and I have no interest in purchasing one (and the non-commercial version no longer seems to be an option). It appears that GCC is getting…
Z boson
  • 32,619
  • 11
  • 123
  • 226
2
votes
1 answer

Allocating Multiple Threads to Single Parallel Do on Xeon Phi with OpenMP

I have some code similar to this: !$dir parallel do do k = 1, NUM_JOBS call asynchronous_task( parameter_array(k) ) end do !$dir end parallel do I've tried many different strategies, including $ micnativeloadex $exe -e "KMP_PLACE_THREADS=59Cx4T…
jyalim
  • 3,289
  • 1
  • 15
  • 22
2
votes
3 answers

Performance degradation if loop count is not known at compile time on Xeon Phi

I am creating a simple matrix multiplication procedure, operating on the Intel Xeon Phi architecture. After many attempts with autovectorization, trying to get better performance, I had to use Intel intrinsics. Until now, the matrix size was given…
2
votes
1 answer

Intrinsic store - bad performance

I want to write a benchmark for Xeon Phi (60 cores). In my program I use the OpenMP standard and Intel intrinsics. I implemented a parallel version of the algorithm (5-point stencil computation) which is just under 230 times faster than the scalar algorithm. I want…
JudgeDeath
  • 151
  • 1
  • 2
  • 9
2
votes
1 answer

Using GCC on Xeon Phi

I was told one can run a program on MIC that was built with gcc. Is that true? If yes, how to proceed? I'm using gcc version 4.4.7.
Éric
  • 419
  • 5
  • 17
1
vote
0 answers

Compiler produces a slower program although I gave it more information

To my knowledge, giving information to the compiler (such as using restrict, static on functions, __builtin_expect(), etc.) should make a program faster or at least no slower. However, it had the opposite effect from what I expected. This is a function that changes the order of data…
enochjung
  • 45
  • 6
1
vote
0 answers

mkl_sparse_d_mv performs between +25% and -50% relative to -O3 Intel auto-vectorisation on Xeon Phi

Using Intel MKL's mkl_sparse_d_mv function in our physics solver to perform a sparse matrix-vector multiplication yields a speedup of between -50% and +25%, depending on the sparse matrix used in each case, compared against the auto-vectorisation…
Gaston
  • 537
  • 4
  • 10
1
vote
2 answers

Can I compile Go programs on Xeon Phi (Knights Landing) processors?

I'm a hobbyist who likes to run my own programs in Go, and as Xeon Phi processors become older they're also becoming extremely cheap. So cheap that I can build a dual-socket machine from 2015/16 for <$1000. I'm trying to find out if I can run Go programs…
haxonek
  • 174
  • 1
  • 2
  • 17