I ran a for loop with 1,000,000,000 iterations on a Xeon E5 and on a Xeon Phi and measured the run time to compare their performance. I was surprised by the results:
- On E5 (1 Thread): 41.563 Sec
- On E5 (24 Threads): 22.788 Sec
- Offload on Xeon Phi (240 Threads): 45.649 Sec
Can anybody tell me why the performance is so poor? Is it related to the architecture, or to something else?
Why is the efficiency so bad on the Xeon Phi in particular? The loop body does nothing. Assuming there is nothing wrong with my Xeon Phi coprocessor, what kind of work is the Xeon Phi actually good at? Does the code have to be vectorized? If it is not vectorized, is there any other way to make good use of its threads?
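To make the question concrete, here is a minimal sketch of the kind of timed loop I mean. It is not my exact code. The names `now_seconds`, `sum_squares` and `timed_sum_squares` are only illustrative, the loop body does a little arithmetic so the compiler cannot simply delete it (unlike my empty loop), and the OpenMP pragma is an assumption about how the threads and vector units would be used. Without OpenMP enabled the pragma is ignored and the loop runs on one thread.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Wall-clock time in seconds from a monotonic clock. */
static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Sum of i*i for i in [0, n). The reduction gives every thread a
 * private partial sum, and "simd" asks the compiler to vectorize
 * the loop body (assumes OpenMP 4.0 or later when enabled). */
double sum_squares(long n) {
    assert(n >= 0);
    double sum = 0.0;
    #pragma omp parallel for simd reduction(+:sum)
    for (long i = 0; i < n; i++) {
        sum += (double)i * (double)i;
    }
    return sum;
}

/* Runs sum_squares(n) and stores the elapsed wall-clock seconds
 * in *elapsed. Returns the computed sum so the work is observable. */
double timed_sum_squares(long n, double *elapsed) {
    double start = now_seconds();
    double result = sum_squares(n);
    *elapsed = now_seconds() - start;
    return result;
}
```

Calling `timed_sum_squares(1000000000L, &elapsed)` from `main` and printing `elapsed` would give one timing per run. On the coprocessor the call would additionally have to sit inside an offload region (for example Intel's `#pragma offload target(mic)`), which this sketch leaves out.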