I am experimenting with OpenCL device fission on an FX-8150, using 7 cores as the compute sub-device and leaving 1 core for the host thread. The workload is an array of 51200 floats, and the calculation is O(N*N) with trigonometric functions. On the first run only 3 cores are used, the second run uses 7 cores, and the last run uses 3 cores again. Could this be a random occupancy issue? I ask because the host thread can sometimes get in the way and change the completion time. This is done with JOCL.
Sometimes even the first run uses 7 cores and later runs use only 5, so it looks random. Requesting an even number of cores, such as 4 or 6, shows the same behaviour. Could it be related to how Windows 7 64-bit uses the CPU's modules and their shared resources? It takes at least 50-200 runs before the number of cores used becomes completely stable. Maybe the just-in-time compiler (HotSpot) kicked in? Thanks.
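
To check whether the JIT guess holds, one option is to separate warm-up runs from measured runs on the Java side. Below is a minimal sketch in plain Java, not JOCL code: `WarmupBenchmark`, `measure`, `median` and `placeholderWorkload` are hypothetical names made up for illustration, and the placeholder loop only stands in for the real kernel launch. In the real test the `Runnable` would enqueue the kernel on the sub-device and block until it completes.

```java
import java.util.Arrays;

final class WarmupBenchmark {

    /** Returns the median of the samples (the upper median for even counts). */
    static long median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    /**
     * Runs the work warmupRuns times without timing it, then times
     * measuredRuns further runs and returns each duration in nanoseconds.
     */
    static long[] measure(Runnable work, int warmupRuns, int measuredRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            work.run(); // untimed: gives the JIT a chance to compile hot paths
        }
        long[] nanos = new long[measuredRuns];
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            work.run();
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    // Written to so the JIT cannot discard the placeholder loop as dead code.
    static volatile float sink;

    /** Hypothetical stand-in for the real workload (assumption, not JOCL). */
    static void placeholderWorkload() {
        int n = 512;
        float acc = 0f;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                acc += (float) Math.sin(i * 0.001 + j * 0.002);
            }
        }
        sink = acc;
    }

    public static void main(String[] args) {
        // No warm-up: the first samples include interpreter and JIT overhead.
        long[] cold = measure(WarmupBenchmark::placeholderWorkload, 0, 20);
        // 200 warm-up runs, matching the upper end of the 50-200 range above.
        long[] warm = measure(WarmupBenchmark::placeholderWorkload, 200, 20);
        System.out.println("first cold run (ns): " + cold[0]);
        System.out.println("cold median (ns):    " + median(cold));
        System.out.println("warm median (ns):    " + median(warm));
    }
}
```

If the warm median is stable while the core count still jumps between 3, 5 and 7, the JIT is probably not the cause and the OS scheduler or the module sharing would be the next thing to look at. This harness only measures host-side time, so it cannot show by itself how many cores the sub-device actually used.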