
I want to run 2 evaluations in parallel on 2 different devices (and 2 different sessions) that I have created, for which I am using EvaluateAsync().

Code:

std::cout << "            Starting Evaluate           " << std::endl;

auto start = high_resolution_clock::now();

auto eval_1 = session.EvaluateAsync(binding, L"");
auto eval_2 = session_2.EvaluateAsync(binding_2, L"");

auto stop = high_resolution_clock::now();
auto duration = duration_cast<microseconds>(stop - start);

std::cout << "          Ending Evaluate         " << duration.count() << std::endl;

Expected behavior:

With only one evaluate call between the timing points (let's assume only auto eval_1 = session.EvaluateAsync(binding, L"");), I know the duration is 10 ms.

If EvaluateAsync is truly asynchronous, I expect that with 2 calls the time should be the max of the 2 calls; however, it takes double the time, i.e. 20 ms, to execute.

1 Answer


Looks like this question was also asked in the Windows Machine Learning GitHub repo, and there are answers there. In summary, asynchronous evaluation is complicated: on the CPU, certain operators will still try to use all the threads, which leads to contention, and on the GPU, the work of queuing the GPU commands is still synchronous, and operators that need to run on the CPU will cause the pipeline to stall while it waits for the GPU work.