I'm using ONNX Runtime in Node.js to run inference with ONNX-converted models on the `cpu` backend.
According to the docs, the optional session parameters are the following:
```javascript
// Constants matching ONNX Runtime's log severity levels (0 = verbose ... 4 = fatal).
var ONNX = { Severity: { kVERBOSE: 0, kINFO: 1, kWARNING: 2, kERROR: 3, kFATAL: 4 } };

var options = {
  /**
   * The execution providers to use (here only the CPU provider).
   */
  executionProviders: ['cpu'],
  /**
   * The optimization level.
   * 'disabled'|'basic'|'extended'|'all'
   */
  graphOptimizationLevel: 'all',
  /**
   * The intra OP threads number.
   * Changes the number of threads used in the thread pool for intra-operator execution of CPU operators.
   */
  intraOpNumThreads: 1,
  /**
   * The inter OP threads number.
   * Controls the number of threads used to parallelize the execution of the graph (across nodes).
   */
  interOpNumThreads: 1,
  /**
   * Whether to enable the CPU memory arena.
   */
  enableCpuMemArena: false,
  /**
   * Whether to enable the memory pattern optimization.
   */
  enableMemPattern: false,
  /**
   * Execution mode.
   * 'sequential'|'parallel'
   */
  executionMode: 'sequential',
  /**
   * Log severity level.
   * @see ONNX.Severity
   * 0|1|2|3|4
   */
  logSeverityLevel: ONNX.Severity.kERROR,
  /**
   * Log verbosity level (only consulted for verbose logging, i.e. severity 0).
   */
  logVerbosityLevel: ONNX.Severity.kERROR,
};
```
Specifically, as in TensorFlow, I can control the threading parameters `intraOpNumThreads` and `interOpNumThreads` defined above.
I want to tune both of them for the `sequential` and `parallel` execution modes (selected by the `executionMode` parameter above).
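To make the difference between the two modes concrete, here is a minimal sketch, assuming ONNX Runtime's documented threading behavior: the inter-op pool is only used when `executionMode` is `'parallel'`, and a thread count of 0 (or leaving it unset) lets ONNX Runtime pick its default. `effectiveThreading` is a hypothetical helper, not part of the onnxruntime-node API.

```javascript
// Hypothetical helper: reports which thread settings ONNX Runtime would
// actually consult for a given session options object (assumes the
// documented behavior that inter-op threads matter only in 'parallel' mode).
function effectiveThreading(options) {
  // 0 or an unset value means "let ONNX Runtime choose".
  const describe = (n) => (n === undefined || n === 0 ? 'ORT default' : n);
  const parallel = options.executionMode === 'parallel'; // default is sequential
  return {
    executionMode: parallel ? 'parallel' : 'sequential',
    intraOpNumThreads: describe(options.intraOpNumThreads),
    // In sequential mode the nodes run one after another, so the
    // inter-op setting has no effect.
    interOpNumThreads: parallel ? describe(options.interOpNumThreads) : 'ignored',
  };
}
```

For example, `effectiveThreading({ executionMode: 'sequential', intraOpNumThreads: 8, interOpNumThreads: 1 })` reports the inter-op setting as `'ignored'`, which is why only `intraOpNumThreads` is worth tuning in `sequential` mode.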
My first approach was to set the number of intra-op threads to the number of available CPUs:
```javascript
var numCPUs = require('os').cpus().length;
options.intraOpNumThreads = numCPUs;
```
On my MacBook Pro this gives the following session configuration for the `sequential` execution mode:
```javascript
{
  executionProviders: [ 'cpu' ],
  graphOptimizationLevel: 'all',
  intraOpNumThreads: 8,
  interOpNumThreads: 1,
  enableCpuMemArena: false,
  enableMemPattern: false,
  executionMode: 'sequential',
  logSeverityLevel: 3,
  logVerbosityLevel: 3
}
```
For the `parallel` execution mode I set both:
```javascript
{
  executionProviders: [ 'cpu' ],
  graphOptimizationLevel: 'all',
  intraOpNumThreads: 8,
  interOpNumThreads: 8,
  enableCpuMemArena: false,
  enableMemPattern: false,
  executionMode: 'parallel',
  logSeverityLevel: 3,
  logVerbosityLevel: 3
}
```
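Two details of these configurations can be checked programmatically. `os.cpus().length` counts logical CPUs (hardware threads), which on CPUs with simultaneous multithreading is larger than the number of physical cores; and in the `parallel` configuration the intra-op and inter-op pools together may have roughly 8 + 8 threads competing for 8 logical CPUs. A minimal sketch, assuming that rough additive model of the two pools (not an exact ONNX Runtime accounting); `logicalCpuCount` and `peakThreads` are hypothetical helper names.

```javascript
const os = require('os');

// Prefer os.availableParallelism() (added in Node.js 18.14 / 19.4), which
// Node.js recommends over os.cpus().length for sizing parallel work.
function logicalCpuCount() {
  if (typeof os.availableParallelism === 'function') {
    return os.availableParallelism();
  }
  // os.cpus() can return an empty array on some systems, so never return 0.
  return Math.max(1, os.cpus().length);
}

// Rough upper bound on busy threads for one session: the intra-op pool plus,
// in 'parallel' mode only, the inter-op pool. Treating 0/unset ("ORT default")
// as `cores` is an assumption made for this estimate.
function peakThreads(options, cores) {
  const resolve = (n) => (n && n > 0 ? n : cores);
  const intra = resolve(options.intraOpNumThreads);
  const inter = options.executionMode === 'parallel' ? resolve(options.interOpNumThreads) : 0;
  return intra + inter;
}

const cores = logicalCpuCount();
const parallelOpts = { executionMode: 'parallel', intraOpNumThreads: cores, interOpNumThreads: cores };
if (peakThreads(parallelOpts, cores) > cores) {
  console.log(`parallel config may oversubscribe: ${peakThreads(parallelOpts, cores)} threads on ${cores} CPUs`);
}
```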
Another approach could be to use a percentage of the available CPUs:
```javascript
// Percentage `val` of `tot`, rounded to the nearest integer.
var perc = (val, tot) => Math.round(tot * val / 100);
var numCPUs = require('os').cpus().length;
if (options.executionMode === 'parallel') { // parallel
  options.interOpNumThreads = perc(50, numCPUs);
  options.intraOpNumThreads = perc(10, numCPUs);
} else { // sequential
  options.interOpNumThreads = perc(100, numCPUs);
  options.intraOpNumThreads = 1;
}
```
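A few details of the percentage snippet are worth noting, assuming ONNX Runtime's documented behavior that `interOpNumThreads` only matters in `parallel` mode and that 0 means "use the default". In the `sequential` branch, `intraOpNumThreads = 1` would leave the only pool that does work single-threaded, while the 100% inter-op setting is ignored. Also, `perc(10, numCPUs)` rounds to 0 whenever `numCPUs` is 4 or fewer (for 4 CPUs, `Math.round(0.4) === 0`), which is interpreted as "pick a default" rather than 1 thread. The following is one heuristic alternative, not an official recommendation: `threadBudget` and its `interShare` parameter are hypothetical, and the 50/50 split is an assumption that needs benchmarking.

```javascript
// Heuristic sketch (an assumption to benchmark, not an ONNX Runtime rule):
// - 'sequential': only the intra-op pool does work, so give it every CPU.
// - 'parallel': split the CPUs between the inter-op pool (concurrent nodes)
//   and the shared intra-op pool (work inside each node).
function threadBudget(mode, cores, interShare = 0.5) {
  const total = Math.max(1, Math.floor(cores) || 1); // never 0, which means "ORT default"
  if (mode === 'sequential') {
    // interOpNumThreads is not used in sequential mode; 1 keeps the intent explicit.
    return { intraOpNumThreads: total, interOpNumThreads: 1 };
  }
  if (mode !== 'parallel') throw new Error(`unknown executionMode: ${mode}`);
  const inter = Math.min(total, Math.max(1, Math.round(total * interShare)));
  const intra = Math.max(1, total - inter); // keeps intra + inter <= total when total >= 2
  return { intraOpNumThreads: intra, interOpNumThreads: inter };
}
```

With 8 CPUs this yields 8 intra-op threads for `sequential` and a 4 + 4 split for `parallel`; the result can be merged into the session options with `Object.assign(options, threadBudget(options.executionMode, numCPUs))`.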
However, I cannot find any documentation confirming that this is the optimal configuration for the two scenarios selected by `executionMode` (`sequential` and `parallel`). Is this approach theoretically correct?
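Since the best values depend on both the model and the hardware, one way to settle the question empirically is to time each candidate configuration. A minimal sketch, assuming `runOnce` is a function you supply (for example `() => session.run(feeds)` on an onnxruntime-node session); `medianLatencyMs` is a hypothetical helper name.

```javascript
// Times an async inference call and returns the median latency in milliseconds.
// `runOnce` is a caller-supplied placeholder, e.g. () => session.run(feeds).
async function medianLatencyMs(runOnce, { warmup = 3, iterations = 20 } = {}) {
  if (iterations < 1) throw new RangeError('iterations must be >= 1');
  // Warm-up runs let thread pools and memory allocations settle first.
  for (let i = 0; i < warmup; i++) await runOnce();
  const samples = [];
  for (let i = 0; i < iterations; i++) {
    const start = process.hrtime.bigint();
    await runOnce();
    samples.push(Number(process.hrtime.bigint() - start) / 1e6); // ns -> ms
  }
  samples.sort((a, b) => a - b);
  const mid = Math.floor(samples.length / 2);
  return samples.length % 2 ? samples[mid] : (samples[mid - 1] + samples[mid]) / 2;
}
```

Creating one session per candidate `options` object with `ort.InferenceSession.create(modelPath, options)` and comparing the medians on the target machine is likely a more reliable guide than a fixed percentage: `parallel` mode can only help when the graph has independent branches to run concurrently, and the best intra-op count varies with the operators and the CPU.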