I want to manipulate a QVector using the QtConcurrent::map function. All my sample program does is increment every value in a QVector by 1.
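#include <QDebug>
#include <QElapsedTimer>
#include <QThreadPool>
#include <QVector>
#include <QtConcurrent>   // requires QT += concurrent
#include <algorithm>

int main()
{
    QVector<double> arr(10000000, 0);   // 10 million doubles (about 80 MB)
    QElapsedTimer timer;
    qDebug() << QThreadPool::globalInstance()->maxThreadCount() << "Threads";
    qint64 end;                         // QElapsedTimer::elapsed() returns qint64 milliseconds
    /* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */
    // 1) Sequential std::transform, writing the result back into arr
    timer.start();
    for (int i = 0; i < 100; ++i) {
        std::transform(arr.begin(), arr.end(), arr.begin(), [](double x) { return x + 1; });
    }
    end = timer.elapsed();
    qDebug() << end;
    /* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */
    // 2) Sequential std::for_each, modifying the elements in place
    timer.start();
    for (int i = 0; i < 100; ++i) {
        std::for_each(arr.begin(), arr.end(), [](double &x) { ++x; });
    }
    end = timer.elapsed();
    qDebug() << end;
    /* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */
    // 3) QtConcurrent::map, which spreads the elements over the global thread pool
    timer.start();
    for (int i = 0; i < 100; ++i) {
        QFuture<void> qf = QtConcurrent::map(arr.begin(), arr.end(), [](double &x) { ++x; });
        qf.waitForFinished();
    }
    end = timer.elapsed();
    qDebug() << end;
    return 0;
}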
However, the program outputs (times in milliseconds):
4 Threads
905 // std::transform
886 // std::for_each
876 // QtConcurrent::map
So there is almost no speed benefit from the multithreaded version. I verified that 4 threads are actually running, and I compiled with -O2. Is the more common QThreadPool approach better suited to this situation?
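For comparison with the QThreadPool approach, here is a rough, Qt-independent sketch of what a thread pool does: a fixed set of worker threads, created once, pulls tasks from a shared queue. It uses only the C++11 standard library and is not Qt's API; `SimplePool` and `poolIncrement` are hypothetical names, and the `std::future` returned by `submit()` plays roughly the role that `QFuture` plays in the Qt code.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Minimal fixed-size thread pool (hypothetical, NOT QThreadPool itself):
// workers are created once and reused for every submitted task.
class SimplePool {
public:
    explicit SimplePool(unsigned count) {
        for (unsigned i = 0; i < count; ++i)
            workers_.emplace_back([this] { workerLoop(); });
    }

    ~SimplePool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (std::thread &t : workers_) t.join();  // workers drain the queue, then exit
    }

    // Queue a task; the returned future becomes ready when the task has run.
    std::future<void> submit(std::function<void()> task) {
        auto packaged = std::make_shared<std::packaged_task<void()>>(std::move(task));
        std::future<void> result = packaged->get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push([packaged] { (*packaged)(); });
        }
        cv_.notify_one();
        return result;
    }

private:
    void workerLoop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (stopping_ && tasks_.empty()) return;
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so other workers can dequeue
        }
    }

    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> tasks_;
    std::mutex mutex_;
    std::condition_variable cv_;
    bool stopping_ = false;
};

// Increment every element by 1, submitting one task per contiguous chunk.
void poolIncrement(SimplePool &pool, std::vector<double> &v, std::size_t chunks) {
    assert(chunks > 0);
    const std::size_t n = v.size();
    std::vector<std::future<void>> done;
    for (std::size_t j = 0; j < chunks; ++j) {
        double *first = v.data() + j * n / chunks;        // half-open [first, last)
        double *last = v.data() + (j + 1) * n / chunks;
        done.push_back(pool.submit([first, last] {
            for (double *p = first; p != last; ++p) *p += 1.0;
        }));
    }
    for (std::future<void> &f : done) f.wait();  // analogous to QFuture::waitForFinished()
}
```

Creating the workers once and reusing them is the point of a pool: each call then costs a queue push and a condition-variable notification instead of thread creation.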
EDIT:
I tried a different method using QtConcurrent::run(). Here are the relevant parts of the program code:
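// Increment every element in the half-open range [first, last).
void add1(QVector<double>::iterator first, QVector<double>::iterator last) {
    for (; first != last; ++first) {
        *first += 1;
    }
}
/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */
std::for_each(arr.begin(), arr.end(), [](double &x){ ++x; });
/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */
// n (the element count) and numThreads are defined elsewhere in the program.
// A QVector avoids a variable-length array when numThreads is not a compile-time constant.
QVector<QFuture<void>> qf(numThreads);
for (int j = 0; j < numThreads; ++j) {
    // Chunk j is [j*n/numThreads, (j+1)*n/numThreads). Subtracting 1 from the
    // end iterator would skip the last element of every chunk.
    qf[j] = QtConcurrent::run(add1, arr.begin() + j*n/numThreads, arr.begin() + (j+1)*n/numThreads);
}
for (int j = 0; j < numThreads; ++j) {
    qf[j].waitForFinished();
}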
So I distribute the work across the threads manually, but I still get hardly any performance boost:
181 ms // std::for_each
163 ms // QtConcurrent::run
What's still wrong here?
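For reference, the manual split can be written with plain std::thread as a stand-in for QtConcurrent::run. Each chunk is a half-open range [begin, end), so consecutive chunks share a boundary and no element is skipped or processed twice. This is a minimal standard-library sketch, not Qt code; `chunkBounds` and `threadIncrement` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// Split [0, n) into `parts` contiguous half-open ranges [begin, end).
// Range j ends exactly where range j + 1 begins, so every index is covered once.
std::vector<std::pair<std::size_t, std::size_t>> chunkBounds(std::size_t n, std::size_t parts) {
    assert(parts > 0);
    std::vector<std::pair<std::size_t, std::size_t>> bounds;
    for (std::size_t j = 0; j < parts; ++j)
        bounds.emplace_back(j * n / parts, (j + 1) * n / parts);
    return bounds;
}

// Increment every element by 1 with one std::thread per chunk, then join them
// all (the counterpart of calling waitForFinished() on each QFuture).
// Threads write disjoint elements, so no locking is needed.
void threadIncrement(std::vector<double> &v, std::size_t parts) {
    std::vector<std::thread> threads;
    for (const auto &b : chunkBounds(v.size(), parts)) {
        threads.emplace_back([&v, b] {
            for (std::size_t i = b.first; i < b.second; ++i) v[i] += 1.0;
        });
    }
    for (std::thread &t : threads) t.join();
}
```

Unlike a pool, this creates and joins fresh threads on every call. On GCC or Clang it needs `-pthread`.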