I have a function containing a loop (EstimateUniques) that I parallelized with OpenMP. I assumed that multithreading would be more efficient than multiprocessing, but when I compared this function against a plain "mclapply" call, it performed worse: the C++ version actually gets slower as threads are added, while the R version speeds up. What is the proper way to achieve the same level of parallelization in C++ as in R? Am I doing something wrong?
Performance comparison (elapsed time in seconds):
#Cores  CPP     R
1       1.721   1.538
2       1.945   1.080
3       2.858   0.801
R code:
Rcpp::sourceCpp('ReproducibleExample.cpp')
arr <- 1:10000
n_rep <- 150
n_iters <- 200
EstimateUniquesR <- function(arr, n_iters, n_rep, cores) {
  parallel::mclapply(1:n_iters, function(i)
    GetNumberOfUniqSamples(arr, i * 10, n_rep), mc.cores=cores)
}
cpp_times <- sapply(1:3, function(threads)
  system.time(EstimateUniques(arr, n_iters, n_rep, threads))['elapsed'])
r_times <- sapply(1:3, function(cores)
  system.time(EstimateUniquesR(arr, n_iters, n_rep, cores))['elapsed'])
data.frame(CPP=cpp_times, R=r_times)
ReproducibleExample.cpp file:
// [[Rcpp::plugins(openmp)]]
// [[Rcpp::plugins(cpp11)]]
#include <algorithm>
#include <cmath>     // std::round
#include <cstdlib>   // rand
#include <iterator>  // std::distance
#include <vector>
#include <omp.h>

// Average number of distinct values in a bootstrap sample of `size`
// elements drawn with replacement from bs_array, over n_rep repetitions.
// [[Rcpp::export]]
int GetNumberOfUniqSamples(const std::vector<int> &bs_array, int size, unsigned n_rep) {
  unsigned long sum = 0;
  for (unsigned i = 0; i < n_rep; ++i) {
    std::vector<int> uniq_vals(size);
    // Draw `size` elements with replacement.
    for (int try_num = 0; try_num < size; ++try_num) {
      uniq_vals[try_num] = bs_array[rand() % bs_array.size()];
    }
    // Sort, then count the distinct values.
    std::sort(uniq_vals.begin(), uniq_vals.end());
    sum += std::distance(uniq_vals.begin(), std::unique(uniq_vals.begin(), uniq_vals.end()));
  }
  return std::round(double(sum) / n_rep);
}

// Runs GetNumberOfUniqSamples for sample sizes 10, 20, ..., n_iters * 10,
// distributing the iterations over `threads` OpenMP threads.
// [[Rcpp::export]]
std::vector<int> EstimateUniques(const std::vector<int> &bs_array, const int n_iters,
                                 const int n_rep = 1000, const int threads = 1) {
  std::vector<int> uniq_counts(n_iters);
  #pragma omp parallel for num_threads(threads) schedule(dynamic)
  for (int i = 0; i < n_iters; ++i) {
    uniq_counts[i] = GetNumberOfUniqSamples(bs_array, (i + 1) * 10, n_rep);
  }
  return uniq_counts;
}
I tried to use other types of scheduling in OpenMP, but they gave even worse results.
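For reference, here is a minimal sketch of how the schedule kinds can be compared without recompiling, using schedule(runtime) together with omp_set_schedule. The names RunWithSchedule, Work and Sched are hypothetical and not part of my code above; Work is only a deterministic stand-in whose cost grows with i, like the per-iteration work in EstimateUniques. The include and the omp_set_schedule call are guarded so that the sketch still compiles (serially) when OpenMP is not enabled.

```cpp
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in for the per-iteration work: cost grows linearly
// with i, mimicking the (i + 1) * 10 sample size in EstimateUniques.
static int Work(int i) {
  long acc = 0;
  for (int k = 0; k < (i + 1) * 10; ++k) {
    acc += k % 7;
  }
  return static_cast<int>(acc % 1000);
}

// Hypothetical selector for the schedule kind to test.
enum class Sched { Static, Dynamic, Guided };

// Runs the same loop under the requested schedule kind. The loop uses
// schedule(runtime), so the kind is taken from omp_set_schedule.
std::vector<int> RunWithSchedule(int n_iters, int threads, Sched s) {
  std::vector<int> out(n_iters);
#ifdef _OPENMP
  omp_sched_t kind = omp_sched_static;
  if (s == Sched::Dynamic) kind = omp_sched_dynamic;
  if (s == Sched::Guided) kind = omp_sched_guided;
  // A chunk size below 1 asks OpenMP to use its default chunk size.
  omp_set_schedule(kind, 0);
#else
  (void)s;        // without OpenMP the loop simply runs serially
  (void)threads;
#endif
  #pragma omp parallel for num_threads(threads) schedule(runtime)
  for (int i = 0; i < n_iters; ++i) {
    out[i] = Work(i);
  }
  return out;
}
```

Each thread writes only to its own out[i], so all schedule kinds should return identical vectors and only the timing should differ. The same switch can also be made from outside the program through the OMP_SCHEDULE environment variable.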