I'm trying to apply a function to multiple groups/id's in r using the foreach
package. It's taking forever to run using parallel processing via %dopar%
, so I was wondering if it's possible to run the apply
or for loop portion in c++
via rcpp
or other packages to make it faster. I'm not familiar with c++
or other packages that can do this so I'm hoping to learn if this is possible. The sample code is below. My actual function is longer with over 20 inputs and takes even longer to run than what I'm posting
I appreciate the help.
EDIT:
I realized my initial question was vague so I'll try to do a better job. I have a table with time series data by group. Each group has > 10K rows. I have written a function in c++
via rcpp
that filters the table by group and applies a function. I would like to loop through the unique groups and combine the results like rbind
does using rcpp
so that it runs faster. See sample code below (my actual function is longer)
library(data.table)
library(inline)
library(Rcpp)
library(stringi)
library(Runuran)
# Fake data
DT <- data.table(Group = rep(do.call(paste0, Map(stri_rand_strings, n=10, length=c(5, 4, 1),
pattern = c('[A-Z]', '[0-9]', '[A-Z]'))), 180))
df <- DT[order(Group)][
, .(Month = seq(1, 180, 1),
Col1 = urnorm(180, mean = 500, sd = 1, lb = 5, ub = 1000),
Col2 = urnorm(180, mean = 1000, sd = 1, lb = 5, ub = 1000),
Col3 = urnorm(180, mean = 300, sd = 1, lb = 5, ub = 1000)),
by = Group
]
# Rcpp function
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::export]]
DataFrame testFunc(DataFrame df, StringVector ids, double var1, double var2) {
// Filter by group
using namespace std;
StringVector sub = df["Group"];
std::string level = Rcpp::as<std::string>(ids[0]);
Rcpp::LogicalVector ind(sub.size());
for (int i = 0; i < sub.size(); i++){
ind[i] = (sub[i] == level);
}
// Access the columns
CharacterVector Group = df["Group"];
DoubleVector Month = df["Month"];
DoubleVector Col1 = df["Col1"];
DoubleVector Col2 = df["Col2"];
DoubleVector Col3 = df["Col3"];
// Create calculations
DoubleVector Cola = Col1 * (var1 * var2);
DoubleVector Colb = Col2 * (var1 * var2);
DoubleVector Colc = Col3 * (var1 * var2);
DoubleVector Cold = (Cola + Colb + Colc);
// Result summary
std::string Group_ID = level;
double SumCol1 = sum(Col1);
double SumCol2 = sum(Col2);
double SumCol3 = sum(Col3);
double SumColAll = sum(Cold);
// return a new data frame
return DataFrame::create(_["Group_ID"]= Group_ID, _["SumCol1"]= SumCol1,
_["SumCol2"]= SumCol2, _["SumCol3"]= SumCol3, _["SumColAll"]= SumColAll);
}
# Test function
Rcpp::sourceCpp('sample.cpp')
testFunc(df, ids = "BFTHU1315C", var1 = 24, var2 = 76) # ideally I would like to loop through all groups (unique(df$Group))
# Group_ID SumCol1 SumCol2 SumCol3 SumColAll
# 1 BFTHU1315C 899994.6 1798561 540001.6 5907129174
Thanks in advance.