I have written a function in both R and Rcpp, which basically just takes in a data-set x, a scale parameter gamma and a vector parameter beta, and hence returns the fitted probabilities. Below is the my code in R:
tp_predict<-function(x,gamma,beta){
n<-nrows(x)
x<-as.matrix(cbind(1,x))
vec<-x%*%beta
In<-vec<0
Ip<-!vec<0
vecN<-vec[In]
vecP<-vec[Ip]
fitted<-vector(,n)
t<-tanh(gamma)
b<-t+1
a<-1-t
fitted[In]<-b/(exp(-vecN/b)+1)
fitted[Ip]<-a/(exp(-vecP/a)+1)+t
fitted<-as.vector(fitted)
return(fitted)
}
In Rcpp:
arma::vec tp_predict_c(arma::mat x,double gamma,arma::vec beta){
int n = x.n_rows;
arma::vec one(n, fill::ones);
arma::mat xx=join_rows(one,x);
arma::vec v = xx * beta;
arma::uvec In = find (v<0);
arma::uvec Ip = find (v>=0);
arma::vec vecN = v.elem(In);
arma::vec vecP = v.elem(Ip);
double t = tanh(gamma);
double a = 1-t;
double b = 1+t;
arma::vec fitted(n);
fitted.elem(In)=b/(exp(-vecN/b)+1);
fitted.elem(Ip)=a/(exp(-vecP/a)+1)+t;
return fitted;
}
The tested data x was generated from a multi-normal distribution using rmvnorm(). In low dimensions (10 columes) and small data sizes(1000 rows), the Rcpp code works better than R code
Unit: microseconds
expr min lq mean median uq max neval
tp_predict(x = X, 0.3, Beta) 62.1 63.85 173.640 65.5 66.85 10868.9 100
tp_predict_c(x = X, 0.3, Beta) 30.7 31.15 40.714 31.9 32.75 781.6 100
but if I increased the size to 100000 (dimension is still 10), the Rcpp code gets slower than R:
Unit: milliseconds
expr min lq mean median uq max neval
tp_predict(x = X, 0.3, Beta) 5.8967 6.15060 8.477195 6.21560 6.7766 131.9571 100
tp_predict_c(x = X, 0.3, Beta) 8.6547 8.82975 9.102348 9.01935 9.2005 12.6010 100
Also if I increase the dimension to 100 and size remains 1000, Rcpp function does not act as good as in low dimensions:
Unit: microseconds
expr min lq mean median uq max neval
tp_predict(x = X, 0.3, Beta) 188.7 197.0 204.745 201.85 206.5 280.5 100
tp_predict_c(x = X, 0.3, Beta) 166.1 171.1 189.729 181.20 189.9 511.0 100
It looks like the cpu time of Rcpp function scales more rapidly than R when increasing in dimensions and sizes. I am new to Rcpp and I just directly translated my R code to Rcpp. I am not sure if I did something silly and I have no idea why this is happening. Is there anything I can do to address this?