1

I'm trying to decide if it makes sense to implement R's %*% operator in RCpp if my dataset is huge. BUT, I am really having trouble getting a RCpp implementation.

Here is my example R code

# remove everything in the global environment
rm(list = ls())                

n_states = 4                   # number of states
v_n <- c("H", "S1", "S2", "D") # the 4 states of the model:
n_t = 100                      # number of transitions

# create transition matrix with random numbers. This transition matrix is constant.
m_P = matrix(runif(n_states*n_t), # insert n_states * n_t random numbers (can change this later)
         nrow = n_states, 
         ncol = n_states,
         dimnames = list(v_n, v_n))

# create markov trace, what proportion of population in each state at each period (won't make sense due to random numbers but that is fine)
m_TR <- matrix(NA, 
               nrow = n_t + 1 , 
               ncol = n_states, 
               dimnames = list(0:n_t, v_n))     # create Markov trace (n_t + 1 because R doesn't understand  Cycle 0)

# initialize Markov trace 
m_TR[1, ] <- c(1, 0, 0, 0)                      


# run the loop

microbenchmark::microbenchmark(   # function from microbenchmark library used to calculate how long this takes to run

for (t in 1:n_t){                              # throughout the number of cycles
  m_TR[t + 1, ] <- m_TR[t, ] %*% m_P           # estimate the Markov trace for cycle the next cycle (t + 1)
}


) # end of micro-benchmark function

print(m_TR)   # print the result.

And, here is the replacement for the %*% operator: (WHich doesn't seem to work correctly at all, although I can't fgure out why.

    library(Rcpp)


cppFunction(
    'void estimate_markov(int n_t, NumericMatrix m_P, NumericMatrix m_TR )
     {
            // We want to reproduce this
            //   matrix_A[X+1,]  <- matrix_A[X,] %*% matrix_B
            // The %*% operation behaves as follows for a vector_V %*% matrix_M

            // Here the Matrix M is populated with letters so that you can
            // more easily see how the operation is performed
            // So, a multiplication like this:
            //
            //  V          M
            // {1}  %*%  {A D}
            // {2}       {B E}
            // {3}       {C F}
            //
            // Results in a vector:
            //   V_result
            // {1*A + 1*D}
            // {2*B + 2*E}
            // {3*C + 3*F}
            //
            // Now use values instead of letters for M (ex: A=1, B=2, .., F=6)
            //  V_result
            // {1*1 + 1*4}    {1 + 4}     {5}
            // {2*2 + 2*5} => {4 + 10} => {14}
            // {3*3 + 3*6}    {9 + 18}    {27}
            //
            // Note that the RHS matrix may contain any number of columns,
            // but *MUST* must contain the same number of rows as LHS vector

        // Get dimensions of matricies , and sanity check
        // number of elements in a vector from the LHS matrix must equal == number of row on the RHS
        if( m_TR.cols() != m_P.rows())
            throw std::range_error("Matrix mismatch, m_P.rows != m_TR.rows");

        // we want to know these dimensions, and there is no reason to call these functons in a loop
        // store the values once
        int cnt_P_cols = m_P.cols();
        int cnt_TR_cols = m_TR.cols();

        //
        for(int Index = 1; Index <= n_t; ++Index)
        {
            // iterate over the columns in m_TR
            for(int col_iter = 0; col_iter < cnt_TR_cols; ++col_iter)
            {
                // an accumulator for the vector multiplication
                double sum = 0;

                // The new value comes from the previous row (Index-1)
                double orig_TR = m_TR(col_iter, Index-1);

                // iterate over the columns in m_P corresponding to this Index
                for(int p_iter = 0; p_iter < cnt_P_cols; ++p_iter)
                {
                    // accumulate the value of this TR scalar * the m_P vector
                    sum += orig_TR * m_P(p_iter, Index);
                }
                m_TR(col_iter, Index) = sum;
            }
          }
        }'
    )

Can someone point me to where my logic is going wrong.

Mohsen Alyafei
  • 4,765
  • 3
  • 30
  • 42
SmittyBoy
  • 289
  • 3
  • 12
  • 3
    Check out RcppEigen or RcppArmadillo. You're trying to reinvent the wheel. – thc Oct 20 '20 at 16:51
  • Thanks for the reply. I have looked at RcppArmadillo, but, ideally I would like to be able to perform such funcitons myself. Because eventually, I will have significantly more challenging things to do with Rcpp. – SmittyBoy Oct 21 '20 at 07:52
  • 3
    You should not dismiss existing tools so easily. Actual experts, such as Dr Sanderson for Armadillo, have put a decade into making these things farily excellent. They dispatch to LAPACK/BLAS libraries for the actual matrix multiplications. Now you are up against decade of expertise in numerical mathematics. Now, it is always possible to improve things further but ... it also gets harder and harder. – Dirk Eddelbuettel Oct 22 '20 at 21:37

0 Answers0