Rewriting R's cummin() function using Rcpp and allowing for NAs

Question

I'm learning Rcpp. In this example, I'm attempting to roll my own cummin() function like base R's cummin(), except that I'd like my version to have an na.rm argument. Here is my attempt:

cummin.cpp

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector cummin_cpp(NumericVector x, bool narm = false){
  // Given a numeric vector x, returns a vector of the 
  // same length representing the cumulative minimum value
  // if narm = true, NAs will be ignored (The result may 
  // contain NAs if the first values of x are NA.)
  // if narm = false, the resulting vector will return the 
  // cumulative min until the 1st NA value is encountered
  // at which point all subsequent entries will be NA

  if(narm){
    // Ignore NAs
    for(int i = 1; i < x.size(); i++){
      if(NumericVector::is_na(x[i]) | (x[i-1] < x[i])) x[i] = x[i-1];
    }
  } else{
    // Don't ignore NAs
    for(int i = 1; i < x.size(); i++){
      if(NumericVector::is_na(x[i-1]) | NumericVector::is_na(x[i])){
        x[i] = NA_REAL;
      } else if(x[i-1] < x[i]){
        x[i] = x[i-1];
      }
    }
  }

  return x;
}

foo.R

library(Rcpp)
sourceCpp("cummin.cpp")

x <- c(3L, 1L, 2L)
cummin(x)  # 3 1 1
cummin_cpp(x)  # 3 1 1

class(cummin(x))  # integer
class(cummin_cpp(x))  # numeric

I have a few questions:

1. R's standard argument name is na.rm, not narm as I've used. However, it seems I can't use a dot in a C++ variable name. Is there a way around this so I can be consistent with R's convention?
2. I don't know ahead of time whether the user's input is going to be a numeric vector or an integer vector, so I've used Rcpp's NumericVector type. Unfortunately, if the input is integer, the output is cast to numeric, unlike base R's cummin() behavior. How do people usually deal with this issue?
3. The line if(NumericVector::is_na(x[i]) | (x[i-1] < x[i])) x[i] = x[i-1]; seems silly, but I don't know a better way to do this. Suggestions here?

For (1) typically I think you'd write an R function with the "normal" arguments as a wrapper, and then pass them immediately to your Rcpp function. — joran, Aug 24 '18 at 22:37
For (2) you will have to roll your own templated approach. Something like this: [Initialize a variable with different type based on a switch statement](https://stackoverflow.com/questions/24622918/initialize-a-variable-with-different-type-based-on-a-switch-statement). — Joseph Wood, Aug 25 '18 at 02:34
Here is a link to the Rcpp Gallery with more information: [Dynamic Wrapping and Recursion with Rcpp](http://gallery.rcpp.org/articles/rcpp-wrap-and-recurse/) — Joseph Wood, Aug 25 '18 at 02:37
Here is a good one for dealing with `NA`s: [Creating a Templated Function to Fill a Vector with another depending on Size](https://stackoverflow.com/a/43395996/4408538) — Joseph Wood, Aug 25 '18 at 03:21
`na.action` is handled by Base R before passing to subroutines. You need a wrapper for this C routine. — AdamO, Aug 02 '23 at 17:51
`na.rm` is not well defined for a cumulative minimum. If the input vector is `c(3,NA,1)`, it's not clear if the output should be `c(3,1)` or `c(3, 3, 1)`, or even `c(3, NA, 1)`. — AdamO, Aug 02 '23 at 17:52

score 5 · Accepted Answer · answered Aug 25 '18 at 07:05

I would use this:

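#include <Rcpp.h>
using namespace Rcpp;

// Typed worker: T is the C++ element type, RTYPE the matching R type code
template<typename T, int RTYPE>
Vector<RTYPE> cummin_cpp2(Vector<RTYPE> x, bool narm){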

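  // Work on a copy so the caller's vector is never modified in place
  Vector<RTYPE> res = clone(x);
  int i = 1, n = res.size();
  T na;  // only assigned and used once a missing value has been found

  if(narm){
    // Ignore NAs: carry the running minimum over missing values.
    // is_na() is true for NA and NaN in doubles and for NA_INTEGER in ints;
    // the check on res[i-1] keeps a leading integer NA (stored as INT_MIN)
    // from being treated as the smallest value.
    for(; i < n; i++){
      if(Vector<RTYPE>::is_na(res[i]) ||
         (!Vector<RTYPE>::is_na(res[i-1]) && res[i-1] < res[i])) res[i] = res[i-1];
    }
  } else{
    // Do not ignore NAs: stop at the first missing value ...
    for(; i < n; i++){
      if(Vector<RTYPE>::is_na(res[i-1])) {
        na = res[i-1];
        break;
      } else if(res[i-1] < res[i]){
        res[i] = res[i-1];
      }
    }
    // ... and fill the rest of the vector with it (NA or NaN is preserved)
    for(; i < n; i++){
      res[i] = na;
    }
  }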

  return res;
}


// [[Rcpp::export]]
SEXP cummin_cpp2(SEXP x, bool narm = false) {
  switch (TYPEOF(x)) {
  case INTSXP:  return cummin_cpp2<int, INTSXP>(x, narm);
  case REALSXP: return cummin_cpp2<double, REALSXP>(x, narm);
  default: Rcpp::stop("SEXP Type Not Supported."); 
  }
}

Try this on:

x <- c(NA, 7, 5, 4, NA, 2, 4)
x2 <- as.integer(x)

cummin_cpp(x, narm = TRUE)
x

cummin_cpp(x2)
x2


x <- c(NA, 7, 5, 4, NA, 2, 4)
x2 <- as.integer(x)
x3 <- replace(x, is.na(x), NaN)

cummin_cpp2(x, narm = TRUE)
x

cummin_cpp2(x2)
x2

cummin_cpp2(x3)
x3
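
The dispatcher above picks a typed template from a run-time type code. For readers who want to try that pattern without an R session, here is a minimal sketch in plain C++ (not Rcpp). `Kind`, `AnyVector`, `running_min` and `running_min_dispatch` are hypothetical names made up for this illustration: `Kind` plays the role of `TYPEOF(x)` and `AnyVector` the role of a `SEXP`. Missing values are deliberately left out of this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical run-time type tag, standing in for TYPEOF(x).
enum class Kind { Integer, Real, Character };

// Hypothetical untyped container, standing in for a SEXP.
struct AnyVector {
    Kind kind;
    std::vector<int> ints;      // used when kind == Kind::Integer
    std::vector<double> reals;  // used when kind == Kind::Real
};

// Typed worker: takes a copy, so the caller's data is left untouched.
template <typename T>
std::vector<T> running_min(std::vector<T> x) {
    for (std::size_t i = 1; i < x.size(); ++i) {
        x[i] = std::min(x[i - 1], x[i]);
    }
    return x;
}

// Untyped entry point: inspect the tag once, then forward to the
// matching instantiation so the result keeps the input's element type.
AnyVector running_min_dispatch(const AnyVector& x) {
    AnyVector out{x.kind, {}, {}};
    switch (x.kind) {
    case Kind::Integer: out.ints = running_min(x.ints); break;
    case Kind::Real:    out.reals = running_min(x.reals); break;
    default: throw std::invalid_argument("type not supported");
    }
    return out;
}
```

The point of the design is that the type check happens once per call rather than once per element, and an integer input yields an integer result.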

Explanation:

Joran's advice is good, just wrap that in an R function
I use a dispatcher as Joseph Wood suggested
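Beware that in your version x is passed by reference, so the caller's vector is modified in place when its type matches the one you declared (see these 2 slides); this is why I clone() it first. When the types differ (e.g. an integer vector passed as a NumericVector), Rcpp works on a converted copy and the original is left untouched.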
You need to handle NA as well as NaN
You can use || instead of | to evaluate only the first condition if it is true.
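
The NA-related points above can also be illustrated outside R. The sketch below is plain C++ rather than Rcpp, and `is_missing` and `cummin_sketch` are hypothetical helpers. It assumes R's conventions that `NA_integer_` is stored as the smallest `int` and that both `NA_real_` and `NaN` are IEEE NaNs. The argument is taken by value, so the caller's vector is not modified.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-ins for R's missing-value tests.
// Integer NA is the smallest int; for doubles, NA and NaN are both NaNs,
// so std::isnan catches either one.
inline bool is_missing(int v)    { return v == std::numeric_limits<int>::min(); }
inline bool is_missing(double v) { return std::isnan(v); }

// Cumulative minimum computed on a copy of x.
// na_rm = true : missing values are skipped and the running minimum is
//                carried forward; leading missing values stay missing.
// na_rm = false: from the first missing value onward, every entry is
//                that missing value.
template <typename T>
std::vector<T> cummin_sketch(std::vector<T> x, bool na_rm = false) {
    for (std::size_t i = 1; i < x.size(); ++i) {
        const bool prev_na = is_missing(x[i - 1]);
        const bool cur_na  = is_missing(x[i]);
        if (na_rm) {
            // `||` short-circuits: the comparison is skipped when x[i] is
            // missing, and it never runs against a missing predecessor.
            if (cur_na || (!prev_na && x[i - 1] < x[i])) x[i] = x[i - 1];
        } else {
            if (prev_na) {
                x[i] = x[i - 1];  // propagate the missing value
            } else if (!cur_na && x[i - 1] < x[i]) {
                x[i] = x[i - 1];
            }
        }
    }
    return x;
}
```

The explicit check on the previous element matters for integers: the integer NA is the smallest representable value, so a bare `<` comparison would treat it as the minimum instead of as missing.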
