2

I am writing a cpp function to replace any NA values with the next non-na value. Code works properly regarding the replacement, however I want to return back the NA values for those that don't have a later non-NA value.

Eg:

fill_backward(c(1, NA, 2)) --> 1, 2, 2

fill_backward(c(1, NA, 2, NA)) --> 1, 2, 2, NA

#include <Rcpp.h>
using namespace Rcpp;
//' given NA values fill them with the next non-na value
//' @param x A numeric vector of values
//' @details
//' Works very well in context of dplyr to carry out last-observation-carried-foward
//' for different individuals. It will NOT replace leading NA's
//' @examples /dontrun {
//' fill_forward(c(1.0, NA, 2))
//' fill_forward(c(NA, 1, NA, 2))
//' library(dplyr)
//' df <- data_frame(id = c(1, 1, 2, 2), obs = c(1.2, 4.8, 2.5, NA))
//' df %>% group_by(id) %>% mutate(obs_locf = fill_forward(obs))
//' }
//' @export
// [[Rcpp::export]]
NumericVector fill_backward(NumericVector x) {
  int n = x.size();
  NumericVector out = no_init(n);
  for (int i = 0; i < n; ++i) {
    if (R_IsNA(x[i])) {
      for (int j = i+1; j < n; ++j) {
       if(R_IsNA(x[j])) {
         continue;
       } else {
         out[i] = x[j];
         break;
       } 
       //if never gets to another actual value
       out[i] = NumericVector::get_na();
      }
    } else { //not NA
      out[i] = x[i];
    }
  }
  return out;
}

Currently fill_backward(c(NA, 1.0, NA, 2, NA, NA)) returns:

[1] 1.000000e+00 1.000000e+00 2.000000e+00 [4] 2.000000e+00 2.156480e-314 -1.060998e-314

instead of 1 1 2 2 NA NA

For returning the NA value back it is out[i] = NumericVector::get_na();

I have also tried out[i] = REAL_NA and out[i] = x[i]` and nothing seems to work.

Finally, I used the same type of implementation for a fill_forward implementation, which can be seen here where leading NA's should return as NA - and it properly returns NA values so I am at a complete loss.

EDIT: Fixed thanks to @Roland 's suggestions

dpastoor
  • 146
  • 8
  • 1
    You are aware of `library(zoo); help("na.locf")`? – Roland Apr 26 '15 at 15:56
  • http://stackoverflow.com/questions/24004065/na-locf-and-inverse-rle-in-rcpp – Khashaa Apr 26 '15 at 15:58
  • Yes thanks, I've been aware of those implementations. This is just a base version of a more complex functionality that I want/need. @Roland actually answered it below in that it was a bug due to me using the continue statement not as I expected. – dpastoor Apr 26 '15 at 18:04

2 Answers2

3

You can initialize out with NA values:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector fill_backward(NumericVector x) {
  int n = x.size();
  NumericVector out = NumericVector(n, NumericVector::get_na());
  for (int i = 0; i < n; ++i) {
    if (R_IsNA(x[i])) {
      for (int j = i+1; j < n; ++j) {
       if(R_IsNA(x[j])) {
         continue;
       } else {
         out[i] = x[j];
         break;
       } 
             }
    } else { //not NA
      out[i] = x[i];
    }
  }
  return out;
}

Testing it:

fill_backward(c(NA, 1.0, NA, 2, NA, NA))
[1]  1  1  2  2 NA NA

And I should probably mention that your line out[i] = NumericVector::get_na(); is never reached due to your use of continue.

Roland
  • 127,288
  • 10
  • 191
  • 288
2

The zoo package already does this, and does it well and fast:

R> suppressMessages(library(zoo))
R> zoo::na.locf(vec, fromLast=TRUE, na.rm=FALSE)
[1]  1  1  2  2 NA NA
R> 

and the RcppXts package allows you access zoo and xts code in your C++ code should you so desire including na.locf functionality via this access point

Dirk Eddelbuettel
  • 360,940
  • 56
  • 644
  • 725
  • Thanks, I've been aware of the zoo package. Partially this is to keep me from needing to add an entire library as a dependency just for one function, partially as a learning experience, and finally I want to add in the future additional functionality that the na.locf function doesn't have. Thanks for the tip regarding accessing the code though, I know that will definitely come in handy in the future! – dpastoor Apr 26 '15 at 18:00